Closed by aaronhktan 2 years ago
@aaronhktan Please don't scrape. Use this link instead: https://words.hk/static/all.csv.gz
@hnfong The contents of that link only seem to include words that are public. Is that right? I'd also like to include words that are still hidden behind the login.
I believe the list is complete.
Oh wow, yeah, after taking a second look it does seem to have all the entries. I'll update the code I have in the repository to parse that file instead when I have the time.
Since I've already written code that generates dictionary data from words.hk through scraping, that implementation fulfills the requirements of this issue. However, I will create a new issue to instead parse the file from the link in this discussion.
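For reference, parsing the file from the link above could start from something like the sketch below. It is only a minimal outline, not the code in the repository: the function names `parse_gzipped_csv` and `fetch_rows` are hypothetical, the file is assumed to be UTF-8 encoded, and no column layout is assumed, since that has not been described in this discussion.

```python
import csv
import gzip
import io
import urllib.request
from typing import List

# Link given earlier in this discussion.
ALL_CSV_URL = "https://words.hk/static/all.csv.gz"


def parse_gzipped_csv(raw: bytes) -> List[List[str]]:
    """Decompress gzip bytes and return the CSV rows as lists of strings."""
    # Assumption: the export is UTF-8 encoded.
    text = gzip.decompress(raw).decode("utf-8")
    # newline="" leaves line endings untouched so the csv module can
    # correctly handle quoted fields that contain newlines.
    return list(csv.reader(io.StringIO(text, newline="")))


def fetch_rows(url: str = ALL_CSV_URL) -> List[List[str]]:
    """Download the export and parse it (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return parse_gzipped_csv(response.read())
```

Printing the first few rows returned by `fetch_rows()` would be a reasonable first step to see which columns the file actually contains before mapping them to dictionary entries.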
words.hk (https://words.hk/) is a great Cantonese-Cantonese and Cantonese-English dictionary. Of particular value are the Cantonese definitions, as well as the example sentences.
words.hk does not make its data available as a download, but the majority of the data is licensed under an open data license that permits use as long as proper credit is given and the use is non-commercial. Scraping the website is also not expressly forbidden, so scraping may be what needs to be done.