dimazest / google-ngram-downloader


AssertionError because of missing ngram file #16

Open BigBerny opened 7 years ago

BigBerny commented 7 years ago

In _getindices() line 190 there is a hack because 'qk' is missing from the English 5-grams. Unfortunately, this hack is not language-dependent: while 'qk' is available in other languages (Example), other indices are not (Example). That's why I get an AssertionError at line 126 when trying to access the 'qy' 3-grams in German.

So instead of this hack, it would be better to check whether each URL is valid in _iter_googlestore(), for example by replacing line 126 with: if request.status_code != 200: continue

Thanks for your work by the way! :) Big_Berny

dimazest commented 7 years ago

Thanks for the report, I'll have a look into it.

BigBerny commented 7 years ago

Great. I just saw it's related to https://github.com/dimazest/google-ngram-downloader/issues/9.

dimazest commented 7 years ago

Yes, as you can see, I don't really have time :(. Pull requests are welcome. I would just make sure that in case of an error, a warning is shown on the screen.
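For anyone picking this up, here is a minimal sketch of the behaviour discussed above: skip any URL that does not return HTTP 200 and show a warning instead of failing an assertion. It is not the library's actual code. The names iter_available, fetch and FakeResponse are hypothetical and exist only for this illustration; in real use, fetch would be something like a thin wrapper around requests.get.

```python
import warnings
from collections import namedtuple

# Hypothetical stand-in for an HTTP response; only status_code is inspected.
FakeResponse = namedtuple("FakeResponse", ["status_code", "url"])


def iter_available(urls, fetch):
    """Yield (url, response) for URLs answering HTTP 200; warn and skip the rest.

    `fetch` is any callable that takes a URL and returns an object with a
    `status_code` attribute (an assumption of this sketch).
    """
    for url in urls:
        response = fetch(url)
        if response.status_code != 200:
            # Tell the user which file is missing instead of raising.
            warnings.warn(
                "Skipping {}: HTTP status {}".format(url, response.status_code)
            )
            continue
        yield url, response


def demo_fetch(url):
    """Offline fake: pretend every URL ending in 'qy.gz' is missing."""
    return FakeResponse(404 if url.endswith("qy.gz") else 200, url)


if __name__ == "__main__":
    for url, _ in iter_available(["3gram-qx.gz", "3gram-qy.gz"], demo_fetch):
        print("would download", url)
```

With the real network, the same generator could be driven by passing lambda u: requests.get(u, stream=True) as fetch, so that missing index files for a given language are reported and skipped rather than aborting the whole download.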

tianhuil commented 5 years ago

Hi, there is a pending PR from my fork that solves this, but you can pip install the fork in the meantime:

> pip install git+https://github.com/tianhuil/google-ngram-downloader.git@master