birchill / 10ten-ja-reader

A browser extension to translate Japanese by hovering over words.
https://addons.mozilla.org/firefox/addon/10ten-ja-reader/
GNU General Public License v3.0
595 stars, 45 forks

Order entries based on ngrams #967

Open nicolasmaia opened 2 years ago

nicolasmaia commented 2 years ago

JMdict and JMnedict entries could be ranked using Jim Breen's Google N-gram Corpus Counts.
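A rough sketch of what "ranking by n-gram counts" could look like. It assumes a hypothetical `Entry` shape and a plain `counts` dictionary mapping a written form to its corpus count. Neither reflects 10ten's actual data model, and this is not how the extension currently orders results. Entries with kanji forms are scored by those forms only. Otherwise, homophones such as 紙/神/髪 would all inherit the same large count for かみ and end up tied.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical stand-in for a JMdict/JMnedict entry."""
    entry_id: int
    kanji: list[str] = field(default_factory=list)
    kana: list[str] = field(default_factory=list)


def ngram_score(entry: Entry, counts: dict[str, int]) -> int:
    """Score an entry by the highest corpus count among its forms.

    Kanji forms are used when present; kana-only entries fall back to
    their kana forms. Forms missing from `counts` score 0.
    """
    forms = entry.kanji or entry.kana
    return max((counts.get(form, 0) for form in forms), default=0)


def rank_entries(entries: list[Entry], counts: dict[str, int]) -> list[Entry]:
    """Sort entries by descending score.

    Python's sort is stable (also with reverse=True), so entries with equal
    scores keep their original dictionary order.
    """
    return sorted(entries, key=lambda e: ngram_score(e, counts), reverse=True)
```

For example, with made-up counts `{"紙": 500, "神": 300, "髪": 100, "かみ": 5000}`, a lookup of かみ would list 紙, then 神, then 髪. Falling back to the existing order on ties means that entries missing from the corpus behave as they do today.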

birtles commented 2 years ago

That's a great idea. As far as I can tell, however, the corpus is fairly expensive ($150 according to one site, 88,000円 according to another) and takes up 26 GB, so it could take some work to get it up and running in our current pipeline.
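One way to keep a 26 GB corpus manageable in a build pipeline is to stream it once and keep only the strings that actually appear as dictionary headwords. The sketch below makes two assumptions that should be checked against the real data. First, it assumes each line has the form `<ngram>\t<count>`. Second, it assumes the n-gram's tokens are separated by spaces, so the tokens are joined before matching against headwords. Counts for different segmentations of the same surface string are summed.

```python
from collections.abc import Iterable


def load_counts(lines: Iterable[str], wanted: set[str]) -> dict[str, int]:
    """Stream '<ngram>\\t<count>' lines, keeping only n-grams in `wanted`.

    Memory use grows with the number of matched headwords, not with the
    corpus size. Lines without a tab or with a non-integer count are skipped.
    """
    counts: dict[str, int] = {}
    for line in lines:
        ngram, sep, count = line.rstrip("\n").partition("\t")
        if not sep:
            continue
        # Assumed: tokens within an n-gram are space-separated.
        surface = ngram.replace(" ", "")
        if surface not in wanted:
            continue
        try:
            counts[surface] = counts.get(surface, 0) + int(count)
        except ValueError:
            continue
    return counts
```

Passing an open file object (for example `open(path, encoding="utf-8")`) reads the corpus line by line without loading it into memory.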

nicolasmaia commented 2 years ago

Would it be feasible to query Jim's site individually for every JMdict entry? I'm not sure how long a script like that would take to finish, though.
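A back-of-the-envelope estimate for the per-entry approach, plus a polite rate-limited loop. `fetch_count` is a hypothetical callable standing in for whatever request the site accepts. No real endpoint or query format is assumed here, and the site's terms of use would need to be checked first.

```python
import time
from collections.abc import Callable, Iterable


def estimate_hours(num_entries: int, requests_per_second: float) -> float:
    """Rough wall-clock time for querying one entry per request."""
    return num_entries / requests_per_second / 3600


def query_all(
    words: Iterable[str],
    fetch_count: Callable[[str], int],  # hypothetical per-word lookup
    delay_seconds: float = 1.0,
) -> dict[str, int]:
    """Query counts one word at a time, sleeping between requests."""
    results: dict[str, int] = {}
    for i, word in enumerate(words):
        if i > 0:
            time.sleep(delay_seconds)  # throttle to avoid hammering the site
        results[word] = fetch_count(word)
    return results
```

Assuming roughly 200,000 entries at one request per second, `estimate_hours(200_000, 1.0)` comes to about 55.6 hours. That is slow but feasible as a one-off job, and a cache of past results would let it resume after interruptions.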