JULIELab / JeSemE

Jena Semantic Explorer
http://jeseme.org
MIT License

update corpus2ngrams #16

Open hellrich opened 6 years ago

hellrich commented 6 years ago

corpus2ngrams currently uses probabilistic downsampling with a fixed seed; it should use weighted downsampling instead.
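A minimal sketch of the difference, not the actual corpus2ngrams code. It assumes "weighted" means scaling each co-occurrence count by the expected survival rate instead of dropping tokens at random, and it assumes a word2vec-style keep probability of sqrt(threshold / frequency). The function names and the `threshold` default are hypothetical.

```python
import math
import random
from collections import Counter


def keep_probability(count, total, threshold=1e-5):
    """Assumed word2vec-style keep probability: frequent words get lower values."""
    freq = count / total
    return 1.0 if freq <= threshold else math.sqrt(threshold / freq)


def probabilistic_pairs(pairs, keep, seed=0):
    """Drop (word, context) pairs at random; counts depend on the seed."""
    rng = random.Random(seed)
    counts = Counter()
    for word, context in pairs:
        if rng.random() < keep[word] and rng.random() < keep[context]:
            counts[(word, context)] += 1
    return counts


def weighted_pairs(pairs, keep):
    """Keep every pair, but add its expected survival rate instead of 1."""
    counts = Counter()
    for word, context in pairs:
        counts[(word, context)] += keep[word] * keep[context]
    return counts
```

The weighted variant needs no random number generator, so repeated runs over the same corpus give identical counts without relying on a seed.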

hellrich commented 6 years ago

Processing used google_books_parts2counts; it is not a good idea to maintain two similar tools. The change applies here too: https://github.com/hellrich/hyperwords/blob/master/hyperwords/google_books_parts2counts.py