cbaziotis / ekphrasis

Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two big corpora (English Wikipedia and Twitter, 330 million English tweets).
MIT License
660 stars, 91 forks

how to get the word statistics? #31

Closed UGUESS-lzx closed 2 years ago

UGUESS-lzx commented 2 years ago

Excuse me, where can I get the word statistics? 😢 In another issue, they said [image], but I can't find the word statistics at that URL... Thanks in advance!

ArlanCooper commented 2 years ago

I can't download the file from the URL in the code: https://www.dropbox.com/s/a84otqrg6u1c5je/stats.zip?dl=1. This URL is inaccessible. Where can I download this file? Thanks.

cbaziotis commented 2 years ago

Initially, I used my personal Dropbox account to host the file, as only some friends and I were using the library. It turns out that Dropbox has suspended my public links for generating excessive traffic...

I moved the data to another server and updated the public link for the stats.zip file. Please update the package and try again.

Build from source:

pip install git+https://github.com/cbaziotis/ekphrasis.git

Or install from PyPI:

pip install ekphrasis -U

FYI, the link is https://data.statmt.org/cbaziotis/projects/ekphrasis/stats.zip

Let me know if it works now.
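
If the automatic download still fails, the archive can also be fetched by hand. Below is a minimal sketch using only the Python standard library. `fetch_stats` is a hypothetical helper, not part of ekphrasis, and the default target directory `~/.ekphrasis/stats` is an assumption about where the library looks for the files; check your installed version before relying on it.

```python
import tempfile
import urllib.request
import zipfile
from pathlib import Path

# Link posted in this thread.
STATS_URL = "https://data.statmt.org/cbaziotis/projects/ekphrasis/stats.zip"

# Assumed location of the statistics; not confirmed in this thread.
DEFAULT_DIR = Path.home() / ".ekphrasis" / "stats"


def fetch_stats(url=STATS_URL, target_dir=DEFAULT_DIR):
    """Download the zip archive at `url` and unpack it into `target_dir`.

    Hypothetical helper: returns the list of member names in the archive.
    """
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "stats.zip"
        # Save the archive to a temporary file first.
        urllib.request.urlretrieve(url, str(archive))
        # Unpack it, keeping the directory layout stored inside the zip.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target_dir)
            return zf.namelist()
```

Calling `fetch_stats()` with no arguments downloads from the link above; pass a different `target_dir` if your installation keeps the statistics elsewhere.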