cbaziotis / ekphrasis

Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).
MIT License

Can Ekphrasis be used in other languages? #25

Open shuningge opened 4 years ago

shuningge commented 4 years ago

My dataset is in Italian. Can Ekphrasis also be used for Italian, or does it only work for English?

manueltonneau commented 4 years ago

Yes, it can! You just have to run generate_stats.py, as described in the README, to compute word statistics from your own corpus, and then pass that corpus name when instantiating the different classes.
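To illustrate why corpus-specific word statistics are what make another language work, here is a minimal, self-contained sketch of the idea. It does not use Ekphrasis or reproduce its implementation: `build_unigram_counts`, `make_segmenter`, the tiny Italian sample, and the unseen-word penalty are all hypothetical choices made for illustration. It counts words in a corpus and then uses those counts to split a hashtag-style string into its most probable words.

```python
import math
import re
from collections import Counter
from functools import lru_cache


def build_unigram_counts(lines):
    """Count lowercase word occurrences across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # [^\W\d_]+ matches runs of Unicode letters, so accented words are kept.
        counts.update(re.findall(r"[^\W\d_]+", line.lower()))
    return counts


def make_segmenter(counts, max_word_len=20):
    """Return a function that splits a string into its most probable words."""
    total = sum(counts.values())

    def log_prob(word):
        if word in counts:
            return math.log(counts[word] / total)
        # Assumption: unseen words get a penalty that grows with their length.
        return math.log(1.0 / (total * 10 ** len(word)))

    @lru_cache(maxsize=None)
    def segment(text):
        # Returns (log-probability, words) for the best split of `text`.
        if not text:
            return 0.0, ()
        candidates = []
        for i in range(1, min(len(text), max_word_len) + 1):
            head, tail = text[:i], text[i:]
            tail_score, tail_words = segment(tail)
            candidates.append((log_prob(head) + tail_score, (head,) + tail_words))
        return max(candidates)

    return lambda text: list(segment(text.lower())[1])


# Hypothetical stand-in for a real Italian corpus.
italian_corpus = [
    "Buona giornata a tutti",
    "Oggi è una buona giornata",
    "Forza Italia, forza ragazzi",
]

segment = make_segmenter(build_unigram_counts(italian_corpus))
print(segment("buonagiornata"))  # ['buona', 'giornata']
print(segment("forzaitalia"))    # ['forza', 'italia']
```

With statistics from an English corpus, neither string would split correctly, because none of the Italian words would have counts. That is why the statistics need to be regenerated from a corpus in the target language.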