Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).
Yes, it can! You just have to run generate_stats.py, as indicated in the README, to train word statistics for your corpus, and then pass your corpus name when instantiating the different classes.
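To give a feel for why this works for any language, here is a minimal, standard-library-only sketch of how unigram statistics from a corpus can drive word segmentation. It is not Ekphrasis's code and does not use its API: the toy Italian corpus and the names `build_unigram_counts` and `make_segmenter` are made up for illustration, and the unseen-word penalty is a simple assumption.

```python
import math
import re
from collections import Counter
from functools import lru_cache


def build_unigram_counts(corpus_lines):
    """Count lowercase word tokens across an iterable of text lines."""
    counts = Counter()
    for line in corpus_lines:
        counts.update(re.findall(r"\w+", line.lower()))
    return counts


def make_segmenter(counts, max_word_len=20):
    """Return a function that splits a string into its most probable words."""
    total = sum(counts.values())

    def log_prob(word):
        if word in counts:
            return math.log(counts[word] / total)
        # Assumption: unseen words get a penalty that grows with their length.
        return math.log(1.0 / (total * 10 ** len(word)))

    @lru_cache(maxsize=None)
    def segment(text):
        if not text:
            return ()
        # Try every possible first word, then segment the remainder recursively.
        candidates = [
            (text[:i],) + segment(text[i:])
            for i in range(1, min(len(text), max_word_len) + 1)
        ]
        return max(candidates, key=lambda words: sum(log_prob(w) for w in words))

    return lambda text: list(segment(text.lower()))


# Hypothetical toy corpus; a real one would be far larger.
italian_corpus = [
    "buon giorno a tutti",
    "oggi è una bella giornata",
    "buon viaggio a tutti",
    "una bella serata",
]
counts = build_unigram_counts(italian_corpus)
segment = make_segmenter(counts)
print(segment("buongiorno"))
print(segment("bellagiornata"))
```

The point is only that the statistics are learned from whatever text you supply, so nothing in the approach is specific to English. For the actual command-line options and class arguments, follow the README.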
My dataset is in Italian. I am wondering whether Ekphrasis can also be used for Italian, or is it only for English?