Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, the latter with 330 million English tweets).
MIT License
661 stars, 90 forks
The TextPreProcessor class only supports segmenting text with hashtags. Support for segmenting normal text is required. #15
The TextPreProcessor class only performs word segmentation when the hashtag symbol (#) is present; without it, the text is left unsegmented.
Example:
# With a hashtag, segmentation works
s = " question kind infidelity passed sweety not feel sweet #savingyourmarriagebeforeitstarts"
print(" ".join(text_processor.pre_process_doc(s)))
'question kind infidelity passed sweety not feel sweet <hashtag> saving your marriage before it starts </hashtag>'
# Without a hashtag, segmentation fails
s = " question kind infidelity passed sweety not feel sweet savingyourmarriagebeforeitstarts"
print(" ".join(text_processor.pre_process_doc(s)))
" question kind infidelity passed sweety not feel sweet savingyourmarriagebeforeitstarts"
The TextPreProcessor configuration is similar to the one defined in the README.md file.
Kindly review this and, if you agree that it is correct, I can send a pull request.
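To make the requested behaviour concrete, here is a minimal, self-contained sketch of segmenting plain (non-hashtag) tokens. It does not use ekphrasis and is not how the library implements segmentation. It assumes a tiny hand-written unigram table (`TOY_COUNTS`) in place of real corpus statistics, and the names `segment` and `segment_plain_tokens` are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical toy unigram counts (an assumption for this sketch);
# a real segmenter would use statistics from a large corpus.
TOY_COUNTS = {
    "saving": 50, "your": 400, "marriage": 40, "before": 200,
    "it": 900, "starts": 60, "question": 80, "kind": 90,
    "not": 800, "feel": 100, "sweet": 70,
}
TOTAL = sum(TOY_COUNTS.values())


def word_logprob(word):
    """Log-probability of a known word, or a length-scaled penalty for an unknown one."""
    if word in TOY_COUNTS:
        return math.log(TOY_COUNTS[word] / TOTAL)
    return -math.log(TOTAL) - len(word) * math.log(10)


def segment(text):
    """Return the highest-scoring split of text under the toy unigram model."""
    # best[i] holds (score, words) for the best segmentation of text[:i]
    best = [(0.0, [])]
    for end in range(1, len(text) + 1):
        candidates = []
        for start in range(end):
            score, words = best[start]
            piece = text[start:end]
            candidates.append((score + word_logprob(piece), words + [piece]))
        best.append(max(candidates, key=lambda c: c[0]))
    return best[-1][1]


def segment_plain_tokens(tokens):
    """Segment unknown alphabetic tokens, keeping a split only if every piece is known."""
    out = []
    for tok in tokens:
        if tok.isalpha() and tok not in TOY_COUNTS:
            pieces = segment(tok)
            if len(pieces) > 1 and all(p in TOY_COUNTS for p in pieces):
                out.extend(pieces)
                continue
        out.append(tok)
    return out


s = "question kind infidelity passed sweety not feel sweet savingyourmarriagebeforeitstarts"
print(" ".join(segment_plain_tokens(s.split())))
# question kind infidelity passed sweety not feel sweet saving your marriage before it starts
```

Accepting a split only when every piece is a known word is a deliberately conservative choice: out-of-vocabulary words such as "sweety" or "infidelity" stay intact instead of being broken into fragments. Whether TextPreProcessor should apply a comparable guard to normal text is a design question for the maintainers.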