Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia, and Twitter: 330 million English tweets).
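A minimal usage sketch, following the pattern shown in the project README (the option values below are illustrative, not a complete list of what `TextPreProcessor` accepts):

```python
from ekphrasis.classes.preprocessor import TextPreProcessor
from ekphrasis.classes.tokenizer import SocialTokenizer

# Preprocessor that tokenizes with the SocialTokenizer, splits hashtags
# using the Twitter word statistics, and normalizes a few token types.
text_processor = TextPreProcessor(
    normalize=['url', 'email', 'user', 'number'],  # replace these with placeholder tags
    unpack_hashtags=True,        # split hashtags into words
    segmenter="twitter",         # word statistics used for segmentation
    corrector="twitter",         # word statistics used for spell correction
    tokenizer=SocialTokenizer(lowercase=True).tokenize,
)

print(text_processor.pre_process_doc("CANT WAIT for the new season of #TwinPeaks!!! :)"))
```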
How can I make this one token instead of having it split? Where is this tokenizing happening?

It happens in the default pipeline of the tokenizer (ekphrasis/classes/tokenizer.py). You can pass a custom pipeline to the tokenizer, and removing "EMOJI" from that pipeline avoids the splitting.
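If editing the tokenizer's pipeline is not convenient, another option is to use the fact that `TextPreProcessor` accepts any callable as its `tokenizer` argument. Below is a deliberately crude sketch of that alternative: a plain whitespace tokenizer (my own illustration, not part of ekphrasis) that leaves emoji sequences untouched, assuming the remaining preprocessor options are left at their defaults.

```python
import re

from ekphrasis.classes.preprocessor import TextPreProcessor

def whitespace_tokenize(text):
    # Keep every run of non-whitespace characters as a single token,
    # so emoji (and emoji sequences) are never split apart.
    return re.findall(r"\S+", text)

# Any callable mapping a string to a list of tokens can be plugged in
# in place of SocialTokenizer(...).tokenize.
text_processor = TextPreProcessor(tokenizer=whitespace_tokenize)

print(text_processor.pre_process_doc("so happy 😂😂😂 right now"))
```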