Since Google Scholar alerted me to the citation and I was curious, I checked this preprint of the Deep-EOS paper. It says there:
Further high-performers such as Elephant (Evang et al., 2013) or Cutter (Graën et al., 2018) follow a sequence labeling approach. However, they require a prior language-dependent tokenization of the input text.
At least I interpret this as saying that the input to Elephant is already tokenized text, on which sentence boundary detection is then performed. That is not true: Elephant performs tokenization and sentence boundary detection jointly. It is true that this joint setup requires tokenized training data. However, Elephant could also be trained and used on data that is not tokenized, only sentence-segmented.