dbmdz / deep-eos

General-Purpose Neural Networks for Sentence Boundary Detection
GNU Affero General Public License v3.0

Error in Related Work section of paper #4

Open texttheater opened 4 years ago

texttheater commented 4 years ago

Google Scholar alerted me to the citation, and out of curiosity I checked this preprint of the Deep-EOS paper. It says:

Further high-performers such as Elephant (Evang et al., 2013) or Cutter (Graën et al., 2018) follow a sequence labeling approach. However, they require a prior language-dependent tokenization of the input text.

I interpret this as saying that the input to Elephant must already be tokenized text, on which sentence boundary detection is then performed. That is not true: Elephant performs tokenization and sentence boundary detection jointly. It is true that this joint setting requires tokenized training data. However, Elephant could also be trained and used on data that is only sentence-segmented, not tokenized.
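To illustrate what "jointly" means here, the following is a minimal sketch of character-level sequence labeling in the spirit of Elephant's approach. It is not Elephant's code or API: the tag names (S = sentence start, T = token start, I = inside, O = outside) and the functions `joint_labels`, `sentence_only_labels` and `decode` are hypothetical, and the actual model that predicts the tags is left out. The point is that one tag per raw character encodes both token and sentence boundaries. Deriving those tags requires tokenized gold data, while sentence-segmented data alone still yields a usable (sentence-only) tag sequence.

```python
from typing import List

# Hypothetical tag set (an assumption, in the spirit of Elephant's scheme):
# S = first char of a sentence (and of its first token), T = first char of
# any other token, I = inside a token/sentence, O = outside (e.g. whitespace).


def joint_labels(text: str, sentences: List[List[str]]) -> List[str]:
    """Tags for joint tokenization + SBD; needs tokenized, segmented gold data."""
    labels = ["O"] * len(text)
    pos = 0
    for sentence in sentences:
        for i, token in enumerate(sentence):
            start = text.find(token, pos)  # align token to raw text
            if start < 0:
                raise ValueError(f"token {token!r} not found after offset {pos}")
            labels[start] = "S" if i == 0 else "T"
            for j in range(start + 1, start + len(token)):
                labels[j] = "I"
            pos = start + len(token)
    return labels


def sentence_only_labels(text: str, sentences: List[str]) -> List[str]:
    """Tags for SBD alone; needs only sentence-segmented (untokenized) data."""
    labels = ["O"] * len(text)
    pos = 0
    for sentence in sentences:
        start = text.find(sentence, pos)
        if start < 0:
            raise ValueError(f"sentence {sentence!r} not found after offset {pos}")
        labels[start] = "S"
        for j in range(start + 1, start + len(sentence)):
            labels[j] = "I"
        pos = start + len(sentence)
    return labels


def decode(text: str, labels: List[str]) -> List[List[str]]:
    """Recover sentences and tokens from predicted joint tags."""
    sentences: List[List[str]] = []
    token_chars: List[str] = []

    def flush() -> None:
        # Close the current token, if any, and attach it to the last sentence.
        if token_chars:
            if not sentences:
                sentences.append([])
            sentences[-1].append("".join(token_chars))
            token_chars.clear()

    for ch, tag in zip(text, labels):
        if tag == "S":
            flush()
            sentences.append([])
            token_chars.append(ch)
        elif tag == "T":
            flush()
            token_chars.append(ch)
        elif tag == "I":
            token_chars.append(ch)
        else:  # "O"
            flush()
    flush()
    return sentences
```

For example, with `text = "Mr. Smith left. He didn't return."` and gold tokens `[["Mr.", "Smith", "left", "."], ["He", "did", "n't", "return", "."]]`, `joint_labels` tags the raw string directly, so no separate tokenizer runs before boundary detection. `decode` turns the tags back into the same sentences and tokens.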

stefan-it commented 4 years ago

Hi @texttheater, thanks for raising this issue! I added an errata section to the main README (including a link to this issue) :)