HeidelTime / heideltime

A multilingual, cross-domain temporal tagger developed at the Database Systems Research Group at Heidelberg University.
GNU General Public License v3.0

improper handling of newline when reading files #20

Closed jzell closed 9 years ago

jzell commented 9 years ago
The main() method in class HeidelTimeStandalone reads its input with this loop:

    while ((line = fileReader.readLine()) != null)
       sb.append(System.getProperty("line.separator")+line);
This has the effect of adding a newline at the beginning and leaving the last line
unterminated.

This affects the tokenizer and POS tagger I am using, which get an extra empty token
at the beginning, causing a misalignment of the tokens.

It should be changed to:

    while ((line = fileReader.readLine()) != null)
       sb.append(line + System.getProperty("line.separator"));
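
To make the difference between the two loops concrete, here is a minimal, self-contained sketch that runs both variants over an in-memory string instead of a file. The class name `NewlineDemo` and the methods `readPrepending` and `readAppending` are hypothetical names chosen for illustration; they are not part of HeidelTime.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class NewlineDemo {
    // Mirrors the reported loop: the separator is written BEFORE each line,
    // so the result starts with a separator and the last line is unterminated.
    static String readPrepending(BufferedReader reader) {
        StringBuilder sb = new StringBuilder();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(System.lineSeparator()).append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Mirrors the proposed loop: the separator is written AFTER each line,
    // so there is no leading separator and every line is terminated.
    static String readAppending(BufferedReader reader) {
        StringBuilder sb = new StringBuilder();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append(System.lineSeparator());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "first\nsecond";
        String before = readPrepending(new BufferedReader(new StringReader(input)));
        String after = readAppending(new BufferedReader(new StringReader(input)));
        // Show whether each result begins with a line separator.
        System.out.println("prepending starts with separator: "
                + before.startsWith(System.lineSeparator()));
        System.out.println("appending starts with separator: "
                + after.startsWith(System.lineSeparator()));
    }
}
```

Note that even the appending variant does not reproduce the file exactly: `readLine()` strips the original terminators, so they are all replaced by the platform separator, and a final line without a terminator gains one.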

Original issue reported on code.google.com by attardi on 2014-10-18 20:20:20

jzell commented 9 years ago
Hey and thanks for the report.

This is indeed some unfortunate code, and I've gone ahead and fixed it so that it
reads the input text verbatim from the file (without mangling line terminations).
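
For readers who want to see what a verbatim read can look like, here is a minimal sketch. It is an assumption about one possible approach, not a description of the actual change in the referenced revision; the class name `VerbatimRead` and the method `readVerbatim` are hypothetical. It decodes the file's bytes in one step, so line terminators are left exactly as they appear in the file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class VerbatimRead {
    // Reads the whole file and decodes it with the given charset.
    // No line splitting happens, so "\n", "\r\n" and a missing final
    // terminator are all preserved as they are in the file.
    static String readVerbatim(Path path, Charset charset) {
        try {
            return new String(Files.readAllBytes(path), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Round-trip a small sample with mixed terminators and no final newline.
        Path tmp = Files.createTempFile("verbatim", ".txt");
        String original = "first\r\nsecond\nthird";
        Files.write(tmp, original.getBytes(StandardCharsets.UTF_8));
        String readBack = readVerbatim(tmp, StandardCharsets.UTF_8);
        System.out.println("round trip preserved text: " + original.equals(readBack));
        Files.delete(tmp);
    }
}
```

Reading the whole file at once trades memory for simplicity, which is usually acceptable for documents of the size a tagger processes in a single run.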

It'll find its way into the soon-to-be-released HeidelTime 1.8. If you want to see
the changes before then, take a look at r59623843e127.

Original issue reported on code.google.com by zell@informatik.uni-heidelberg.de on 2014-10-19 16:56:48

jzell commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by zell@informatik.uni-heidelberg.de on 2014-12-08 14:21:49