Anton87 / uimafit

Automatically exported from code.google.com/p/uimafit

bug in TokenBuilder related to windows newlines #44

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
TokenBuilder is supposed to replace every \r\n in the input text with \n, but 
the replacement string it passes is "\\n", which does not produce a newline 
character. This is a bug. Also, I think it would be nice if multiple newlines 
in a row did not create an empty sentence. So, I am going to replace the 
line:

tokensString = tokensString.replaceAll("\r\n", "\\n");

with: 

tokensString = tokensString.replaceAll("\\s*\n\\s*", "\n");

This will fix the bug and produce better sentences, because a run of newlines 
(together with any whitespace around it) collapses into a single \n.
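
For anyone who wants to check the difference outside of TokenBuilder, here is a minimal standalone sketch comparing the two lines. The class name NewlineNormalizer and the method names oldNormalize/newNormalize are made up for this illustration and are not part of uimaFIT; only the two replaceAll calls come from the report above.

```java
// Hypothetical helper for illustration only; not part of uimaFIT.
final class NewlineNormalizer {

    // The line quoted in the report. The Java literal "\\n" is the two
    // characters '\' and 'n', not a newline character.
    static String oldNormalize(String tokensString) {
        return tokensString.replaceAll("\r\n", "\\n");
    }

    // The proposed line. A newline plus any whitespace around it
    // (including '\r' and further newlines) becomes a single '\n'.
    static String newNormalize(String tokensString) {
        return tokensString.replaceAll("\\s*\n\\s*", "\n");
    }

    public static void main(String[] args) {
        // Two Windows-style line breaks in a row between two "sentences".
        String input = "A B\r\n\r\nC D";

        // The old line leaves no newline character in the result.
        System.out.println(oldNormalize(input).contains("\n")); // prints false

        // The new line yields exactly one newline, so no empty sentence.
        System.out.println(newNormalize(input).equals("A B\nC D")); // prints true
    }
}
```

Note that the new pattern also trims spaces and tabs directly before and after each line break, which seems fine for token input but is worth knowing about.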

Original issue reported on code.google.com by pvogren@gmail.com on 27 Dec 2010 at 7:38

GoogleCodeExporter commented 8 years ago

Original comment by pvogren@gmail.com on 27 Dec 2010 at 7:42