forTEXT / catma

Computer Assisted Text Markup and Analysis
https://www.catma.de
GNU General Public License v3.0

Strip BOM that may appear after opening tag in an XML document #239

Open maltem-za opened 3 years ago

maltem-za commented 3 years ago

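In certain scenarios, users may upload an XML file that contains a BOM (byte order mark, U+FEFF) after the initial opening tag rather than at the very start of the file. In that position the BOM is presumably no longer treated as an encoding signature and is read as an ordinary character of the document text, which causes an offset issue.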

This can happen if, for example, a user runs the Stanford NER tool on a .txt file with a BOM and then manually adds an opening XML tag at the start and a closing tag at the end of the resulting output file to make it valid XML.
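
A minimal sketch of the kind of clean-up this issue asks for, assuming the uploaded file has already been decoded into a `String`. The class `BomStripper` and the method `stripBomAfterOpeningTag` are hypothetical names used for illustration only and are not taken from the CATMA code base. The sketch removes a BOM at the very start of the text and a BOM that directly follows the first tag, which matches the scenario described above. It does not touch U+FEFF characters elsewhere in the document.

```java
/** Hypothetical helper, not part of CATMA: strips a misplaced BOM from decoded XML text. */
final class BomStripper {
    // U+FEFF is the byte order mark once the input has been decoded to characters.
    private static final char BOM = '\uFEFF';

    private BomStripper() {}

    static String stripBomAfterOpeningTag(String xml) {
        StringBuilder sb = new StringBuilder(xml);

        // A BOM at the very start is an encoding signature, not content.
        if (sb.length() > 0 && sb.charAt(0) == BOM) {
            sb.deleteCharAt(0);
        }

        // Index of the character right after the first '>' (end of the first tag).
        // indexOf returns -1 when there is no '>', which makes afterTag 0 and skips the check.
        int afterTag = sb.indexOf(">") + 1;
        if (afterTag > 0 && afterTag < sb.length() && sb.charAt(afterTag) == BOM) {
            sb.deleteCharAt(afterTag);
        }

        return sb.toString();
    }
}
```

For example, calling `BomStripper.stripBomAfterOpeningTag` on a text that consists of `<doc>`, a single U+FEFF character, and `John lives in Hamburg.</doc>` returns the same text one character shorter, so character offsets computed afterwards are no longer shifted by the BOM. Doing this before any offsets are calculated is a design assumption of the sketch. Where exactly such a step belongs in the upload pipeline is not stated in the issue.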