dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

Strip out BOM when reading text files #1584

Closed reckart closed 6 months ago

reckart commented 6 months ago

Is your feature request related to a problem? Please describe.

When a text file is read with the TextReader and the file starts with a BOM (byte order mark), that BOM ends up in the CAS. Typically, this is not what we want.

Describe the solution you'd like

Remove the BOM by default, but allow keeping it via a parameter.
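
For illustration, the requested behavior could look roughly like the sketch below. It is not the actual TextReader code: the class name `BomStripSketch`, the method `decode`, and the `keepBom` flag are hypothetical names chosen for this example, and it assumes UTF-8 input. It relies on the fact that Java's UTF-8 decoder leaves a leading BOM in the decoded text as the character U+FEFF.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the DKPro Core implementation.
public class BomStripSketch {
    // A UTF-8 BOM (bytes EF BB BF) decodes to this single character.
    private static final char BOM = '\uFEFF';

    // keepBom is an assumed parameter name; false mirrors the proposed default.
    static String decode(byte[] bytes, boolean keepBom) {
        String text = new String(bytes, StandardCharsets.UTF_8);
        if (!keepBom && !text.isEmpty() && text.charAt(0) == BOM) {
            // Drop only the leading BOM; U+FEFF elsewhere in the text is kept.
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(decode(withBom, false).length()); // prints 2
        System.out.println(decode(withBom, true).length());  // prints 3
    }
}
```

With the flag left at its default, the text that would go into the CAS starts at the first real character, so annotation offsets are not shifted by an invisible leading character.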