dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

Option to replace illegal characters in XMI files #1565

Closed reckart closed 1 year ago

reckart commented 1 year ago

Is your feature request related to a problem? Please describe.
The XmiWriter fails when it serializes a document whose text contains characters that are illegal in XML.

Describe the solution you'd like
In many cases, the illegal characters are present unintentionally, and once the data has been processed it is hard to patch them out in UIMA. The easiest way to write the data out and keep using it would be an option to replace the illegal characters on output.
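
To illustrate the kind of replacement meant here, a minimal sketch in plain Java follows. It assumes the XML 1.0 definition of a legal character (tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF). The class name `XmlCharSanitizer` and its methods are hypothetical and are not part of DKPro Core or UIMA; how such a step would be wired into the XmiWriter is left open.

```java
// Hypothetical helper, not part of DKPro Core or UIMA.
final class XmlCharSanitizer {

    private XmlCharSanitizer() {
        // static helpers only
    }

    // True if the code point matches the XML 1.0 "Char" production.
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces every illegal code point with the given replacement char.
    // All illegal code points (control characters, unpaired surrogates,
    // U+FFFE, U+FFFF) occupy exactly one UTF-16 code unit, so the result
    // has the same length as the input.
    static String replaceIllegal(String text, char replacement) {
        StringBuilder sb = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // An unpaired surrogate is returned as its own code point.
            int cp = text.codePointAt(i);
            if (isLegalXml10(cp)) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(replacement);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

Replacing one code unit with exactly one code unit keeps the string length unchanged, so annotation begin/end offsets, which UIMA counts in UTF-16 code units, should stay valid after the replacement. A call such as `XmlCharSanitizer.replaceIllegal(text, ' ')` substitutes a space for each illegal character.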