NCATSTranslator / Text-Mining-Provider-Roadmap

Roadmap and issue tracking for the NCATS Translator Text Mining Provider
MIT License

Higher order UTF-8 encoding not maintained throughout all article processing #77

Closed bill-baumgartner closed 3 years ago

bill-baumgartner commented 3 years ago

Describe the bug: Higher-order (non-ASCII) UTF-8 characters are being lost at some point during processing. Sentences used as evidence for extracted Biolink associations contain ? where those characters should be.

To Reproduce: See the sentences used as evidence in the Biolink Association KG.

Expected behavior: Each ? should instead be the original higher-order UTF-8 character, e.g. TNF-? should read TNF-β.
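
For illustration only: this issue does not say where in the pipeline the characters are lost, so the following Python sketch just shows one common way this symptom can arise, namely a step that encodes text with a charset that cannot represent the character and substitutes ? instead. The function names (`lossy_round_trip`, `utf8_round_trip`, `find_non_ascii`) are hypothetical and are not part of the Text Mining Provider code.

```python
def lossy_round_trip(text: str, charset: str = "ascii") -> str:
    """Encode with a charset that may not cover every character.

    With errors="replace", Python substitutes "?" for each character
    the charset cannot represent, which mimics the reported symptom.
    """
    return text.encode(charset, errors="replace").decode(charset)


def utf8_round_trip(text: str) -> str:
    """Encode and decode as UTF-8 throughout, which is lossless."""
    return text.encode("utf-8").decode("utf-8")


def find_non_ascii(text: str) -> list[tuple[int, str]]:
    """Return (index, character) pairs for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


sentence = "TNF-β"  # example string taken from the issue

# A step that falls back to ASCII or Latin-1 loses the beta.
assert lossy_round_trip(sentence, "ascii") == "TNF-?"
assert lossy_round_trip(sentence, "latin-1") == "TNF-?"

# Keeping UTF-8 at every step preserves it.
assert utf8_round_trip(sentence) == "TNF-β"

# A check like this on the input can flag sentences worth comparing
# against the stored evidence text.
assert find_non_ascii(sentence) == [(4, "β")]
```

A comparison along these lines, run on a sentence before and after each processing stage, is one way to narrow down which stage drops the characters.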