Closed: callahantiff closed 4 years ago
I've generally had success with this (ugly) method of making UTF-8 the default encoding when reading input:
import sys
reload(sys)
sys.setdefaultencoding('utf8')
It's worth noting this is generally discouraged for reasons well explained here.
Thanks for the suggestion! I think this really only applies to Python 2, but it's good to know about!
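To illustrate the Python 2 point, here is a minimal sketch, assuming Python 3. The default string encoding is already UTF-8 there and `sys.setdefaultencoding` is not available, so the usual alternative is to pass `encoding` explicitly when opening files. The helper name `read_text_utf8` and the sample text are hypothetical, chosen only for illustration.

```python
import sys
import tempfile
from pathlib import Path


def read_text_utf8(path):
    """Read a file as UTF-8 regardless of the OS locale (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# In Python 3 the default str encoding is UTF-8 and there is no setter for it.
default_encoding = sys.getdefaultencoding()
has_setter = hasattr(sys, "setdefaultencoding")

# Round-trip a non-ASCII string through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.txt"
    sample_path.write_text("naïve café 日本語", encoding="utf-8")
    round_trip = read_text_utf8(sample_path)
```

Passing `encoding` at each `open()` call keeps the behaviour local to the code that needs it, rather than changing interpreter-wide state.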
I believe I have a solid solution now (testing at scale as we speak) and will post it here once the test finishes.
OK, I have a solution that works for all Unicode characters, including characters from non-English languages. The changes I made are described below for each changed file.
Dockerfile: added RUN export PYTHONIOENCODING=utf-8
pkt_kg/metadata.py: updated the output_knowledge_graph_metadata() method to:
use utf-8 encoding
address UnicodeEncodeError
Will close this issue now; feel free to re-open if need be.
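The metadata.py change described above might look roughly like the following. This is a sketch under stated assumptions, not the actual pkt_kg code: `write_metadata`, the tab-separated layout, and the placeholder term identifiers are all hypothetical. The one technique it is meant to show is opening the output file with an explicit `utf-8` encoding.

```python
import os
import tempfile


def write_metadata(path, rows):
    """Write tab-separated rows using an explicit UTF-8 encoding.

    Hypothetical stand-in for a metadata writer; not the pkt_kg implementation.
    """
    with open(path, "w", encoding="utf-8") as out:
        for row in rows:
            out.write("\t".join(row) + "\n")


# Placeholder identifiers with non-ASCII labels (made up for illustration).
rows = [("TERM_1", "Ménière disease"), ("TERM_2", "β-catenin binding")]

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "metadata.txt")
    write_metadata(out_path, rows)
    # Read the file back with the same encoding to confirm nothing was lost.
    with open(out_path, encoding="utf-8") as handle:
        written = handle.read()
```

An explicit `encoding` argument does not depend on `PYTHONIOENCODING`, which only governs the standard input, output, and error streams.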
Problem: Unicode errors occur when writing out knowledge graph metadata locally, depending on the OS and Python version used.
Script: metadata.py
Current Solution: encode/decode ontology term labels, definitions, and synonyms and explicitly ignore UnicodeEncodeError.
Proposed Solution: Add functionality to better handle processing of UnicodeEncodeError.
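For context, the encode/decode-and-ignore approach named under Current Solution can be sketched as below, next to one possible gentler alternative. Both helper names are hypothetical and the `ascii` target codec is an assumption; the issue does not say which codec triggered the error. The `backslashreplace` variant is just one option for not silently dropping characters, not necessarily what was adopted.

```python
def strip_unencodable(text, encoding="ascii"):
    """Encode then decode, silently dropping characters the codec cannot represent."""
    return text.encode(encoding, errors="ignore").decode(encoding)


def escape_unencodable(text, encoding="ascii"):
    """Same round trip, but keep a readable escape instead of dropping characters."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


label = "Ménière disease"

# Without an error handler, a strict encode raises UnicodeEncodeError.
try:
    label.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

stripped = strip_unencodable(label)   # accented letters are lost
escaped = escape_unencodable(label)   # accented letters become \xNN escapes
```

Ignoring errors avoids the crash but changes the label text, which is presumably why better handling was proposed.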