OpenTreeOfLife / taxomachine

taxonomy graphdb

Taxonomy files are UTF-8 #146

Open jar398 opened 7 years ago

jar398 commented 7 years ago

The documentation for class FileReader says: "The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream."

I have no idea what the default encoding is, but the results I'm seeing from the taxon_info service are gibberish (e.g. the synonym in OTT id 3717384). I bet that if TaxonomyLoaderOTT.java were changed to follow the above advice, the results would be better. (It would be necessary to check that the JSON is being written in UTF-8 as well.)
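For reference, here is a minimal sketch of what following that advice could look like. It is not the actual TaxonomyLoaderOTT.java code: the class name `Utf8FileExample`, the helper names `readLinesUtf8` and `writeUtf8`, the file name, and the sample taxon name are all hypothetical and chosen only for illustration. The idea is simply to name the charset explicitly on both the read side and the write side, so nothing depends on the platform default.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of taxomachine.
final class Utf8FileExample {

    // Reads every line of the file, decoding its bytes as UTF-8
    // regardless of the JVM's default charset.
    static List<String> readLinesUtf8(String path) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Writes the text encoded as UTF-8; the same idea applies to
    // whatever writer produces the JSON output.
    static void writeUtf8(String path, String content) {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(path), StandardCharsets.UTF_8)) {
            writer.write(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Shows which encoding a plain FileReader would have used on this machine.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Hypothetical file name and taxon name, for illustration only.
        String path = new File(System.getProperty("java.io.tmpdir"),
                "utf8_example.tsv").getPath();
        writeUtf8(path, "1\t|\tExamplea caf\u00e9ensis\n");
        for (String line : readLinesUtf8(path)) {
            System.out.println(line);
        }
    }
}
```

In the loader, the change would presumably amount to replacing each `new FileReader(file)` with `new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)`, though I haven't checked how many places construct readers.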

(This problem may affect treemachine as well, but it doesn't deal with synonyms, which is where most of the fancy characters lie.)

jar398 commented 7 years ago

Fixed on March 4; need to verify that it really works, then close the issue.