The documentation for class FileReader says: "The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream."
I have no idea what the default encoding is, but the results I'm seeing from the taxon_info service are gibberish (e.g. the synonym in OTT id 3717384). I bet that if TaxonomyLoaderOTT.java were changed to follow the above advice, the results would be better. (It would be necessary to check that the JSON is being written in UTF-8 as well.)
(This problem may affect treemachine as well, but treemachine doesn't deal with synonyms, which is where most of the non-ASCII characters are.)
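For reference, here is a minimal sketch of what following the FileReader advice could look like. It assumes the taxonomy files are UTF-8 encoded, which I have not verified. The class name Utf8ReaderSketch, the method openUtf8, and the sample text are made up for illustration and are not taken from TaxonomyLoaderOTT.java.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of TaxonomyLoaderOTT.java.
final class Utf8ReaderSketch {

    // Opens a file with an explicit UTF-8 decoder instead of relying on the
    // platform default charset that FileReader's constructors assume.
    // Assumption: the input file really is UTF-8.
    static BufferedReader openUtf8(String filename) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new FileInputStream(filename), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Arbitrary sample text with one non-ASCII character (a-umlaut),
        // written to a temporary file as UTF-8 bytes.
        String sample = "B\u00e4umea";
        Path tmp = Files.createTempFile("synonym", ".txt");
        Files.write(tmp, (sample + "\n").getBytes(StandardCharsets.UTF_8));
        try (BufferedReader in = openUtf8(tmp.toString())) {
            // True when the line is decoded back to the original string.
            System.out.println(sample.equals(in.readLine()));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

The same idea should apply on the output side: wrapping the output stream in an OutputStreamWriter constructed with StandardCharsets.UTF_8 would make the JSON encoding explicit too, if the service writes through a Writer.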