AtlasOfLivingAustralia / ala-downloads

Data downloads
https://downloads.ala.org.au

Darwin Core export: incorrect character encoding for dataset #18

Open Mesibov opened 6 years ago

Mesibov commented 6 years ago

One of the two sets of ca. 500,000 records that I downloaded on 3 August 2017 is not UTF-8, or not entirely UTF-8; at least some of it is CP-1252 (Windows-1252) encoded. This affects characters such as Latin small letter e with acute (U+00E9), which show up as large numbers of replacement characters (U+FFFD) when the file is read in a UTF-8 environment. (See also "Darwin Core export: character encoding issues")
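For anyone wanting to check their own download, here is a minimal sketch of one way to locate the offending lines. It is not the method used for this report: the function name `find_non_utf8_lines` and the sample bytes are hypothetical, and it assumes that any line failing strict UTF-8 decoding is CP-1252, which may not hold for every file.

```python
def find_non_utf8_lines(data: bytes):
    """Return (line_number, text) for each line that is not valid UTF-8.

    The text is the line re-decoded as CP-1252 (an assumption about the
    actual encoding), so the intended characters can be inspected.
    """
    bad = []
    for lineno, raw in enumerate(data.split(b"\n"), start=1):
        try:
            raw.decode("utf-8")  # strict decoding raises on invalid bytes
        except UnicodeDecodeError:
            bad.append((lineno, raw.decode("cp1252", errors="replace")))
    return bad


# Hypothetical sample: line 1 is UTF-8, line 2 is CP-1252, line 3 is ASCII.
# "\u00e9" is Latin small letter e with acute.
sample = (
    "Caf\u00e9\n".encode("utf-8")
    + "B\u00e9la\n".encode("cp1252")
    + b"plain\n"
)

mixed = find_non_utf8_lines(sample)
print(mixed)

# What a UTF-8 environment shows: the lone 0xE9 byte becomes U+FFFD.
as_seen = sample.decode("utf-8", errors="replace")
print(as_seen.count("\ufffd"))
```

For a real file, the bytes could be read with `open(path, "rb").read()` and passed to the same function; a file of this size should fit in memory, but reading it line by line in binary mode would work as well.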