One of the two data sets of approximately 500,000 records each that I downloaded on 3 August 2017 is not UTF-8 encoded, or not entirely; at least part of it is CP-1252 (Windows-1252) encoded instead. This affects characters such as Latin small letter e with acute (U+00E9, é) and produces large numbers of replacement characters (U+FFFD) when the file is read in a UTF-8 environment. (See also "Darwin Core export: character encoding issues".)
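To illustrate the symptom, here is a minimal Python sketch. It is not taken from the export itself: the sample string "Linné" and the helper name `decode_with_fallback` are made up for illustration, and the fallback assumes that anything which is not valid UTF-8 is CP-1252, which may not hold for every record.

```python
from typing import Tuple


def decode_with_fallback(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used): try strict UTF-8 first, then CP-1252.

    Hypothetical helper for illustration only.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: bytes that are not valid UTF-8 are CP-1252.
        # errors="replace" covers the few byte values CP-1252 leaves undefined.
        return raw.decode("cp1252", errors="replace"), "cp1252"


# In CP-1252, "é" (U+00E9) is the single byte 0xE9.
cp1252_bytes = "Linné".encode("cp1252")

# 0xE9 on its own is not a valid UTF-8 sequence, so a lenient UTF-8
# decode substitutes the replacement character U+FFFD.
mangled = cp1252_bytes.decode("utf-8", errors="replace")

# The fallback recovers the intended text from the same bytes.
recovered, used = decode_with_fallback(cp1252_bytes)
```

A line-by-line pass with a helper like this could also show whether the file is mixed, since lines that decode cleanly as UTF-8 would report `"utf-8"` and the rest `"cp1252"`. Note that pure ASCII lines are valid in both encodings and will always report `"utf-8"`.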