codeforamerica / trailsyserver

API and admin UI server for Trailsy data

Windows encoding breaks CSV import #54

Closed: danavery closed this issue 10 years ago

danavery commented 10 years ago

The latest traildata CSV from CVNP is encoded in WINDOWS-1252, and Ruby 2.0.0's CSV library chokes on it with an "Invalid byte sequence in UTF-8" error. Need to figure out whether CSV can tell WINDOWS-1252 from UTF-8 on the fly, or whether we need to detect the encoding ahead of time.

danavery commented 10 years ago

OK, so it looks like the file isn't entirely WINDOWS-1252; it just contains some characters from it. Even Charlock Holmes says it's ISO-8859-1.
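A quick illustration of why the two labels get mixed up: ISO-8859-1 and WINDOWS-1252 decode every byte identically except 0x80 to 0x9F, where WINDOWS-1252 has printable characters such as curly quotes and ISO-8859-1 has C1 control codes. The bytes below are made up for illustration and are not taken from the CVNP file.

```ruby
# Illustrative bytes only: 0x93/0x94 are curly quotes in Windows-1252,
# and 0xE9 is "é" in both encodings.
bytes = "\x93quoted\x94 caf\xE9".b

# Decoding as ISO-8859-1 turns 0x93/0x94 into the C1 controls U+0093/U+0094.
as_latin1 = bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

# Decoding as Windows-1252 turns them into U+201C/U+201D (curly quotes).
as_cp1252 = bytes.dup.force_encoding(Encoding::Windows_1252).encode(Encoding::UTF_8)

puts as_cp1252
```

So if the text is really ISO-8859-1 with a few WINDOWS-1252 characters mixed in, decoding the whole thing as WINDOWS-1252 only changes the 0x80 to 0x9F range, which ordinary text rarely uses as control codes.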

Going with this: test whether the input is valid UTF-8, and if it isn't, assume WINDOWS-1252 and convert it to UTF-8 with String#encode.
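A minimal sketch of that fallback, assuming the import reads the whole file into memory first. The helper name `to_utf8` and the `traildata.csv` path are hypothetical, not from the trailsyserver code.

```ruby
require "csv"

# Hypothetical helper (not from trailsyserver): returns the input as a UTF-8
# string, assuming anything that isn't valid UTF-8 is Windows-1252.
def to_utf8(raw)
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Windows-1252 leaves a few bytes (e.g. 0x81) undefined; replace them with
  # U+FFFD instead of raising Encoding::UndefinedConversionError.
  raw.dup.force_encoding(Encoding::Windows_1252)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

# Usage, with a placeholder path:
#   rows = CSV.parse(to_utf8(File.binread("traildata.csv")), headers: true)
```

Reading with `File.binread` keeps the raw bytes untouched, so the UTF-8 check runs on what is actually in the file before CSV ever sees it.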