datagovuk / dgu2

Experimental publishing prototype
MIT License

Handling of weirdly encoded files #125

Open rossjones opened 8 years ago

rossjones commented 8 years ago

This file is Non-ISO extended-ASCII according to the file command. There's nothing in the HTTP response headers to give a clue as to the encoding.


curl -I https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/219910/moj-senior-dataset.csv

HTTP/1.1 200 OK
Server: nginx
Content-Type: text/csv
Cache-Control: max-age=14400, public
Content-Disposition: inline; filename="moj-senior-dataset.csv"
Etag: "51e53221-11f02"
Last-Modified: Tue, 16 Jul 2013 11:44:33 GMT
Link: https://www.gov.uk/government/publications/senior-civil-service-salaries-31-march-2011; rel="up"
Strict-Transport-Security: max-age=31536000
Via: 1.1 router
X-Frame-Options: SAMEORIGIN
Via: 1.1 varnish
Fastly-Backend-Name: origin
Content-Length: 73474
Accept-Ranges: bytes
Date: Tue, 27 Sep 2016 16:18:10 GMT
Via: 1.1 varnish
Age: 843
Connection: keep-alive
X-Served-By: cache-lhr6334-LHR
X-Cache: MISS, HIT
X-Cache-Hits: 1
X-Timer: S1474993090.364663,VS0,VE0


How do we handle files where we have no way of guessing the encoding beforehand?
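One possible approach, offered as a rough sketch and not as what dgu2 does: when the headers carry no charset, try a short list of likely encodings in order and fall back to one that cannot fail. The function name `decode_with_fallback` and the candidate order (UTF-8 first, then Windows-1252, then Latin-1) are assumptions for illustration, chosen because UK government CSVs exported from Excel are often Windows-1252. This only guarantees that decoding succeeds, not that the guess is correct.

```python
# Hypothetical helper: names and candidate order are assumptions, not project code.
CANDIDATES = ("utf-8-sig", "cp1252")


def decode_with_fallback(raw: bytes, candidates=CANDIDATES):
    """Return (text, encoding_used) for bytes of unknown encoding."""
    for encoding in candidates:
        try:
            # "utf-8-sig" behaves like UTF-8 but also strips a leading BOM.
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a code point, so this never raises.
    # The result may still be wrong; it is only a last resort.
    return raw.decode("latin-1"), "latin-1"
```

With something like this, the bytes fetched from the URL above would be passed in as `raw`, and the returned encoding name could be stored alongside the file so a wrong guess can be spotted and corrected later.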