GetDKAN / dkan

DKAN Open Data Portal
https://dkan.readthedocs.io/en/latest/index.html
GNU General Public License v2.0

Auto-detect the character encoding set of datasets on import #3667

Open clayliddell opened 2 years ago

clayliddell commented 2 years ago

Currently, DKAN only supports UTF-8 datasets. MySQL allows defining the character set on a table-by-table basis, so we should be able to support datasets in any character encoding; we just need to devise a mechanism for detecting a dataset file's encoding.
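
To make the "detection mechanism" concrete, here is a minimal sketch of one possible approach. It is not DKAN's implementation, and it is written in Python purely for illustration. The function names `detect_encoding` and `to_utf8` and the fallback list are hypothetical. The idea is to check for a byte-order mark, then try strict UTF-8, then fall back to a short list of candidate single-byte encodings.

```python
import codecs

# Byte-order marks, longest first so UTF-32 LE (FF FE 00 00) is not
# mistaken for UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_encoding(data: bytes, fallbacks=("cp1252", "latin-1")) -> str:
    """Guess the encoding of `data` (hypothetical helper).

    Pass the whole file, or a sample cut at a line boundary, so that a
    multi-byte character is not split at the end of the sample.
    """
    # 1. A BOM, when present, identifies the encoding unambiguously.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. Bytes that decode as strict UTF-8 are very likely UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. Assumed fallback order; this step is only a heuristic.
    for name in fallbacks:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    raise ValueError("could not determine encoding")


def to_utf8(data: bytes) -> bytes:
    """Transcode `data` to UTF-8 using the guessed encoding."""
    return data.decode(detect_encoding(data)).encode("utf-8")
```

Step 3 is the weak point: most single-byte encodings accept almost any byte sequence, so a successful decode does not prove the guess is right. A real solution would likely need statistical detection or a user-supplied override. Transcoding to UTF-8 before import, as `to_utf8` does, is an alternative to setting a per-table character set in MySQL.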

User Story

As a DKAN user, I want to be able to upload and import non-UTF-8 encoded CSV files.

Acceptance Criteria

When a non-UTF-8 encoded CSV file is uploaded, it is properly imported and displayed on a dataset.

dafeder commented 4 months ago

Just noting this is non-trivial to execute well. I'm going to keep this open because I think it is a need that should be addressed somehow. Optional integration with qsv could be an approach.