Data-Liberation-Front / csvlint.io

Check that your CSV files are valid
http://csvlint.io
MIT License

Support for UTF-8 special Characters #521

Open Jamie-Atkinson opened 4 years ago

Jamie-Atkinson commented 4 years ago

Expected Behaviour

When uploading data for checking, take any row that looks like this:

9107,McKee’s,11 Fairhill,,,Maghera,BT46 5AY,Northern Ireland,Processing Plant (Meat) Cutting Plant (Red) Mince Meat Establishment Meat Preparation Establishment,,CP (Cutting Plant),,,,MM (Mince Meat Establishment) MP (Meat Preparation Establishment),PP (Processing Plant),,,,,,,,,,,,Section: VI (PP) Section: I (CP) Section: V (MM) Section: V (MP),,Bovine Ovine Porcine,,,,Yes,,,Yes,Yes,,,,,,,,,Yes,,,,,,,,Yes,Yes,,,,Yes,,,,,,,Food Standards Agency,,,,

I would expect such rows to be returned as sent, apart from any formatting fixes and the double quotes added around each field.

Current Behaviour (for problems)

Currently, that row from a dataset is returned as:

"9107","McKee???s","11 Fairhill","","","Maghera","BT46 5AY","Northern Ireland","Processing Plant (Meat) Cutting Plant (Red) Mince Meat Establishment Meat Preparation Establishment","","CP (Cutting Plant)","","","","MM (Mince Meat Establishment) MP (Meat Preparation Establishment)","PP (Processing Plant)","","","","","","","","","","","","Section: VI (PP) Section: I (CP) Section: V (MM) Section: V (MP)","","Bovine Ovine Porcine","","","","Yes","","","Yes","Yes","","","","","","","","","Yes","","","","","","","","Yes","Yes","","","","Yes","","","","","","","Food Standards Agency","","","",""

Please note that McKee’s has turned into McKee???s. I believe this is due to a lack of UTF-8 support within the CSVlint application.
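
For illustration only, here is a minimal Python sketch of one way this kind of output can arise. It assumes (this is not confirmed for csvlint) that the UTF-8 bytes are processed as single-byte ASCII and that each non-ASCII byte is replaced with "?". The typographic apostrophe in McKee’s is U+2019, which takes three bytes in UTF-8, so it would become three question marks. The variable names are hypothetical.

```python
# Hypothetical reproduction of the symptom; not csvlint's actual code.
name = "McKee\u2019s"          # U+2019 is the typographic apostrophe
raw = name.encode("utf-8")     # the bytes as they would be stored in the file
print(raw)                     # b'McKee\xe2\x80\x99s'

# Assumption: every byte outside the ASCII range is swapped for "?".
mangled = "".join(chr(b) if b < 0x80 else "?" for b in raw)
print(mangled)                 # McKee???s
```

The three "?" characters match the three bytes of the apostrophe, which fits a byte-level replacement rather than the loss of a single character.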

Steps to Reproduce (for problems)


  1. Download github.txt and convert the .txt back to .csv (GitHub would not accept a .csv upload).
  2. Submit the data to the csvlint app.
  3. Download the standardised version.
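
To check which cells changed between the submitted file and the downloaded standardised version, something like the following Python sketch could be used. The function name `diff_cells` and the sample strings are hypothetical; the sample uses a shortened version of the row above rather than the real files.

```python
import csv
import io

def diff_cells(original_text, standardised_text):
    """Return (row, column, before, after) for every cell whose value changed."""
    before = list(csv.reader(io.StringIO(original_text)))
    after = list(csv.reader(io.StringIO(standardised_text)))
    changes = []
    for r, (row_a, row_b) in enumerate(zip(before, after)):
        for c, (a, b) in enumerate(zip(row_a, row_b)):
            if a != b:
                changes.append((r, c, a, b))
    return changes

# Shortened sample row; quoting alone does not count as a change.
original = "9107,McKee\u2019s,11 Fairhill\n"
standardised = '"9107","McKee???s","11 Fairhill"\n'
print(diff_cells(original, standardised))
# [(0, 1, 'McKee’s', 'McKee???s')]
```

For real files, the text can be read with `open(path, encoding="utf-8", newline="").read()` before being passed in.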

Your Environment

Google Chrome: Version 81.0.4044.129 (Official Build) (64-bit)
Windows 10 laptop
Atom for opening and inspecting files

Is it possible to look at getting UTF-8 support added to csvlint?

Many thanks

Jamie

Jamie-Atkinson commented 4 years ago

This may be connected to https://github.com/Data-Liberation-Front/csvlint.io/issues/267

Floppy commented 4 years ago

Thanks @Jamie-Atkinson for the detailed report. We're just getting this application back into regular maintenance, so hopefully we'll be able to look at this before too long.

Jamie-Atkinson commented 4 years ago

Super, thanks Floppy and all.