BurntSushi / xsv

A fast CSV command line toolkit written in Rust.
The Unlicense
10.23k stars 317 forks

Escaping unescaped quotes in quoted strings modifies the data #326

Closed bcalco closed 10 months ago

bcalco commented 10 months ago

Running the 'input' command on a CSV with malformed quoted strings repairs them enough to be processed, but modifies the data inappropriately.

For example, the following problematic column value in one of our test files:

"Choices "contact us" email address"

Note: the two spaces between "Choices" and "contact us" are in the original data.

Gets changed to:

"Choices contact us"" email address"""

But it should be:

"Choices ""contact us"" email address"

The command being run is:

xsv input <malformed-file> -o <target-file>

This is a very consistent error: although allowing processing of the data (i.e. conformant parsers now accept the files), it subtly (and unacceptably) changes the data in the process.
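To make the difference concrete, here is a minimal sketch using Python's standard `csv` module as a stand-in for a conformant parser (it is not xsv, and the variable names are only illustrative). It writes the intended value with standard quote doubling, then reads back the line that xsv produced to show which value a downstream consumer would actually see.

```python
import csv
import io

# The value the third-party data was presumably meant to carry (an assumption).
intended = 'Choices "contact us" email address'

# Writing the intended value: embedded quotes are doubled and the field is quoted.
buf = io.StringIO()
csv.writer(buf).writerow([intended])
expected_line = buf.getvalue().rstrip("\r\n")

# Reading back the line that xsv emitted for the malformed input.
xsv_line = '"Choices contact us"" email address"""'
recovered = next(csv.reader([xsv_line]))[0]

print(expected_line)  # the "should be" form from the report
print(recovered)      # the value a conformant parser sees after xsv's rewrite
```

The second value has lost the quote before `contact` and gained one at the end, which is the data change described in this report.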

BurntSushi commented 10 months ago

I understand the request, but it's not reasonable to support. If your data is malformed, then that's the problem you should fix. It being malformed makes it impossible for xsv to choose a correct interpretation in every case, and exposing options to control how different classes of malformed data are interpreted is not something I'm interested in doing.

> although allowing processing of the data

This is the goal that xsv has.
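The ambiguity mentioned above can be illustrated with a small sketch. The malformed line here is a made-up example, not taken from the reporter's data, and Python's standard `csv` module stands in for a conformant parser. The same broken input admits two well-formed repairs that parse to different records.

```python
import csv

def parse(line):
    """Parse a single CSV line with Python's standard csv reader."""
    return next(csv.reader([line]))

# Hypothetical malformed input: is the comma data, or a field separator?
raw = '"a "b",c"'

# Repair A: treat every inner quote as data, giving one field.
one_field = '"a ""b"",c"'

# Repair B: treat the quote before the comma as closing, giving two fields.
two_fields = '"a ""b""","c"""'

reading_a = parse(one_field)
reading_b = parse(two_fields)

print(reading_a)
print(reading_b)
```

Both repairs are valid CSV, and nothing in the raw line says which one the producer meant.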

bcalco commented 10 months ago

The CSV parser author told me the same thing. lol.

The issue is, I don't own the data - I'm consuming third-party data. So I have to find a way to scrub it. But I understand your position.

If I run into a case where the changes xsv made render the data more broken, or introduce a new error, then I'll file a new ticket.

Thanks for the prompt reply, anyway!
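For the scrubbing step mentioned above, one possible pre-processing pass is sketched below. It is a heuristic, not part of xsv, and `repair_line` is a hypothetical helper. It assumes comma-delimited input, no newlines inside fields, and that a quote inside a quoted field is a closing quote only when it is followed by the delimiter or the end of the line. Any other inner quote is doubled, and pairs that are already doubled are kept as they are.

```python
import csv

def repair_line(line, delimiter=","):
    """Double stray quotes inside quoted fields (heuristic, single line)."""
    out = []
    i, n = 0, len(line)
    in_quotes = False
    at_field_start = True
    while i < n:
        ch = line[i]
        if not in_quotes:
            # A quote opens a quoted field only at the start of a field.
            if ch == '"' and at_field_start:
                in_quotes = True
            out.append(ch)
            at_field_start = (ch == delimiter)
            i += 1
        elif ch != '"':
            out.append(ch)
            i += 1
        else:
            nxt = line[i + 1] if i + 1 < n else None
            if nxt is None or nxt == delimiter:
                out.append('"')   # closing quote
                in_quotes = False
                i += 1
            elif nxt == '"':
                out.append('""')  # already-escaped pair, keep as is
                i += 2
            else:
                out.append('""')  # stray quote, escape it
                i += 1
    return "".join(out)

broken = '1,"Choices "contact us" email address",x'
fixed = repair_line(broken)
row = next(csv.reader([fixed]))
print(fixed)
print(row)
```

The heuristic misreads an inner quote that happens to sit directly before a comma, which is the kind of case that cannot be resolved without knowing how the data was produced, so the output is worth spot-checking against the source.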