Tux / Text-CSV_XS

perl5 module for composition and decomposition of comma-separated values
17 stars 20 forks source link

Add support for configurable NULL encoding #14

Closed dankasak closed 6 years ago

dankasak commented 6 years ago

Current NULL encoding options are limited. It works for some cases - where upstream can handle what we produce. Other cases - eg MySQL 'load data infile' - is unable to correctly identify NULLs using our encoding method ( eg ,, ). The docs here: http://search.cpan.org/~hmbrand/Text-CSV_XS-1.35/CSV_XS.pm#csv ... suggest you can produce output that databases can parse by doing:

while (my $row = $sth->fetch) {
  $csv->print ($fh, [ map { $_ // "\\N" } @$row ]);
  }

... but this is absolutely not the case. Given the data:

[ "blah", undef, 3 ]

... the required output for importing into MySQL or other DBs would be:

"blah",\N,3

... but the above hack instead gives us:

"blah","\\N",3

There are 2 problems with this: 1) Text::CSV_XS is escaping the \N, giving us \\N. DBs won't parse this correctly. 2) Text::CSV_XS is quoting the \\N. DBs won't parse this correctly either.

What we really need is a way to pass in any string sequence that can be used to encode a NULL value. Additionally, this string sequence should not be quoted.

Tux commented 6 years ago

Sorry, but you did not mention one critical issue that causes the behavior you see, as in the default setup, it works exactly as documented:

$ perl -MText::CSV_XS -e'Text::CSV_XS->new->say(*STDOUT,[map{$_//"\\N"}"blah",undef,3])'
blah,\N,3

But I am sure you have setup your instance with { escape => "\\" }, which causes the need to escape the escape, and fields that have escapes in them are automatically quoted:

$ perl -MText::CSV_XS -e'Text::CSV_XS->new({escape=>"\\"})->say(*STDOUT,[map{$_//"\\N"}"blah",undef,3])'
blah,"\\N",3

I'll see if there can be a more explicit attribute to achieve this, as callbacks are not an option to make this combination work.

Tux commented 6 years ago

Try a pull/clone from here. I added the undef_str attribute:

$ perl -Mblib -MText::CSV_XS -e'Text::CSV_XS->new({escape=>"\\",undef_str=>"\\N"})->say(*STDOUT,["blah",undef,3])'perl -Mblib -MText::CSV_XS -e'Text::CSV_XS->new({escape=>"\\",undef_str=>"\\N"})->say(*STDOUT,["blah",undef,3])'
blah,\N,3

See https://github.com/Tux/Text-CSV_XS/blob/master/doc/CSV_XS.md#undef_str

dankasak commented 6 years ago

Wow, that's great! Thankyou so much for the fast response and patch.

Tux commented 6 years ago

Feedback would be more than welcome BTW. I saw it failed tests on 5.18.x and below, so I cannot release yet. (and I just pushed a patch that allows UTF-8 values for undef_str, where (U+002205) springs to mind as useful :)

Tux commented 6 years ago

@dankasak would you be so kind to pull/clone again? I changed a lot of code to make it work on 5.18.x and below. If my tests do not cover all issues, your code might. Any other feedback for now?

dankasak commented 6 years ago

I've pulled and re-tested, using a couple of options for supporting different databases. Looks good to me :)

Tux commented 6 years ago

Thanks for the feedback. I'll start the big test-to-release process