Appendium / flatpack

CSV/Tab Delimited and Fixed Length Parser and Writer
http://flatpack.sf.net
Apache License 2.0
57 stars 20 forks source link

Writing field which contains qualifier does not escape it #33

Closed martindiphoorn closed 6 years ago

martindiphoorn commented 6 years ago

In the situation that i have a double quote as field qualifier i would expect it to escape it when its use in a field. Example:

input = "test \" value";

// expected output when running through writer
// ,"test""value", 
//       ^^ notice the double double quotes 

This is the way it is described in RFC4180:

If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example: "aaa","b""bb","ccc"

Im using below code:

       final StringWriter out =  new StringWriter();

        WriterOptions.getInstance().autoPrintHeader(true);
        final DelimiterWriterFactory factory = new DelimiterWriterFactory(',', '"');

        factory.addColumnTitles("col1", "col2", "col3");

        try (Writer writer = factory.createWriter(out)) {
            writer.addRecordEntry("col1", "-_+?@#\"test")
                .addRecordEntry("col2", "-_+?@#\"test")
                .addRecordEntry("col3", "-_+?@#\"test")
                    .nextRecord()
                    .flush();
        }

output:

col1,col2,col3
"-_+?@#"test","-_+?@#"test","-_+?@#"test"

I'm using version 4.0.1 from maven repositories.

benoitx commented 6 years ago

Hi Martin, Thanks for this.

Have you worked on a patch?

Thanks Benoit

martindiphoorn commented 6 years ago

Hi Benoit,

No i haven't.

I have checked the code and i think that the DelimiterWriter.java should be changed. If you want i can work on a patch.

Martin

martindiphoorn commented 6 years ago

Just created a pull request #34

benoitx commented 6 years ago

Excellent! I was going to have a look over the weekend but I also wanted to ensure that the Reader would cope with the file generated by the Writer.

Is it the case?

Thanks

Benoit


Important Notice This communication contains information that is considered confidential and may also be privileged. It is for the exclusive use of the intended recipient(s). If you are not the intended recipient(s) please note that any form of distribution, copying or use of this communication or the information in it is strictly prohibited and may be unlawful. If you have received this communication in error please return it to the sender and delete the original

On 15 Jun 2018, at 13:43, Martin Diphoorn notifications@github.com wrote:

Just created a pull request #34

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub, or mute the thread.

benoitx commented 6 years ago

Hi Martin,

A Snapshot is available at https://oss.sonatype.org/content/repositories/snapshots/net/sf/flatpack/flatpack/4.0.2-SNAPSHOT/

Let me know if you'd like to look at the Readers? And should we also ensure that the CR should be LF CR as per the RFC?

Thanks

martindiphoorn commented 6 years ago

Hi Benoit,Maybe you could create a seperate ticket for making it more rfc compatible. Probably next week i can look more into it. There is also a csvlint checker online.Greetings,MartinOp 15 jun. 2018 20:50 schreef Benoit Xhenseval notifications@github.com:Hi Martin, A Snapshot is available at https://oss.sonatype.org/content/repositories/snapshots/net/sf/flatpack/flatpack/4.0.2-SNAPSHOT/ Let me know if you'd like to look at the Readers? And should we also ensure that the CR should be LF CR as per the RFC? Thanks

—You are receiving this because you authored the thread.Reply to this email directly, view it on GitHub, or mute the thread.

benoitx commented 6 years ago

Hi Martin,

I initially thought that the Readers were ok with double " (double Qualifiers) but adding a few unit tests uncovered some issues. They are now fixed.

Kindly have a go with the snapshot release (building happening now).

Let me know Thanks

Benoit

martindiphoorn commented 6 years ago

Great work!

I probably should open a new ticket for this, but i found still one edge case on parsing.

If you add this test case to the ParserUtilsSplitLineTest.java: check("\"a, b,\"\"\"c\"\"\", test", ',', '\"', new String[] { "a, b,\"c\"", "test" }); Then the outcome is wrong. \"\"\"c\"\"\" at the beginning should become \"c\". current outcome is: junit.framework.AssertionFailedError: expecting... expected:<a, b,"c"> but was:<a, b,""c">

Als there were still some unit tests failing. I fixed those. I don't have time now to check what's going wrong with the above case.

martindiphoorn commented 6 years ago

I just noticed i missed on unit test thats failing also. check("\"a, b,\"\"c\", \" test\"", ',', '\"', new String[] { "a, b,\"\"c", " test" }); This is an invalid line. Escaping quotes is only allowed when the qualifier is present.

martindiphoorn commented 6 years ago

Be cautious that if the unit test fails when running mvn test on the parent. The end line is still succeeds.

selectie_014