arnaudroger / SimpleFlatMapper

Fast and Easy mapping from database and csv to POJO. A java micro ORM, lightweight alternative to iBatis and Hibernate. Fast Csv Parser and Csv Mapper
http://simpleflatmapper.org
MIT License
437 stars 76 forks

Dangling quote mark and ampersand #654

Closed veyndan closed 5 years ago

veyndan commented 5 years ago

Is there a way to produce the desired output below from the CSV "hello,\"salt&pepper,world"? I've tried using CsvParser.DSL#disableUnescaping() which works when there are quote pairs but doesn't work with only a single quote.

Loving the library btw!

CsvParser
    .iterator("hello,\"salt&pepper,world")
    .forEach { it.forEach(::println) }

Output

hello
salt&pepper,world

Desired Output

hello
"salt&pepper
world

arnaudroger commented 5 years ago

I think I see

arnaudroger commented 5 years ago

So the problem is that the quote makes the parser treat everything after it as a protected field until the next quote. Currently there is no way to disable the quote protection, but you can change the quote char to 0. I don't think it's great; only use it if you are sure you won't get any other 0 char in the file. I'll see if a better way can be implemented. It's not really proper CSV, and you end up just looking for `,` and `\n`.

        CsvParser
                .dsl().quote((char)0)
                .iterator("hello,\"salt&pepper,world")
                .forEachRemaining(strings -> Stream.of(strings).forEach(System.out::println));
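
To illustrate what "just looking for , and \n" amounts to, here is a minimal plain-Java sketch that ignores quotes entirely. It is not SimpleFlatMapper code and only approximates the effect of the workaround above; the class name `NaiveCsvSplit` and its `parse` method are hypothetical, and `\r\n` line endings are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SimpleFlatMapper.
final class NaiveCsvSplit {

    // Splits the text into rows on '\n' and into cells on ',' with no quote handling at all.
    static List<String[]> parse(String text) {
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\n", -1)) {
            // The -1 limit keeps trailing empty cells instead of dropping them.
            rows.add(line.split(",", -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : parse("hello,\"salt&pepper,world")) {
            for (String cell : row) {
                System.out.println(cell);
            }
        }
    }
}
```

With the input from the question, this yields the three cells `hello`, `"salt&pepper` and `world`, at the cost of breaking any field that legitimately contains a quoted comma or newline.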

veyndan commented 5 years ago

Thanks for the proposed solution. Considering the problem more, I can see how it would be hard to deduce which form is correct, taking the input as CSV with two columns vs taking the input as CSV with three columns. I think what I need (though this is a very specific use case) is to tell SimpleFlatMapper that my CSV has three columns with one column potentially being incorrectly formatted (i.e. if there is a dangling quote, blame the column stated).

Something like this:

fun displayCsv(csv: String) {
    CsvParser
        .headerCount(3)
        .badlyFormattedHeader(1)
        .iterator(csv)
        .forEach { it.forEach(::println) }
}
displayCsv("hello,\"salt&pepper,world")
// [output]
// hello
// "salt&pepper
// world

displayCsv("hello,\"salt&pepper,foo,world")
// [output]
// hello
// "salt&pepper,foo
// world
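
For a single line with a known column count, the behavior sketched above can be approximated outside SimpleFlatMapper with a plain split. This is a minimal sketch under stated assumptions, not a library feature: `LenientColumnSplit` and `split` are hypothetical names, quotes are not interpreted at all, and every surplus comma is blamed on the one column marked as lenient.

```java
import java.util.Arrays;

// Hypothetical helper, not part of SimpleFlatMapper.
final class LenientColumnSplit {

    // Splits one line into exactly columnCount cells on ','.
    // Any surplus commas are kept inside the cell at lenientIndex.
    static String[] split(String line, int columnCount, int lenientIndex) {
        if (lenientIndex < 0 || lenientIndex >= columnCount) {
            throw new IllegalArgumentException("lenientIndex out of range");
        }
        String[] parts = line.split(",", -1);
        if (parts.length < columnCount) {
            throw new IllegalArgumentException("fewer than " + columnCount + " cells");
        }
        int extra = parts.length - columnCount; // surplus pieces to merge back
        String[] cells = new String[columnCount];
        // Cells before the lenient column map one to one.
        for (int i = 0; i < lenientIndex; i++) {
            cells[i] = parts[i];
        }
        // The lenient column absorbs its own piece plus all surplus pieces.
        cells[lenientIndex] = String.join(",",
                Arrays.copyOfRange(parts, lenientIndex, lenientIndex + extra + 1));
        // Cells after the lenient column are shifted by the surplus count.
        for (int i = lenientIndex + 1; i < columnCount; i++) {
            cells[i] = parts[i + extra];
        }
        return cells;
    }

    public static void main(String[] args) {
        for (String cell : split("hello,\"salt&pepper,foo,world", 3, 1)) {
            System.out.println(cell);
        }
    }
}
```

Called with three columns and lenient index 1, both inputs above produce the outputs shown in the comments. The sketch works per line only, so it does not cover quoted newlines or well-formed quoted fields in the other columns.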

arnaudroger commented 5 years ago

It would be quite hard to implement in an efficient manner though. It's getting close to a generic text parser :)

veyndan commented 5 years ago

True 😄

Thanks for the help anyway!