Closed: mj-jadhav closed this issue 6 years ago
I have wanted to do this. However, one of the primary features of cassandra-loader is the ability to log all errors to an error file that can be examined later. To do that, I need to get the original line or lines from the file so that I can log the error properly. That is not something that Univocity allows for. I did look into modifying the Univocity parser to support this, but stopped short. If you have an example of how you accomplished this, I'd be very interested to take a look.
Why not use Univocity Parser's splitter instead of readLine? https://github.com/al3xandru/cassandra-loader/blob/parser/src/main/java/com/datastax/loader/CqlDelimLoadTask.java#L191
A lot of the parserSettings options don't work because of this. For example, the following is one row in my CSV:
Instead of making it a single record, your tool makes it two rows. Splitting lines outside of the parser itself not only breaks anything within quotes, it's also WAY slower (like 3-4 times slower) and generates twice the garbage.
Please fix this. I made a temporary hack for my use case because it has a lot of abstractions.
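To illustrate the quoting problem described above, here is a minimal sketch using only the JDK. It is not cassandra-loader's code and not Univocity's API: the class name `QuotedRecordDemo`, both method names, and the sample data are hypothetical, and the quote handling assumes a doubled quote (`""`) as the escape. It compares splitting with `readLine` against a quote-aware split that still returns the raw text of each record, which is roughly what an error log would need.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class QuotedRecordDemo {

    // Naive approach: treat every physical line as one record.
    static List<String> splitByLine(String input) {
        // BufferedReader.lines() yields the same strings readLine() would return.
        return new BufferedReader(new StringReader(input))
                .lines()
                .collect(Collectors.toList());
    }

    // Quote-aware approach: a newline ends a record only outside quotes.
    // Returns the raw, unparsed text of each record. For brevity this walks
    // an in-memory String; a real loader would read from a stream instead.
    static List<String> splitByRecord(String input, char quote) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (ch == quote) {
                // A doubled quote ("") toggles twice, so the state is unchanged.
                inQuotes = !inQuotes;
            }
            if (ch == '\n' && !inQuotes) {
                records.add(stripTrailingCr(current));
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        if (current.length() > 0) {
            records.add(stripTrailingCr(current));
        }
        return records;
    }

    // Drops a trailing '\r' so Windows line endings do not leak into records.
    private static String stripTrailingCr(StringBuilder sb) {
        int len = sb.length();
        if (len > 0 && sb.charAt(len - 1) == '\r') {
            len--;
        }
        return sb.substring(0, len);
    }

    public static void main(String[] args) {
        // Hypothetical sample: the first record has a newline inside quotes.
        String sample = "1,\"first line\nsecond line\",x\n2,plain,y\n";
        System.out.println(splitByLine(sample).size());        // prints 3
        System.out.println(splitByRecord(sample, '"').size()); // prints 2
    }
}
```

With line-based splitting, the quoted field is cut in half before any parser sees it, so no parser setting can repair it afterwards. The quote-aware version keeps the record whole and still has the original text on hand for logging. It does not handle backslash-escaped quotes or custom delimiters, which a real parser would.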