Closed dimitri closed 10 years ago
The infinite loop is intentional (and allows continuing from as many errors as occur). The trouble you are running into is that nothing is resolving the error with the stream, so each continue call hits the same error.
Your handler can either advance the stream past the encoding error before continuing or take some other corrective or limiting action to terminate the loop.
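To illustrate the two options, here is a minimal sketch in plain Common Lisp. It does not use cl-csv: the `source` struct, the `bad-item` condition, and the functions `read-item`, `read-all`, and `read-all-skipping-bad` are all hypothetical stand-ins for a stream whose position does not move when a read fails. The handler advances past the offending item before invoking the continue restart, and it declines to handle the error once a retry limit is reached, so the loop cannot spin forever.

```lisp
;; Hypothetical model of the situation, using only standard Common Lisp.
;; None of these names come from cl-csv.

(define-condition bad-item (error)
  ((source :initarg :source :reader bad-item-source)))

(defstruct source
  (items '() :type list)
  (pos 0 :type fixnum))

(defun read-item (source)
  "Return the next item; signal BAD-ITEM without advancing if it is :bad."
  (let ((item (nth (source-pos source) (source-items source))))
    (when (eq item :bad)
      (error 'bad-item :source source))
    (incf (source-pos source))
    item))

(defun read-all (source)
  "Collect every item, offering a CONTINUE restart that simply retries."
  (loop with results = '()
        while (< (source-pos source) (length (source-items source)))
        do (restart-case (push (read-item source) results)
             (continue ()
               :report "Retry reading from the current position."
               nil))
        finally (return (nreverse results))))

(defun read-all-skipping-bad (items &key (max-errors 10))
  "Read ITEMS, skipping :bad entries, but give up after MAX-ERRORS errors."
  (let ((source (make-source :items items))
        (errors 0))
    (handler-bind ((bad-item
                     (lambda (c)
                       ;; Past the limit the handler declines, so the
                       ;; error propagates instead of looping forever.
                       (when (<= (incf errors) max-errors)
                         ;; Corrective action: move past the bad item.
                         (incf (source-pos (bad-item-source c)))
                         (continue c)))))
      (read-all source))))
```

Without the `incf` on the position, each retry would hit the same `:bad` item again, which is the loop described above; the error counter is what bounds the damage when the corrective step is not enough.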
The continue could work as-is in the case of a csv-parse-error (e.g. where some quotes were not escaped), because each retry would read a fresh line or buffer from the stream, fail to parse it, and repeat. It would eventually either exhaust the file or get past the errors and start parsing lines again.
More of that kind of error handling could be built in, but it is pretty ambiguous what steps should be taken to resolve an encoding error, so it's not something I would be eager to build into the read-csv-row function. Useful contexts could be included as additional macros or functions to wrap the parsing in.
See: https://github.com/AccelerationNet/cl-csv/blob/master/tests/csv.lisp#L313 for an example of the continue working as expected.
Hope this helps; feel free to reopen if I have misinterpreted what is requested.
Russ
Thanks for the heads up, I also agree that it's quite hard (impossible?) to decide on what to do after an encoding error. I've integrated the continue restart properly in pgloader now.
While trying to benefit from the 'continue restart now offered in cl-csv, I'm experiencing an infinite loop. I guess that the (next-iteration) call is not enough to actually move the stream position after the faulty character.
In my testing, I'm reading a UTF-8 encoded file with some non-ASCII chars while specifying an ASCII :external-format when opening the stream.
My first try at using a restart ever began this way (and provoked an infinite loop that I think is in your code):
Regards,