AccelerationNet / cl-csv

A Common Lisp library providing easy CSV reading and writing

continue restart-case infinite looping #15

Closed: dimitri closed this issue 10 years ago

dimitri commented 10 years ago

While trying to use the 'continue restart that cl-csv now offers, I'm running into an infinite loop. My guess is that the (next-iteration) call is not enough to actually move the stream position past the faulty character.

In my testing, I'm reading a UTF-8 encoded file that contains some non-ASCII characters, while specifying an ASCII :external-format when opening the stream.

My first attempt at ever using a restart looks like this (and triggers an infinite loop that I think comes from your code):

```lisp
;; Outer handler: log any condition not already handled by the inner
;; HANDLER-CASE (e.g. an encoding error) and invoke the CONTINUE restart.
(handler-bind ((condition
                 #'(lambda (c)
                     (log-message :error "PLOP: ~a" c)
                     (invoke-restart 'continue))))
  ;; Inner handler: a CSV-PARSE-ERROR unwinds out of READ-CSV and is logged.
  (handler-case
      (cl-csv:read-csv input
                       :row-fn (compile nil reformat-then-process)
                       :separator (csv-separator csv)
                       :quote (csv-quote csv)
                       :escape (csv-escape csv)
                       :unquoted-empty-string-is-nil t
                       :quoted-empty-string-is-nil nil
                       :trim-outer-whitespace (csv-trim-blanks csv)
                       :newline (csv-newline csv))
    (cl-csv:csv-parse-error (condition)
      (log-message :error "~a" condition)
      (pgstate-setf *state* (target csv) :errs -1))))
```

Regards,

bobbysmith007 commented 10 years ago

The infinite loop is intentional: it allows continuing past as many errors as occur. The trouble you are running into is that nothing resolves the underlying problem with the stream, so each continue call hits the same error again.

Your handler can either advance the stream past the encoding error before continuing, or take some other corrective or limiting action that terminates the loop.
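A self-contained sketch of the "limiting action" option, using only standard Common Lisp and made-up names (not cl-csv code). READ-ITEMS is a hypothetical reader whose CONTINUE restart retries at the same index, mimicking a stream that never moved past the faulty bytes. The handler in READ-ITEMS-OR-GIVE-UP remembers the index of the last failure and gives up as soon as the same index fails twice in a row, i.e. when continuing made no progress.

```lisp
;;; Hypothetical sketch, not cl-csv code.
(define-condition decode-failure (error)
  ((index :initarg :index :reader decode-failure-index))
  (:report (lambda (c s)
             (format s "undecodable input at index ~d"
                     (decode-failure-index c)))))

(defun read-items (items)
  "Return the items of the list ITEMS, in order. A :BAD item signals
DECODE-FAILURE. The CONTINUE restart retries at the same index, like a
stream whose position never moved past the faulty bytes, so continuing
alone loops forever on a :BAD item."
  (let ((index 0)
        (result '()))
    (loop while (< index (length items))
          do (restart-case
                 (let ((item (elt items index)))
                   (when (eq item :bad)
                     (error 'decode-failure :index index))
                   (push item result)
                   (incf index))
               (continue ()
                 :report "Retry reading at the current index."
                 nil)))
    (nreverse result)))

(defun read-items-or-give-up (items)
  "Continue past DECODE-FAILUREs, but stop as soon as the same index fails
twice in a row. Returns the items read, or NIL and the stuck index."
  (let ((last-index nil))
    (handler-bind ((decode-failure
                     (lambda (c)
                       (let ((index (decode-failure-index c)))
                         ;; Same index as the previous failure: no progress.
                         (when (eql index last-index)
                           (return-from read-items-or-give-up
                             (values nil index)))
                         (setf last-index index)
                         (invoke-restart 'continue)))))
      (read-items items))))

;; (read-items-or-give-up '(1 2 3))    => (1 2 3)
;; (read-items-or-give-up '(1 :bad 3)) => NIL, 1
```

A plain HANDLER-BIND that only invoked CONTINUE would spin forever on '(1 :bad 3); checking for progress is one cheap way to break out.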

Continue can work as-is in the case of a csv-parse-error (e.g. where some quotes were not escaped), because the reader has correctly read a line/buffer from the stream, fails to parse it, and repeats with what follows. It would either eventually exhaust the file or get past the errors and start parsing lines again.
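To illustrate why continuing makes progress for a parse error, here is a self-contained sketch with hypothetical names (not cl-csv's implementation). READ-ROWS consumes each line with the standard READ-LINE before parsing it, so by the time the CONTINUE restart runs, the stream is already positioned at the next line.

```lisp
;;; Hypothetical sketch, not cl-csv code.
(define-condition row-parse-error (error)
  ((text :initarg :text :reader row-parse-error-text))
  (:report (lambda (c s)
             (format s "cannot parse row ~s" (row-parse-error-text c)))))

(defun parse-row (line)
  "Parse LINE as a single integer, signalling ROW-PARSE-ERROR on failure."
  (handler-case (parse-integer line)
    (parse-error ()
      (error 'row-parse-error :text line))))

(defun read-rows (stream)
  "Return the integers read from STREAM, one per line. The line is consumed
by READ-LINE before it is parsed, so the CONTINUE restart resumes at the
next line instead of retrying the same one."
  (let ((rows '()))
    (loop for line = (read-line stream nil)
          while line
          do (restart-case (push (parse-row line) rows)
               (continue ()
                 :report "Skip this row."
                 nil)))
    (nreverse rows)))

;; A handler that always continues terminates here, because every
;; CONTINUE moves on to a new line until the stream is exhausted.
(handler-bind ((row-parse-error
                 (lambda (c)
                   (format *error-output* "~&skipping: ~a~%" c)
                   (invoke-restart 'continue))))
  (with-input-from-string (in (format nil "1~%oops~%3~%"))
    (read-rows in)))
;; => (1 3)
```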

More of that kind of error handling could be built in, but it is pretty ambiguous what steps should be taken to resolve an encoding error, so it's not something I would be eager to build into the read-csv-row function. Useful contexts could be provided as additional macros/functions that wrap the parsing.
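One possible shape for such a wrapping context, as a hedged sketch: WITH-CONTINUE-LIMIT is a hypothetical macro (not part of cl-csv) that invokes the CONTINUE restart for a given condition type at most a fixed number of times, then declines so the condition propagates to outer handlers or the debugger. COUNT-GOOD is a made-up workload; it uses the standard CERROR, which establishes a CONTINUE restart associated with the condition it signals.

```lisp
;;; Hypothetical wrapper, not part of cl-csv.
(defmacro with-continue-limit ((condition-type max-continues) &body body)
  "Run BODY, invoking the CONTINUE restart for conditions of CONDITION-TYPE
at most MAX-CONTINUES times. Once the limit is reached the handler declines,
so the condition propagates to outer handlers (or the debugger)."
  (let ((counter (gensym "COUNTER"))
        (limit (gensym "LIMIT")))
    `(let ((,counter 0)
           (,limit ,max-continues))
       (handler-bind ((,condition-type
                        (lambda (c)
                          (let ((continue-restart (find-restart 'continue c)))
                            (when (and continue-restart (< ,counter ,limit))
                              (incf ,counter)
                              (invoke-restart continue-restart))))))
         ,@body))))

(defun count-good (items)
  "Hypothetical workload: count the items that are not :BAD, signalling a
continuable error with CERROR for each :BAD one."
  (let ((good 0))
    (dolist (item items good)
      (if (eq item :bad)
          (cerror "Skip this item." "Bad item in ~s." items)
          (incf good)))))

;; (with-continue-limit (simple-error 5)
;;   (count-good '(1 :bad 2 :bad 3)))   ; => 3
```

Because the handler declines rather than aborting, the caller still decides what happens after the limit: wrap the form in HANDLER-CASE to turn it into a return value, or let it reach the debugger.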

See https://github.com/AccelerationNet/cl-csv/blob/master/tests/csv.lisp#L313 for an example of continue working as expected.

Hope this helps; feel free to reopen if I have misinterpreted what is being requested.

Russ

dimitri commented 10 years ago

Thanks for the heads-up. I also agree that it's quite hard (impossible?) to decide what to do after an encoding error. I've now integrated the continue restart properly into pgloader.