AccelerationNet / cl-csv

A Common Lisp library providing easy CSV reading and writing

exporting cl-csv::with-csv-input-stream and -output-stream. #8

Closed tpapp closed 11 years ago

tpapp commented 11 years ago

Please consider exporting the symbols above.

Rationale: these macros are very useful for implementing functions that read/write CSV files by rows (but build a data structure other than list-of-lists); e.g., you already use them in get-data-table-from-csv.
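
For illustration, the kind of use I have in mind is roughly the following sketch. The (stream input) binding form is an assumption about the internal macro's signature, and read-csv-row is assumed to read a single row from an already-open stream:

;; Sketch: read the header row separately, then the remaining rows as data.
;; The binding form of the internal macro is an assumption.
(defun read-csv-with-header (input)
  (cl-csv::with-csv-input-stream (stream input)
    (let ((header (cl-csv:read-csv-row stream)))
      (values (cl-csv:read-csv stream) header))))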

ryepup commented 11 years ago

I don't see any problem with that, but I'm wondering if an API improvement would be a better solution. Those macros don't do much: they create a stream from a variety of inputs and close it safely. There's nothing particularly CSV-related in there, so I'd be nervous about other code using cl-csv as a utility library.

Data structures other than list-of-lists can be built by passing the :row-fn or :map-fn keyword arguments to read-csv. For example:

(cl-csv:read-csv #P"file.csv"
                 ;; return a list of objects instead
                 :map-fn #'(lambda (row)
                             (make-instance 'foo
                                            :bar (nth 3 row)
                                            :baz (nth 5 row))))

You can also read results row-by-row, and do custom logic:

(let ((sum 0))
  (cl-csv:read-csv #P"file.csv"
                   ;; don't return anything from read-csv, just sum the
                   ;; column at index 4
                   :row-fn #'(lambda (row)
                               (incf sum (parse-integer (nth 4 row)))))
  sum)

Or use the do-csv macro to do the same thing with less code:

(let ((sum 0))
  (cl-csv:do-csv (row #P"file.csv")
    (incf sum (parse-integer (nth 4 row))))
  sum)

Do those options meet your needs for reading? Do you think a similar approach would meet your needs for writing?
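
For the writing side, here is a minimal sketch of the analogous row-by-row pattern, assuming write-csv-row accepts a list of values and a :stream keyword (objects, id, and name are placeholders):

;; Sketch: write a header row, then one row per object.
(with-open-file (out #P"out.csv" :direction :output :if-exists :supersede)
  (cl-csv:write-csv-row (list "id" "name") :stream out)
  (dolist (obj objects)
    (cl-csv:write-csv-row (list (id obj) (name obj)) :stream out)))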

tpapp commented 11 years ago

Hi Ryan,

I understand that with-csv-input-stream is something like a utility macro and that exporting it would not be elegant. I am doing something different with the first row (using it as headers). Calling read-csv-row myself leads to cleaner code, but I can of course work around the problem using closures and row-fn.
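
Something along these lines, as a minimal sketch of that workaround (the accumulation itself is just for illustration):

;; Sketch: capture the header in a closure, treat the rest as data rows.
(let (header (rows '()))
  (cl-csv:read-csv #P"file.csv"
                   :row-fn (lambda (row)
                             (if header
                                 (push row rows)
                                 (setf header row))))
  (values (nreverse rows) header))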

Thanks for the clarification, and please feel free to close the issue.

However, in the long run it would be great to write something like

(with-ensured-stream (stream input &rest keys-passed-to-open) ...)

that uses a similar mechanism, and submit it to, e.g., Alexandria.
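
A rough sketch of what I mean, purely hypothetical and not an existing Alexandria or cl-csv API:

;; Hypothetical macro: bind STREAM to INPUT if it is already a stream,
;; to a string input stream if INPUT is a string, or to a freshly opened
;; file if INPUT is a pathname; close anything opened here on exit.
(defmacro with-ensured-stream ((stream input &rest open-args) &body body)
  (let ((in (gensym "INPUT"))
        (opened (gensym "OPENED")))
    `(let* ((,in ,input)
            (,opened nil)
            (,stream (etypecase ,in
                       (stream ,in)
                       (string (setf ,opened t)
                               (make-string-input-stream ,in))
                       (pathname (setf ,opened t)
                                 (open ,in ,@open-args)))))
       (unwind-protect (progn ,@body)
         (when ,opened (close ,stream))))))

;; e.g. (with-ensured-stream (s #P"file.csv") (read-line s))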

By the way, thanks for the library, I find it really useful.

Best,

Tamas

ryepup commented 11 years ago

Tamas,

Thanks for the feedback. I got to thinking about how to make a more flexible API than read-csv, and just committed a patch that will make it way easier if you use the iterate library.

I implemented an in-csv iterate driver:

;; loop over a CSV using iterate
(iter (for (foo bar baz) in-csv #P"file.csv")
  (collect (make-instance 'object :foo foo :baz baz)))

;; supports all the options read-csv provides
(iter
  (for row in-csv #P"file.csv"
       ;; optional settings to configure the read
       SKIPPING-HEADER T SEPARATOR #\| QUOTE #\" ESCAPED-QUOTE "\"\"")
  (break "look at my columns" row))

;; do something special with the header
(iter (for row in-csv #P"file.csv")  
  (if-first-time
   (process-header row)
   (process-row row)))

Personally I like iterate a lot, and I think this approach will be a cleaner way to implement things like get-data-table-from-csv.
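
For example, here is a sketch of collecting each data row as an alist keyed by the header, assuming the in-csv driver and if-first-time behave as above:

;; Sketch: pair every data row with the header row, one alist per row.
(defun csv-to-alists (pathname)
  (let (header)
    (iter (for row in-csv pathname)
      (if-first-time
       (setf header row)
       (collect (mapcar #'cons header row))))))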

The relevant patch is in c2c85d188b7a44f94fa5293f9ca6d960587e64c9

tpapp commented 11 years ago

Hi Ryan,

Thanks for the suggestion. I used to like iterate very much, but I ran into its limitations a couple of years ago (it uses a code walker, which leads to problems with complex code) and removed it from my libraries completely. Consequently, I don't think I will use your extension, so if you only did it on my account, feel free to revert the patch.

I reworked my code this morning and realized that I can write very clean code using just read-csv and closures, e.g.:

(defun csv-to-data-columns (stream-or-string skip-first-row?)
  "Read a CSV file (or stream, or string), accumulate the values in DATA-COLUMNs, return a list of these.  Rows are checked to have the same number of elements.

When SKIP-FIRST-ROW?, the first row is read separately and returned as the second value (list of strings), otherwise it is considered data like all other rows."
  (let (data-columns
        (first-row (not skip-first-row?)))
    (read-csv stream-or-string
              :row-fn (lambda (row)
                        (if data-columns
                            (assert (length= data-columns row))
                            (setf data-columns
                                  (loop repeat (length row) collect (data-column))))
                        (if first-row
                            (mapc #'data-column-add data-columns row)
                            (setf first-row row))))
    (values data-columns (when skip-first-row? first-row))))

BTW, this is currently in data-omnivore ( https://github.com/tpapp/data-omnivore ) if you are interested in the context.

Again, the issue is solved for me, so please feel free to close it.

Best,

Tamas

ryepup commented 11 years ago

Sounds good. Also consider the do-csv macro for slightly simpler syntax:

(defun csv-to-data-columns (stream-or-string skip-first-row?)
  "Read a CSV file (or stream, or string), accumulate the values in
DATA-COLUMNs, return a list of these. Rows are checked to have the same number
of elements. When SKIP-FIRST-ROW?, the first row is read separately and
returned as the second value (list of strings), otherwise it is considered data
like all other rows."
  (let (data-columns (first-row (not skip-first-row?)))
    (do-csv (row stream-or-string)
      (if data-columns
          (assert (length= data-columns row))
          (setf data-columns (loop repeat (length row) collect (data-column))))
      (if first-row
          (mapc #'data-column-add data-columns row)
          (setf first-row row)))
    (values data-columns (when skip-first-row? first-row))))