Closed. tuner closed this 4 years ago.
There are so many ways a CSV can be "improper" that I'm not sure we can plan for all of them in this module.
When I'm faced with this type of file, I usually filter out the comments in the row function. Here's an example that handles comments at the top or at the bottom of the file, including "continuation comments": https://observablehq.com/@fil/parse-csv-with-comments
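A minimal sketch of the row-function approach follows. It uses `csvParseRows` rather than `csvParse` so that comment lines above the header can be dropped as well. It relies on d3-dsv's documented behavior that a row function returning `null` or `undefined` omits that row. It assumes comments start with `#` and that the first non-comment row is the header. `commentSkippingRow` is a hypothetical helper, not part of d3-dsv, and this is not the code from the linked notebook (in particular, it does not handle continuation comments).

```javascript
// Hypothetical helper (not part of d3-dsv): returns a row function for
// csvParseRows that drops comment and blank lines, treats the first
// remaining row as the header, and builds one object per data row.
function commentSkippingRow(prefix = "#") {
  let columns = null; // header row, captured from the first non-comment row

  return function (row) {
    const first = (row[0] || "").trimStart();
    // Returning null makes d3-dsv omit the row from its result.
    if (first.startsWith(prefix)) return null;
    // A blank (or whitespace-only) line parses as a single field.
    if (row.length === 1 && first === "") return null;
    if (columns === null) {
      columns = row; // the header itself is not returned as data
      return null;
    }
    // Key values by column name; missing fields become "" as in csvParse.
    const d = {};
    columns.forEach((name, i) => {
      d[name] = row[i] === undefined ? "" : row[i];
    });
    return d;
  };
}

// Usage with d3-dsv installed:
//   import { csvParseRows } from "d3-dsv";
//   const data = csvParseRows(text, commentSkippingRow("#"));
```

Because the returned function keeps the header in a closure, create a fresh one for each parse.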
Another fun CSV manipulation technique can be found in Mike’s Arctic sea ice volume notebook, where runs of multiple spaces are replaced by a comma before applying dsv.parse.
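That preprocessing step might look roughly like the sketch below. It assumes whitespace-aligned columns whose values contain no spaces or commas, and the notebook's exact code may differ. `spacesToCommas` is a hypothetical name; its output is ordinary CSV text that can then be handed to `csvParse` from d3-dsv.

```javascript
// Hypothetical helper: turns whitespace-aligned columns into CSV text.
// Assumes no field contains a space or a comma.
function spacesToCommas(text) {
  return text
    .split(/\r?\n/)
    // Trim first so leading/trailing spaces don't create empty fields,
    // then collapse every run of spaces into a single comma.
    .map((line) => line.trim().replace(/ +/g, ","))
    .join("\n");
}

// Usage with d3-dsv installed:
//   import { csvParse } from "d3-dsv";
//   const data = csvParse(spacesToCommas(text));
```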
Okay, thanks for the comment and examples! Exploiting the row function is a nice technique.
Anyway, my intention is to give users of my application options for handling some common issues with CSV files, not to provide an exhaustive solution. Some other CSV parsers offer such options, but d3-dsv has superior performance. Perhaps I'll just maintain my own fork. Thanks anyway!
Hi!
Currently, d3-dsv adheres strictly to RFC 4180. However, I need some nonstandard but practical features, such as skipping comment lines or interpreting NA values as null (as found, for example, in files written by R). Do you accept such pull requests?
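For comparison, the NA case can already be approximated without changing the parser, using a row conversion function passed to `csvParse`. This is a minimal sketch, assuming the literal string `NA` marks a missing value (R's default for `write.csv`). `naToNull` is a hypothetical helper, not a d3-dsv option.

```javascript
// Hypothetical row function for csvParse: replaces R-style "NA" cells
// with null and leaves every other value as the original string.
function naToNull(d) {
  const out = {};
  for (const [key, value] of Object.entries(d)) {
    out[key] = value === "NA" ? null : value;
  }
  return out;
}

// Usage with d3-dsv installed:
//   import { csvParse } from "d3-dsv";
//   const data = csvParse(text, naToNull);
```

One limitation of doing this after parsing is that quotes have already been stripped. A quoted `"NA"` string and an unquoted missing value therefore look the same to the row function.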
So, this draft pull request adds an options object to `dsvFormat` with a single supported option, `comment`. Example: