Closed: mcnuttandrew closed this issue 4 years ago.
I think it's a good idea, and a possible implementation is given in https://github.com/d3/d3-dsv/pull/73
Note, however, that it would be a breaking change: people who already have code running on this type of data expect it to keep working.
I like your solution, but I don't know if it's worth issuing a breaking change. I think just adding a note to the documentation would probably help most people get over the hurdle of identifying this error.
I don't know… The thing is that, when the data has this shape (and you don't control it), it's currently quite difficult to manipulate: you have to load it as text, then fiddle with the first line, then call dsv.parse… I had to do exactly this just last week. (Plus, we're going to issue a major version soon, so a breaking change is not that problematic.)
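A minimal sketch of that manual workaround, assuming delimiter-separated text whose header row contains no quoted fields. The helper `uniquifyHeaderLine` is hypothetical (not part of d3-dsv): it rewrites only the first line, keeps the first occurrence of each name, renames later repeats by appending `-1`, `-2`, and so on (skipping any suffix that is already taken), and leaves the rest of the text untouched.

```javascript
// Hypothetical helper, not part of d3-dsv: make every name in the header line
// of raw DSV text unique. Assumes header cells are unquoted, i.e. no name
// contains the delimiter or a newline.
function uniquifyHeaderLine(text, delimiter = "\t") {
  const newline = text.indexOf("\n");
  const headerEnd = newline === -1 ? text.length : newline;

  // Keep a trailing "\r" (CRLF files) out of the last column name.
  const rawHeader = text.slice(0, headerEnd);
  const cr = rawHeader.endsWith("\r") ? "\r" : "";
  const names = (cr ? rawHeader.slice(0, -1) : rawHeader).split(delimiter);

  const taken = new Set(names); // original names, so a rename never collides with one
  const used = new Set();       // names already emitted, left to right
  const renamed = names.map((name) => {
    if (!used.has(name)) {
      used.add(name); // first occurrence keeps its name
      return name;
    }
    let k = 1;
    while (used.has(`${name}-${k}`) || taken.has(`${name}-${k}`)) k++;
    const fresh = `${name}-${k}`;
    used.add(fresh);
    return fresh;
  });

  // Reassemble: new header, original line ending, untouched body.
  return renamed.join(delimiter) + cr + text.slice(headerEnd);
}

const raw = "Example A\tExample B\tExample A\n1\t2\t3\n";
const fixed = uniquifyHeaderLine(raw);
console.log(JSON.stringify(fixed.split("\n")[0]));
// prints "Example A\tExample B\tExample A-1"
```

With d3-dsv loaded, the rewritten text can then go to `d3.tsvParse(fixed)` (or `d3.csvParse` after calling the helper with `","`), and all three columns survive. Because it edits raw text, this breaks if a quoted header name contains the delimiter or a newline; reading the header with `d3.tsvParseRows` or `d3.csvParseRows` instead would avoid that.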
Oh, I didn't know a major version was coming! This seems like a great approach, then.
I ♻️ my code into a notebook (and added "empty names" as well) https://observablehq.com/@fil/csv-duplicate-names
Fixed in 8ab1ab86899338c93b3aa07c21f5a63e1c73f37d ; thank you!
There is a small ambiguity in the way tsvParse and csvParse handle files whose columns have non-unique names. For instance, if you have a TSV like:
And you run that through tsvParse, then you get:
The problem, of course, is that the data from the first Example A column is blown away during the parse. I'm not sure what the right solution might be: maybe a note in the docs that column names need to be unique? Or maybe appending an incrementing index to the duplicated columns ('Example A-1' or something)? Having recently been bitten by this, I can say it's a real hair-pulling issue to find and resolve, so any help offered to other people in a similar situation would no doubt be welcome.
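Why the first `Example A` column disappears: each parsed row is a plain JavaScript object keyed by column name, and an object holds only one value per key. The sketch below is a simplified emulation, not d3-dsv's actual code. The hypothetical `rowToObject` assigns cells in column order, so the last duplicate wins. The equally hypothetical `findDuplicateNames` is the kind of check that could be run on a header row (for example, the first row returned by `d3.tsvParseRows`) to surface the problem instead of silently losing data.

```javascript
// Simplified emulation (not d3-dsv's implementation) of turning one row of
// cells into an object keyed by the header names.
function rowToObject(columns, cells) {
  const obj = {};
  columns.forEach((name, i) => {
    obj[name] = cells[i]; // a repeated name overwrites the earlier value
  });
  return obj;
}

// Hypothetical check: list every column name that occurs more than once,
// together with the positions where it occurs.
function findDuplicateNames(columns) {
  const positions = new Map();
  columns.forEach((name, i) => {
    if (!positions.has(name)) positions.set(name, []);
    positions.get(name).push(i);
  });
  return [...positions].filter(([, indices]) => indices.length > 1);
}

const columns = ["Example A", "Example B", "Example A"];
console.log(JSON.stringify(rowToObject(columns, ["1", "2", "3"])));
// prints {"Example A":"3","Example B":"2"}  (the value "1" is lost)
console.log(JSON.stringify(findDuplicateNames(columns)));
// prints [["Example A",[0,2]]]
```

A documentation note, a warning based on a check like this, or renaming the duplicates are the options discussed in this thread; the check alone changes no output, so it would not be a breaking change.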