adaltas / node-csv

Full featured CSV parser with simple api and tested against large datasets.
https://csv.js.org
MIT License

Parse dot notation columns into nested objects #414

Open jivung opened 9 months ago

jivung commented 9 months ago

I want to be able to write my column names with dot notation, like this:

name.first,name.last,age
john,doe,25

The parsing should result in this:

[
  {
    name: {
      first: 'john',
      last: 'doe'
    },
    age: 25
  }
]

I don't know if it's already possible and I just don't understand how to do it.

wdavidw commented 9 months ago

The closest thing is the columns option, but it doesn't support nested columns (yet?).

datu925 commented 9 months ago

+1 to this, this would be a great feature.

In the archived repo, this was requested here: https://github.com/adaltas/node-csv-parse/issues/76.

And I came here to make a similar suggestion. I'm rolling my own simple version right now but if this was part of the library, even better.

I think the argument for it is that csv-stringify offers this nested properties behavior when outputting, so being able to read that format when parsing (enabling a "round trip") is desirable.
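
To make the round trip concrete, here is a minimal sketch of the flattening direction, i.e. turning nested objects into dot-notation columns. The flattenRecord helper and the Nested type are hypothetical names used only to illustrate the format; this is not csv-stringify's implementation, and it assumes plain objects only (arrays are kept as values, and things like Date are not handled).

```typescript
// Hypothetical type for a possibly nested, string-keyed object.
type Nested = { [key: string]: unknown };

// Flatten a nested object into a single-level record with dot-notation keys.
function flattenRecord(obj: Nested, prefix = ''): Record<string, unknown> {
  const flat: Record<string, unknown> = {};
  for (const [key, val] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (typeof val === 'object' && val !== null && !Array.isArray(val)) {
      // Recurse into sub-objects, carrying the path built so far.
      Object.assign(flat, flattenRecord(val as Nested, path));
    } else {
      flat[path] = val;
    }
  }
  return flat;
}

console.log(JSON.stringify(flattenRecord({ name: { first: 'john', last: 'doe' }, age: 25 })));
// {"name.first":"john","name.last":"doe","age":25}
```

A parser-side option would need to undo exactly this mapping for the round trip to hold.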

wdavidw commented 9 months ago

Yes, I remember a few years back porting the underscore code into the library to support this feature without including an external dependency. I'm a little busy over the next couple of weeks between work and then holidays, but I'll try to find the time.

datu925 commented 6 months ago

In case it helps as a starting point, my implementation was something like the below (I've edited it a bit to remove project-specific things, so it may no longer compile). NestedKeyVal was a custom type that is supposed to represent a possibly nested string-keyed object.

export function flatToNested(rows: any[]) {
  const outputs: NestedKeyVal[] = [];
  for (const row of rows) {
    const output: NestedKeyVal = {};
    for (const columnName in row) {
      let val = row[columnName];

      const chunks = columnName.split('.');
      // dest is basically a pointer to a particular object or subobject in
      // the output structure. It starts at the root output for each key but
      // advances into sub-objects to allow nesting.
      let dest = output;
      if (chunks.length === 1) {
        dest[chunks[0]] = val;
      } else {
        for (const chunk of chunks.slice(0, -1)) {
          if (!(chunk in dest)) {
            const subobj: NestedKeyVal = {};
            dest[chunk] = subobj;
          }
          dest = dest[chunk];
        }
        dest[chunks.at(-1)!] = val;
      }
    }
    outputs.push(output);
  }
  return outputs;
}
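
For anyone who wants to try this without the project-specific types, here is a self-contained sketch of the same idea applied to the example from the top of this thread. It is not the library's implementation. The NestedKeyVal definition is an assumption (the original type was not shown), nestRecord is a hypothetical name, and values stay strings because no casting is applied.

```typescript
// Assumed definition: the original NestedKeyVal type was not shown in the thread.
type NestedKeyVal = { [key: string]: unknown };

// Turn one flat record with dot-notation keys into a nested object.
function nestRecord(row: Record<string, unknown>): NestedKeyVal {
  const output: NestedKeyVal = {};
  for (const [columnName, val] of Object.entries(row)) {
    const chunks = columnName.split('.');
    // Skip keys that would walk into the prototype chain.
    if (chunks.includes('__proto__')) continue;
    // split() always returns at least one chunk, so pop() is defined here.
    const last = chunks.pop() as string;
    // dest starts at the root and advances into sub-objects.
    let dest = output;
    for (const chunk of chunks) {
      const existing = dest[chunk];
      if (typeof existing !== 'object' || existing === null) {
        // Create a sub-object, replacing any non-object value already there.
        dest[chunk] = {};
      }
      dest = dest[chunk] as NestedKeyVal;
    }
    dest[last] = val;
  }
  return output;
}

// Flat records, as a parser would produce them from header-based column names.
const flat = [{ 'name.first': 'john', 'name.last': 'doe', age: '25' }];
const nested = flat.map(nestRecord);
console.log(JSON.stringify(nested));
// [{"name":{"first":"john","last":"doe"},"age":"25"}]
```

One design choice to be aware of: if a file has both a name column and a name.first column, the nested object replaces the plain name value in this sketch. A real option would need to decide whether that is an error.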