adaltas / node-csv

Full featured CSV parser with simple api and tested against large datasets.
https://csv.js.org
MIT License

CSV parse - setting cast_date to true is converting unparsable strings to dates #342

Closed katherinea closed 2 years ago

katherinea commented 2 years ago

Describe the bug

Using csv-parse: 5.0.4

When cast_date is set to true, if you parse a string that ends with a number, the string is cast to a date. According to the docs https://csv.js.org/parse/options/cast_date/, "The implementation relies on Date.parse. The CSV value is left untouched if the function returns NaN."
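The rule quoted from the docs can be sketched in a few lines. This is only an illustration of the documented behaviour, not csv-parse's actual source, and the function name `castDateAsDocumented` is hypothetical. Since the decision is delegated entirely to `Date.parse`, any string the runtime's `Date.parse` accepts ends up as a `Date`.

```typescript
// Hypothetical helper illustrating the documented rule; not taken from csv-parse.
// Returns a Date when Date.parse accepts the value, otherwise the untouched string.
function castDateAsDocumented(value: string): Date | string {
  const ms = Date.parse(value);
  return Number.isNaN(ms) ? value : new Date(ms);
}

// An ISO 8601 string becomes a Date; an empty string is left as-is.
console.log(castDateAsDocumented("2001-02-01T00:00:00.000Z"));
console.log(castDateAsDocumented(""));
```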

Date.parse("Test 2") => NaN, however the parsed field is returned as 2001-02-01T00:00:00.000Z

To Reproduce

with file test.csv

title
test 2

and implementation

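  // Assumed imports: the promise-based fs API and the sync csv-parse entry point.
  import { promises as fs } from 'fs'
  import { parse } from 'csv-parse/sync'

  // Reads the CSV file and parses it into an array of records.
  async function parseCsv<T>(filePath: string): Promise<T[]> {
    const content: Buffer = await fs.readFile(filePath)
    const parsed: T[] = parse(content, {
      cast: true,
      cast_date: true, // the option under discussion
      columns: true, // use the first line as column names
    })

    return parsed
  }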

returned data is:

{
  title: 2001-02-01T00:00:00.000Z,
}

wdavidw commented 2 years ago

Hum, as a matter of fact, node -e 'console.info(Date.parse("Test 2"))' returns 980982000000 (using Node.js v16.13.0). This comes as a surprise, but it explains the behavior. You can always write your own cast_date function.

katherinea commented 2 years ago

Hiya! Right, sorry, you're correct on that parsing. Strange behaviour, as:

Date.parse("Test 2")
=> 980985600000
Date.parse("Test 2 v 2")
=> NaN
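For context, the ECMAScript specification only fixes the result of `Date.parse` for its ISO 8601 based date-time format (plus the output of `toString` and `toUTCString`); for any other string an engine may fall back to its own heuristics, so results such as the one for "Test 2" can differ between runtimes. The two values reported in this thread, 980982000000 and 980985600000, differ by exactly one hour, which would be consistent with the fallback parser interpreting the string in the local time zone. A small probe, with the hypothetical helper name `describeParse`, makes this easy to check on a given machine:

```typescript
// Inputs from the thread, plus one ISO 8601 string for comparison.
const samples: string[] = ["Test 2", "Test 2 v 2", "2001-02-01T00:00:00.000Z"];

// Formats the Date.parse result for one input; only the ISO case is
// guaranteed to give the same answer on every engine and time zone.
function describeParse(value: string): string {
  const ms = Date.parse(value);
  const label = JSON.stringify(value);
  return Number.isNaN(ms)
    ? `${label} -> NaN`
    : `${label} -> ${ms} (${new Date(ms).toISOString()})`;
}

for (const sample of samples) {
  console.log(describeParse(sample));
}
```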

I've tried to pass a function to cast_date but am getting an error message. With the following parsing options:

const parsed: T[] = parse(content, {
  cast: true,
  cast_date: (value: string, context: CastingContext) => {
    const date = Date.parse(value)
    // test function, not correctly handling non-date strings
    return new Date(date)
  },
  columns: true,
})

I'm getting the error:

   {
      "type": "CsvError",
      "message": "Invalid option cast_date: cast_date must be true or a function, got undefined",
      "stack":
          Error: Invalid option cast_date: cast_date must be true or a function, got undefined
              at Parser.__normalizeOptions (/Users/katherine/RAcode/powertools/bulk-events-upload/node_modules/csv-parse/dist/cjs/sync.cjs:206:13)
              at new Parser (/Users/katherine/RAcode/powertools/bulk-events-upload/node_modules/csv-parse/dist/cjs/sync.cjs:153:10)
              at parse (/Users/katherine/RAcode/powertools/bulk-events-upload/node_modules/csv-parse/dist/cjs/sync.cjs:1281:18)
              at _callee$ (/Users/katherine/RAcode/powertools/bulk-events-upload/src/services/csv-service/csv-service.ts:15:25)
              at tryCatch (/Users/katherine/RAcode/powertools/bulk-events-upload/node_modules/regenerator-runtime/runtime.js:63:40)
              at Generator.invoke [as _invoke] (/Users/katherine/RAcode/powertools/bulk-events-upload/node_modules/regenerator-runtime/runtime.js:294:22)
              at Generator.next (/Users/katherine/RAcode/powertools/bulk-events-upload/node_modules/regenerator-runtime/runtime.js:119:21)
              at asyncGeneratorStep (/Users/katherine/RAcode/powertools/bulk-events-upload/node_modules/@babel/runtime/helpers/asyncToGenerator.js:3:24)
              at _next (/Users/katherine/RAcode/powertools/bulk-events-upload/node_modules/@babel/runtime/helpers/asyncToGenerator.js:25:9)
      "code": "CSV_INVALID_OPTION_CAST_DATE"
    }

Also had a question about the types - cast_date takes a CastingDateFunction

CastingDateFunction = (value: string, context: CastingContext) => Date;

Shouldn't the return type be Date | string to allow for strings to be returned if they're not able to be parsed as a date? Thank you :)
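One possible shape for such a function, returning `Date | string`, is sketched below. It is not part of csv-parse: the names `DateOrString`, `ISO_DATE` and `castIsoDate` are hypothetical, and restricting conversion to ISO 8601 input is an assumption of this sketch, chosen so that values like "test 2" never reach the engine-specific fallback of `Date.parse`.

```typescript
// Hypothetical widened return type; CastingDateFunction in the thread returns Date only.
type DateOrString = Date | string;

// Assumption of this sketch: only YYYY-MM-DD, optionally followed by a time
// (and an optional Z or +hh:mm offset), counts as a date.
const ISO_DATE =
  /^\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d{1,3})?)?(?:Z|[+-]\d{2}:\d{2})?)?$/;

// Returns a Date for ISO-formatted input and the original string otherwise.
function castIsoDate(value: string): DateOrString {
  if (!ISO_DATE.test(value)) return value;
  const ms = Date.parse(value);
  return Number.isNaN(ms) ? value : new Date(ms);
}

// Possible use through the general `cast` option, which accepts a function:
//   parse(content, { columns: true, cast: (value) => castIsoDate(value) })
console.log(castIsoDate("test 2")); // left as the string "test 2"
console.log(castIsoDate("2001-02-01"));
```

Because the general `cast` callback may return any value, routing the check through it sidesteps the return-type question for `cast_date` raised above.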

wdavidw commented 2 years ago

Strange, the cast_date option was defined in sync.t.ts but not implemented anywhere.