frictionlessdata / datapackage-js

A JavaScript library for working with Data Package.
http://frictionlessdata.io/
MIT License

Bug: iso8859-1 encoded .csv resources cause a TableSchemaError #120

Open scammo opened 1 month ago

scammo commented 1 month ago

Overview

I'm trying to use open data from the state of Schleswig-Holstein. This is my first time using Frictionless, so please bear that in mind. The data package descriptor is: https://opendata.schleswig-holstein.de/data/frictionless/badegewaesser.json

Some of its resources are encoded in ISO-8859-1, for example:

"path": "https://efi2.schleswig-holstein.de/bg/opendata/v_badegewaesser_odata.csv",
"encoding": "iso8859-1",
"name": "badegewasser-stammdaten",
"profile": "tabular-data-resource",
"format": "csv"
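
In ISO-8859-1, ü and ä are the single bytes 0xFC and 0xE4, which are not valid UTF-8 on their own. A minimal sketch using only Node's built-in Buffer and TextDecoder (no datapackage-js involved) shows that decoding such bytes as UTF-8 yields the replacement character U+FFFD, the same pattern as "K�stengew�sser" in the error output, while decoding with the declared label restores the text. That datapackage-js decodes this resource as UTF-8 is an assumption inferred from the symptoms, not confirmed from its source.

```javascript
// Encode "Küstengewässer" the way an ISO-8859-1 file stores it:
// one byte per character, so ü -> 0xFC and ä -> 0xE4.
const latin1Bytes = Buffer.from('Küstengewässer', 'latin1');

// Decoding those bytes as UTF-8 (assumed to be what happens in the library)
// turns each invalid byte into U+FFFD, which prints as "�".
const asUtf8 = new TextDecoder('utf-8').decode(latin1Bytes);

// Decoding with the label from the descriptor ("iso8859-1") restores the umlauts.
// Per the WHATWG Encoding Standard this label maps to windows-1252, which
// agrees with ISO-8859-1 outside the 0x80 to 0x9F range.
const asLatin1 = new TextDecoder('iso8859-1').decode(latin1Bytes);

console.log(asUtf8);   // K�stengew�sser
console.log(asLatin1); // Küstengewässer
```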

If I try:

const resource = await datapackage.Package.load('https://opendata.schleswig-holstein.de/data/frictionless/badegewaesser.json', '', false)
await resource.getResource('badegewasser-messungen').read({ keyed: true })

I get the following error:

TableSchemaError: There are 3 type and format mismatch errors (see 'error.errors')
    at Schema.castRow (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/schema.js:176:15)
    at DestroyableTransform._transform (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/table.js:342:44)
    at Transform._read (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_transform.js:184:10)
    at Transform._write (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_transform.js:172:83)
    at doWrite (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:428:64)
    at writeOrBuffer (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:417:5)
    at Writable.write (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:334:11)
    at Parser.ondata (node:internal/streams/readable:1007:22)
    at Parser.emit (node:events:519:28)
    at addChunk (node:internal/streams/readable:559:12) {
  _errors: [
    TableSchemaError: The value "0178_1" in column "MESSSTELLENID" is not type "integer" and format "default"
        at Field.castValue (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/field.js:89:17)
        at Schema.castRow (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/schema.js:150:31)
        at DestroyableTransform._transform (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/table.js:342:44)
        at Transform._read (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_transform.js:184:10)
        at Transform._write (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_transform.js:172:83)
        at doWrite (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:428:64)
        at writeOrBuffer (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:417:5)
        at Writable.write (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:334:11)
        at Parser.ondata (node:internal/streams/readable:1007:22)
        at Parser.emit (node:events:519:28) {
      _errors: [],
      columnNumber: 3,
      rowNumber: 1,
      errors: []
    },
    TableSchemaError: The value "beh�rdliche �berwachung" does not conform to the "enum" constraint for column "UEBERWASCHUNGSARTTEXT"
        at Field.castValue (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/field.js:110:21)
        at Schema.castRow (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/schema.js:150:31)
        at DestroyableTransform._transform (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/table.js:342:44)
        at Transform._read (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_transform.js:184:10)
        at Transform._write (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_transform.js:172:83)
        at doWrite (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:428:64)
        at writeOrBuffer (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:417:5)
        at Writable.write (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:334:11)
        at Parser.ondata (node:internal/streams/readable:1007:22)
        at Parser.emit (node:events:519:28) {
      _errors: [],
      columnNumber: 5,
      rowNumber: 1,
      errors: []
    },
    TableSchemaError: The value "K�stengew�sser" does not conform to the "enum" constraint for column "GEWAESSERKATEGORIE"
        at Field.castValue (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/field.js:110:21)
        at Schema.castRow (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/schema.js:150:31)
        at DestroyableTransform._transform (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/table.js:342:44)
        at Transform._read (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_transform.js:184:10)
        at Transform._write (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_transform.js:172:83)
        at doWrite (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:428:64)
        at writeOrBuffer (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:417:5)
        at Writable.write (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:334:11)
        at Parser.ondata (node:internal/streams/readable:1007:22)
        at Parser.emit (node:events:519:28) {
      _errors: [],
      columnNumber: 6,
      rowNumber: 1,
      errors: []
    }
  ],
  rowNumber: 1,
  errors: [
    TableSchemaError: The value "0178_1" in column "MESSSTELLENID" is not type "integer" and format "default"
        at Field.castValue (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/field.js:89:17)
        at Schema.castRow (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/schema.js:150:31)
        at DestroyableTransform._transform (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/table.js:342:44)
        at Transform._read (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_transform.js:184:10)
        at Transform._write (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_transform.js:172:83)
        at doWrite (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:428:64)
        at writeOrBuffer (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:417:5)
        at Writable.write (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:334:11)
        at Parser.ondata (node:internal/streams/readable:1007:22)
        at Parser.emit (node:events:519:28) {
      _errors: [],
      columnNumber: 3,
      rowNumber: 1,
      errors: []
    },
    TableSchemaError: The value "beh�rdliche �berwachung" does not conform to the "enum" constraint for column "UEBERWASCHUNGSARTTEXT"
        at Field.castValue (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/field.js:110:21)
        at Schema.castRow (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/schema.js:150:31)
        at DestroyableTransform._transform (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/table.js:342:44)
        at Transform._read (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_transform.js:184:10)
        at Transform._write (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_transform.js:172:83)
        at doWrite (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:428:64)
        at writeOrBuffer (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:417:5)
        at Writable.write (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:334:11)
        at Parser.ondata (node:internal/streams/readable:1007:22)
        at Parser.emit (node:events:519:28) {
      _errors: [],
      columnNumber: 5,
      rowNumber: 1,
      errors: []
    },
    TableSchemaError: The value "K�stengew�sser" does not conform to the "enum" constraint for column "GEWAESSERKATEGORIE"
        at Field.castValue (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/field.js:110:21)
        at Schema.castRow (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/schema.js:150:31)
        at DestroyableTransform._transform (/home/scammo/projects/badewasser_frictionless/node_modules/tableschema/lib/table.js:342:44)
        at Transform._read (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_transform.js:184:10)
        at Transform._write (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_transform.js:172:83)
        at doWrite (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:428:64)
        at writeOrBuffer (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:417:5)
        at Writable.write (/home/scammo/projects/badewasser_frictionless/node_modules/readable-stream/lib/_stream_writable.js:334:11)
        at Parser.ondata (node:internal/streams/readable:1007:22)
        at Parser.emit (node:events:519:28) {
      _errors: [],
      columnNumber: 6,
      rowNumber: 1,
      errors: []
    }
  ]
}
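
The error message points to error.errors. A small hypothetical helper (summarizeCastErrors is not part of datapackage-js or tableschema) that flattens that array into one line per failing cell, using only the properties visible in the output above (errors, rowNumber, columnNumber, message), makes it easier to tell which failures involve mangled umlauts and which do not:

```javascript
// Hypothetical helper, not part of datapackage-js or tableschema.
// Turns the `errors` array of a TableSchemaError into readable lines,
// relying only on the properties shown in the log (rowNumber, columnNumber, message).
function summarizeCastErrors(error) {
  const cellErrors = Array.isArray(error.errors) ? error.errors : [];
  return cellErrors.map(
    (e) => `row ${e.rowNumber}, column ${e.columnNumber}: ${e.message}`
  );
}

// Usage inside an async function, with `resource` being the loaded package:
// try {
//   await resource.getResource('badegewasser-messungen').read({ keyed: true });
// } catch (error) {
//   summarizeCastErrors(error).forEach((line) => console.log(line));
// }
```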

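Two of the three errors (columns UEBERWASCHUNGSARTTEXT and GEWAESSERKATEGORIE) show the umlauts replaced by "�"; the MESSSTELLENID error ("0178_1" is not an integer) looks like a separate schema/data mismatch.

It seems to me that the declared encoding of the CSV is not applied when it is decoded. With .rawRead() I also see the broken characters, e.g. for the umlaut ü. If I decode the bytes returned by .rawRead() myself with the correct encoding, the CSV text comes out correctly, e.g.: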

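// rawRead() returns the raw, undecoded bytes of the resource
const stammdatenBuffer = await resource.getResource('badegewasser-infrastruktur').rawRead()
// decode them with the encoding declared in the descriptor
const decoder = new TextDecoder('iso8859-1');
const stammdaten = decoder.decode(stammdatenBuffer);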
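
Building on that workaround, a hypothetical helper (readDecodedText is not a datapackage-js API) could take the label from the resource's own descriptor instead of hard-coding it. This sketch assumes the resource object exposes a descriptor and an async rawRead() that resolves to bytes, as the datapackage-js Resource used above appears to; it falls back to UTF-8, the Data Package default when no encoding is declared. It only decodes the text; the resulting string would still need a separate CSV parser.

```javascript
// Hypothetical helper, not part of datapackage-js.
// Expects an object with a `descriptor` (plain object) and an async `rawRead()`
// that resolves to a Buffer/Uint8Array, e.g. a datapackage-js Resource.
async function readDecodedText(resource) {
  const descriptor = resource.descriptor || {};
  // Data Package resources default to UTF-8 when no encoding is declared.
  const label = descriptor.encoding || 'utf-8';
  const bytes = await resource.rawRead();
  // TextDecoder throws a RangeError if the label is not a known encoding.
  return new TextDecoder(label).decode(bytes);
}

// Usage inside an async function, with `resource` being the loaded package:
// const text = await readDecodedText(resource.getResource('badegewasser-infrastruktur'));
```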

I tested this with Node v21.6.2 on Ubuntu. Other Frictionless packages seem to treat this descriptor as valid.

Thanks for your work!


Please preserve this line to notify @aivuk (lead of this repository)