frictionlessdata / datapackage

Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
https://datapackage.org
The Unlicense

Separate concerns by adding a `dialect.type` field #921

Open khusmann opened 2 months ago

khusmann commented 2 months ago

Our current approach in Table Dialect is to mix in new properties for each new format. This means properties like delimiter can be set along with sheetName, which doesn't make sense. It's going to get more unwieldy, and potentially contradictory, the more features of different formats we support.

To separate concerns and make validation easier and better defined, I suggest we add a dialect.type field that distinguishes delimited, sql, workbook, and structured dialects. Then, like field types, this would form a discriminated union, with the type dictating which other properties can be present in the dialect:

When "type" = "delimited", we could set delimiter, lineTerminator, etc.

When "type" = "sql", we could set table, etc.

When "type" = "workbook", we could set sheetNumber, sheetName, etc.

New formats could be added via new dialect types, in the same way new fields are added via new field types.
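
To make the idea concrete, here is a minimal sketch of type-driven validation in plain Python. The `ALLOWED_PROPERTIES` table and the `validate_dialect` helper are hypothetical names used only for illustration. The property sets contain only the names mentioned above (the "etc." is not filled in, and structured is left out because no properties were listed for it). It assumes "type" is present; how to handle a missing type is a separate question.

```python
# Hypothetical per-type property sets. Only the property names mentioned in
# this issue are listed; the real sets would be defined by the spec.
ALLOWED_PROPERTIES = {
    "delimited": {"delimiter", "lineTerminator"},
    "sql": {"table"},
    "workbook": {"sheetNumber", "sheetName"},
}


def validate_dialect(dialect: dict) -> list:
    """Return a list of error messages; an empty list means no errors found."""
    dialect_type = dialect.get("type")
    if dialect_type not in ALLOWED_PROPERTIES:
        return [f"unknown dialect type: {dialect_type!r}"]

    allowed = ALLOWED_PROPERTIES[dialect_type]
    # Any key other than "type" must belong to the selected dialect type.
    extra = sorted(k for k in dialect if k != "type" and k not in allowed)
    return [
        f"property {k!r} is not valid for type {dialect_type!r}" for k in extra
    ]
```

In this sketch, supporting a new format amounts to adding one more entry to the table, and a descriptor such as `{"type": "delimited", "sheetName": "Sheet1"}` is reported as an error instead of being silently accepted.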

This would really help with declarative parsing systems (like pydantic, zod, etc.) by making illegal states unrepresentable.
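
As a rough illustration of what "illegal states unrepresentable" could look like, here is a sketch using only standard-library dataclasses, so it doesn't depend on any particular pydantic or zod API. The class names and `parse_dialect` are hypothetical, the fields are limited to the properties named above, and all of them are treated as optional purely as an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class DelimitedDialect:
    delimiter: Optional[str] = None
    lineTerminator: Optional[str] = None


@dataclass(frozen=True)
class SqlDialect:
    table: Optional[str] = None


@dataclass(frozen=True)
class WorkbookDialect:
    sheetNumber: Optional[int] = None
    sheetName: Optional[str] = None


# The discriminated union: a dialect is exactly one of these shapes.
Dialect = Union[DelimitedDialect, SqlDialect, WorkbookDialect]

# Maps the "type" discriminator to the class that models that dialect.
_DIALECT_CLASSES = {
    "delimited": DelimitedDialect,
    "sql": SqlDialect,
    "workbook": WorkbookDialect,
}


def parse_dialect(raw: dict) -> Dialect:
    """Build the dialect object selected by raw["type"], or raise ValueError."""
    props = dict(raw)  # copy so the caller's dict is not mutated
    dialect_type = props.pop("type", None)
    cls = _DIALECT_CLASSES.get(dialect_type)
    if cls is None:
        raise ValueError(f"unknown dialect type: {dialect_type!r}")
    try:
        # Dataclass constructors reject keyword arguments that are not fields.
        return cls(**props)
    except TypeError as exc:
        raise ValueError(
            f"invalid properties for type {dialect_type!r}: {exc}"
        ) from None
```

With this shape, a `WorkbookDialect` simply has no delimiter attribute, so code that receives a parsed dialect cannot encounter a delimiter and a sheetName together. Note that these dataclasses check property names only, not value types.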

It would also be fully backwards compatible if "type" = "delimited" were the default whenever type is unset.
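
A small sketch of how that default could be applied before validation, assuming a hypothetical `with_default_type` helper:

```python
def with_default_type(dialect: dict) -> dict:
    """Return a copy of the dialect with "type" set to "delimited" if unset."""
    if "type" in dialect:
        return dict(dialect)
    # Existing descriptors have no "type", so they are treated as delimited.
    return {"type": "delimited", **dialect}
```

An existing descriptor such as `{"delimiter": ";"}` would then be read as `{"type": "delimited", "delimiter": ";"}`, while descriptors that already declare a type pass through unchanged.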