Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
Our current approach in Table Dialect is to mix in new properties for new formats. This means properties like delimiter can be set along with sheetName, which doesn't make sense. It's going to get more unwieldy & potentially contradictory the more features of different formats we support.
To separate concerns & make validation easier / better defined, I would suggest we add a dialect.type field that lets us distinguish delimited, sql, workbook, and structured dialects. Then, as with field types, this would form a discriminated union and dictate which other properties may be present in the dialect:
When "type" = "delimited", we could set delimiter, lineTerminator, etc.
When "type" = "sql", we could set table, etc.
When "type" = "workbook", we could set sheetNumber, sheetName, etc.
New formats could be added via new dialect types, in the same way new fields are added via new field types.
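To make the idea concrete, here is a minimal sketch of such a discriminated union using only the Python standard library. The class names, the `parse_dialect` helper, and the choice to leave every property optional are assumptions made for illustration; they are not part of the spec. The structured type is omitted because no properties are listed for it above, and the handling of an unset type is left out of this sketch.

```python
from dataclasses import dataclass, fields
from typing import Literal, Optional, Union


# Hypothetical per-type dialect classes; property names follow the proposal.
@dataclass(frozen=True)
class DelimitedDialect:
    type: Literal["delimited"] = "delimited"
    delimiter: Optional[str] = None
    lineTerminator: Optional[str] = None


@dataclass(frozen=True)
class SqlDialect:
    type: Literal["sql"] = "sql"
    table: Optional[str] = None


@dataclass(frozen=True)
class WorkbookDialect:
    type: Literal["workbook"] = "workbook"
    sheetNumber: Optional[int] = None
    sheetName: Optional[str] = None


Dialect = Union[DelimitedDialect, SqlDialect, WorkbookDialect]

# The "type" value is the discriminator that selects the class.
_DIALECTS = {
    "delimited": DelimitedDialect,
    "sql": SqlDialect,
    "workbook": WorkbookDialect,
}


def parse_dialect(descriptor: dict) -> Dialect:
    """Build the dialect selected by descriptor["type"], rejecting
    properties that belong to a different dialect type."""
    kind = descriptor["type"]
    if kind not in _DIALECTS:
        raise ValueError(f"unknown dialect type: {kind!r}")
    cls = _DIALECTS[kind]
    allowed = {f.name for f in fields(cls)}
    unexpected = sorted(set(descriptor) - allowed)
    if unexpected:
        raise ValueError(
            f"properties not allowed for type {kind!r}: {unexpected}"
        )
    return cls(**descriptor)
```

With this shape, `parse_dialect({"type": "workbook", "sheetName": "Sheet1"})` yields a `WorkbookDialect`, while a descriptor that combines `"type": "delimited"` with `sheetName` raises a `ValueError`. Adding a new format would mean adding one class and one entry in `_DIALECTS`.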
This would really help with declarative parsing systems (like Pydantic, Zod, etc.) by making illegal states unrepresentable.
It would also be 100% backwards compatible if we made "type" = "delimited" the default whenever type is unset.
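As a rough illustration of the defaulting rule, a reader could normalize a descriptor before validating it. The helper name `with_default_type` is hypothetical, and this is only a sketch of one way an implementation might apply the default.

```python
def with_default_type(descriptor: dict) -> dict:
    """Return a copy of a dialect descriptor with "type" filled in.

    An explicit "type" in the descriptor wins; otherwise the proposed
    default "delimited" is used. The input dict is not modified.
    """
    return {"type": "delimited", **descriptor}


# A descriptor written before the "type" field existed.
legacy = {"delimiter": ";"}
normalized = with_default_type(legacy)
```

Here `normalized` carries `"type": "delimited"` alongside the original delimiter, and a descriptor that already declares `"type": "workbook"` passes through unchanged. The sketch only fills in the type; it does not decide what should happen to existing descriptors that set, say, sheetName without a type.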