frictionlessdata / datapackage

Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
https://datapackage.org
The Unlicense

Reconsider data types #867

Open nichtich opened 6 months ago

nichtich commented 6 months ago

The Table Schema tag lists several issues related to data types. These might better be discussed together. Subgroups may be:

dates and times

numeric types

units

compound types

peterdesmet commented 6 months ago

I agree it would be easier if these were discussed as subgroups. What would be the best forum for this? A call, discussions, grouped issues, etc.?

khusmann commented 5 months ago

Can we also include categorical / ordinal types as their own group? Right now they're implemented as constraints / extensions on other types, but I think they deserve first-class discussion. (relevant issue: #844)

roll commented 4 months ago

In my opinion, the most pressing issue for v2 is number type ambiguity. Take a simple example like this:

schema.json

fields:
  - name: value
    type: number

data.csv

value
1.1
2.2
3.3

We get into an undefined state when we need to export the data to, e.g., SQL, because float and numeric (decimal) types are completely different, both physically and logically. Table Schema provides no information to tell them apart, much like JSON (and JSON really suffers from this when it comes to, e.g., monetary data).
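
To make the ambiguity concrete, here is a minimal Python sketch that reads the same CSV column once as binary floats and once as decimals. The helper name `read_column` and the inline `CSV_TEXT` are made up for this illustration and are not part of any Data Package library. The two interpretations disagree on basic arithmetic over the example rows, which is why an exporter cannot safely guess between a float and a numeric column.

```python
import csv
import io
from decimal import Decimal

# The data.csv content from the example above, inlined for illustration.
CSV_TEXT = "value\n1.1\n2.2\n3.3\n"


def read_column(text, cast):
    """Hypothetical helper: parse the 'value' column with the given cast."""
    reader = csv.DictReader(io.StringIO(text))
    return [cast(row["value"]) for row in reader]


# Interpretation 1: "number" means a binary floating-point value.
floats = read_column(CSV_TEXT, float)

# Interpretation 2: "number" means an exact decimal value.
decimals = read_column(CSV_TEXT, Decimal)

# Does row 1 + row 2 equal row 3? The answer depends on the interpretation.
float_match = floats[0] + floats[1] == floats[2]
decimal_match = decimals[0] + decimals[1] == decimals[2]

print("float:  ", float_match)
print("decimal:", decimal_match)
```

With floats the comparison fails because 1.1 + 2.2 is not exactly representable in binary, while the decimal reading preserves the values as written in the file.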

nichtich commented 4 months ago

See also this comparison of SQL data types.