frictionlessdata / datapackage

Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
https://datapackage.org
The Unlicense

Units and scales (and currency) in Table Schema #216

Open rufuspollock opened 8 years ago

rufuspollock commented 8 years ago

STATUS:


Excellent discussion with @dr-shorthair today led me to consider the importance of units and scales (and currency) in JSON Table Schema.

Suggest we could specify at MAY level:

unit: simple-string-descriptor e.g. m/s
unitSemantic: pointer-to-a-url-describing that unit - could be RDF uri
currency:      # could be part of units but think probably better separate
factor: a scaling factor (e.g. 1000 would mean to scale by 1000)
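
As a rough illustration of the properties proposed above, the following Python sketch builds a hypothetical field descriptor and applies `factor` to raw cell values. The field name `speed`, the placeholder `unitSemantic` URL and the helper `apply_factor` are assumptions made for this example; none of them come from a published spec, and reading `factor` as a multiplier is only one possible interpretation.

```python
# Hypothetical field descriptor using the proposed (not yet specified) properties.
field = {
    "name": "speed",
    "type": "number",
    "unit": "m/s",
    # Placeholder URL for illustration only; a real descriptor would point
    # at an actual unit definition (for example an RDF URI).
    "unitSemantic": "https://example.org/units/metre-per-second",
    "factor": 1000,
}


def apply_factor(field, raw_values):
    """Multiply raw cell values by the field's optional `factor` (default 1)."""
    factor = field.get("factor", 1)
    return [value * factor for value in raw_values]


scaled = apply_factor(field, [1.5, 2, 0.25])
print(scaled, field["unit"])
```

Because every property is optional (MAY level), the helper falls back to a factor of 1 when the key is absent.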

References

scls19fr commented 8 years ago

Pint is a great Python library for units. http://pint.readthedocs.org/

danfowler commented 8 years ago

Thinking about this through the lens of a Fiscal Data Package profile, a mapping object has been used to give semantic meaning to raw numbers in a budget dataset. As an example, currency is a type currently applied on a field by mapping a source JTS column onto a new field in a mapping object. I'm wondering: should there be a general principle for applying semantic meaning to columns in a CSV, or should we consider the FDP a special case?

Related:

pwalsh commented 8 years ago

@danfowler JTS already supports currency as a format on number:

rufuspollock commented 8 years ago

@pwalsh i know - though I'm wondering if that was a good idea vs proper units. Note also we did not support "factor" ;-)

rufuspollock commented 8 years ago

OK, I think we should introduce units and factor. Re units the question I would have is to understand any difference between QUDT and dataprotocols units spec.

@danfowler could you take a quick look at QUDT and the units spec and see if you can identify any differences.

danfowler commented 8 years ago

@rgrp I can take a look.

dr-shorthair commented 8 years ago

I would suggest handling currency separate from units of measure, but in the same overall framework along with controlled vocabularies and coordinate reference systems. These are all 'reference systems'.

The special thing about currency is that conversion factors are time-dependent, and the changes are large. This does not apply to typical uom.

There is also some time-dependency in both spatial and temporal coordinate systems due to (a) moving spatial datums due to plate tectonics - yes, this does matter in applications like precision agriculture; (b) leap seconds, though in both cases most users would not notice.

pwalsh commented 8 years ago

@rgrp @danfowler any progress here?

@dr-shorthair great points. I'm wondering, though, if the conversion aspects you highlight are relevant for the spec itself (rather than relevant for potential applications of the spec).

patcon commented 8 years ago

Great discussion! Just wanted to chime in that I think this would be helpful for CSV columns as well :)

pwalsh commented 7 years ago

@rgrp do you want to move forward on this?

rgieseke commented 7 years ago

Would that look something like the following?

"schema": {
  "fields": [
    {
      "name": "Year",
      "description": "Year",
      "type": "date"
    },
    {
      "name": "Total",
      "description": "Total carbon emissions from fossil fuel consumption and cement production (million metric tons of C)",
      "type": "number",
      "unit": "Mt",
      "unitSystem": "SI"
    }
  ]

[…]

rufuspollock commented 7 years ago

@rgieseke yes - that is correct. Your unitSystem is an addition by you, I assume? And is unit a reference to the dataprotocols units spec or a different one?

rufuspollock commented 7 years ago

@pwalsh next steps here would be:

rgieseke commented 7 years ago

@rgrp Yes, sorry I mis-remembered units and unitsSemantic. Why would it be units as plural though?

rufuspollock commented 7 years ago

@rgieseke units was a typo which I have corrected - should be unit.

rnuske commented 7 years ago

We are planning to use Table Schema for describing the inner structure of our resources, but we definitely need to store the unit of measurement. We would therefore very much welcome it if the Table Schema spec supported this, so that we wouldn't have to work with custom add-ons.

danfowler commented 7 years ago

@muehlenpfordt et al at Open Power System Data seem to have produced Data Packages with a unit: attribute at the field level with a string value (e.g. "MW"). I'd be curious to learn what use case that supports in that project.

https://github.com/Open-Power-System-Data/renewable_power_plants/blob/master/validation_and_output.ipynb

rgieseke commented 7 years ago

I also went with unit for each table column

https://github.com/openclimatedata/global-carbon-budget/blob/master/datapackage.json#L59

I think the main use case is to easily read in a data set and apply a unit transformation, e.g. for comparison with another dataset.
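
A minimal sketch of that use case, assuming a custom `unit` property on the field and a small hand-written table of SI mass prefixes. The sample rows are made-up illustrative values, and `TO_TONNES` and `convert_column` are hypothetical names; a real tool would more likely delegate to a units library.

```python
import csv
import io

# Hypothetical descriptor fragment; "unit" is a custom field property here.
schema = {
    "fields": [
        {"name": "Year", "type": "year"},
        {"name": "Total", "type": "number", "unit": "Mt"},
    ]
}

# Illustrative sample rows, not real data.
csv_text = "Year,Total\n2000,100\n2001,250\n"

# Multipliers to a common base unit (tonnes), SI prefixes only.
TO_TONNES = {"t": 1.0, "kt": 1e3, "Mt": 1e6, "Gt": 1e9}


def convert_column(rows, field, target_unit):
    """Return the field's column converted from its declared unit to target_unit."""
    scale = TO_TONNES[field["unit"]] / TO_TONNES[target_unit]
    return [float(row[field["name"]]) * scale for row in rows]


rows = list(csv.DictReader(io.StringIO(csv_text)))
total_field = next(f for f in schema["fields"] if f["name"] == "Total")
totals_gt = convert_column(rows, total_field, "Gt")
print(totals_gt)
```

Once both datasets are expressed in the same unit, their columns can be compared directly.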

jgmill commented 7 years ago

For us, the use case is to clarify the unit of measurement, i.e. whether the numbers in the columns should be read as Megawatts MW, Kilowatts kW, Megawatthours MWh, etc. For the case of currencies, we will put EURO or DKK etc.

rufuspollock commented 7 years ago

@muehlenpfordt how do you indicate a currency unit? Do you have a specific prefix or ...?

jgmill commented 7 years ago

I haven't implemented it yet, but I had thought putting "unit": "EUR" (using ISO 4217 currency code) would make it sufficiently clear that the column contains currency data. Now I saw there was the suggestion to have an additional attribute currency. Would that mean I put

"unit": "currency",
"currency": "EUR"

?

That seems a bit redundant. On the other hand it might help considering the amount of different currencies.

What do you think? I am open to your suggestions.

rgieseke commented 7 years ago

How about something like

"unit": "EUR",
"unitSystem": "ISO-4217"
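
To show how a consuming tool might act on this pair of properties, here is a small Python sketch. It assumes the `unit` and `unitSystem` keys exactly as written above; `is_currency_field` is a hypothetical helper, and it only checks that the code has the three-uppercase-letter shape of an ISO 4217 alphabetic code, not that the code is actually assigned.

```python
import re


def is_currency_field(field):
    """Return True when the field declares an ISO 4217 style currency unit."""
    return (
        field.get("unitSystem") == "ISO-4217"
        and re.fullmatch(r"[A-Z]{3}", field.get("unit", "")) is not None
    )


price = {"name": "price", "type": "number", "unit": "EUR", "unitSystem": "ISO-4217"}
total = {"name": "Total", "type": "number", "unit": "Mt", "unitSystem": "SI"}

print(is_currency_field(price), is_currency_field(total))
```

With this approach no separate `currency` property is needed: the unit system tells the tool how to interpret the unit string.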

rufuspollock commented 7 years ago

@rgieseke really like that approach.

I think this is mature enough to become a pattern.

I also think we may want to move the units draft spec http://specs.okfnlabs.org/units/ back to FD specs /cc @danfowler @pwalsh - now #537

simleo commented 7 years ago

We have a use case for this in biotracks, see https://github.com/CellMigStandOrg/biotracks/issues/9

Kenji-K commented 6 years ago

I have a question. Will there be any specified way of converting measurements from one unit to another? Say celsius to kelvin or fahrenheit. Or is this outside the scope of the spec?

rufuspollock commented 6 years ago

@Kenji-K this would be outside of the spec - it would be something a tool would implement (but the spec could form the basis for that tool's API)
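
For illustration, a tool built on top of a `unit` property could handle the temperature case roughly as follows. This is only a sketch: the unit labels `kelvin`, `celsius` and `fahrenheit` and the function `convert` are assumptions for this example, not names defined by any spec.

```python
# (scale, offset) pairs such that: kelvin = value * scale + offset
TO_KELVIN = {
    "kelvin": (1.0, 0.0),
    "celsius": (1.0, 273.15),
    "fahrenheit": (5.0 / 9.0, 459.67 * 5.0 / 9.0),
}


def convert(value, source_unit, target_unit):
    """Convert a temperature by going through kelvin as the common base."""
    scale, offset = TO_KELVIN[source_unit]
    kelvin = value * scale + offset
    target_scale, target_offset = TO_KELVIN[target_unit]
    return (kelvin - target_offset) / target_scale


print(convert(25.0, "celsius", "kelvin"))
print(convert(25.0, "celsius", "fahrenheit"))
```

Temperature is a useful test case because the conversion needs an offset as well as a scale, so a single multiplicative `factor` would not be enough to express it.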

rufuspollock commented 6 years ago

I also think we may want to move the units draft spec http://specs.okfnlabs.org/units/ back to FD specs /cc @danfowler @pwalsh - now #537

@pwalsh @roll

yohanboniface commented 1 year ago

Hey, is there still interest in this feature? We (for the French administration) would use it (basically, tools consuming Table Schema would infer some behavior according to the unit, when defined). Is there any way we could help it land in the spec?

peterdesmet commented 1 year ago

Yes, also interested, to use it for Camtrap DP. Although one can of course expand the Frictionless Table Schema as they want (e.g. adding a unit property for each field) I’d also rather have this as part of the core Table Schema itself.

rufuspollock commented 1 year ago

@yohanboniface yes a lot of interest. First start would be a detailed pattern. Note @Stephen-Gates had a go at that in https://github.com/frictionlessdata/specs/pull/607 - we are really open to getting a pattern and then turning that into part of the spec.

DunklesArchipel commented 4 months ago

Working with scientific data, we are very interested in having units implemented in the schema. We started implementing a frictionless schema to describe tabular data containing measured quantities and added a unit key to the fields. In principle, the field would be a QuantitiyField. Other properties of the field would be the dimension. A suggested schema can be found here.

For the string notation of the units, we use that from astropy. This allows simple conversion of tabular quantity data into other units. I hope these aspects provide some useful information to improve the specs or even for the validation of scientific data in general.