Open rufuspollock opened 8 years ago
Pint is a great Python library for units. http://pint.readthedocs.org/
Thinking about this through the lens of a Fiscal Data Package profile, a mapping
object has been used to give semantic meaning to raw numbers in a budget dataset. As an example, currency
is a type currently applied on a field by mapping a source JTS column onto a new field in a mapping
object. I'm wondering: should this be a general principle for applying semantic meaning to columns in a CSV, or should we consider the FDP a special case?
Related:
@danfowler JTS already supports currency as a format on number:
@pwalsh i know - though I'm wondering if that was a good idea vs proper units. Note also we did not support "factor" ;-)
OK, I think we should introduce `units` and `factor`. Re `units`, the question I would have is to understand any difference between QUDT and the dataprotocols units spec.
@danfowler could you take a quick look at QUDT and the units spec and see if you can identify any differences.
@rgrp I can take a look.
I would suggest handling currency separate from units of measure, but in the same overall framework along with controlled vocabularies and coordinate reference systems. These are all 'reference systems'.
The special thing about currency is that conversion factors are time-dependent, and the changes are large. This does not apply to typical uom.
There is also some time-dependency in both spatial and temporal coordinate systems due to (a) spatial datums moving due to plate tectonics (yes, this does matter in applications like precision agriculture) and (b) leap seconds, though in both cases most users would not notice.
@rgrp @danfowler any progress here?
@dr-shorthair great points. I'm wondering, though, if the conversion aspects you highlight are relevant for the spec itself (rather than relevant for potential applications of the spec).
Great discussion! Just wanted to chime in that I think this would be helpful for CSV columns as well :)
@rgrp do you want to move forward on this?
Would that look something like the following?
"schema": {
  "fields": [
    {
      "name": "Year",
      "description": "Year",
      "type": "date"
    },
    {
      "name": "Total",
      "description": "Total carbon emissions from fossil fuel consumption and cement production (million metric tons of C)",
      "type": "number",
      "unit": "Mt",
      "unitSystem": "SI"
    }
  ]
[…]
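As a rough illustration of how a consuming tool might read such a descriptor, here is a minimal Python sketch. It assumes the proposed `unit` and `unitSystem` properties exactly as written in the example above; neither is part of the published Table Schema spec, and the helper name `units_by_field` is hypothetical.

```python
import json

# Descriptor mirroring the proposal above. `unit` and `unitSystem` are
# proposed properties, not part of the published Table Schema spec.
DESCRIPTOR = json.loads("""
{
  "schema": {
    "fields": [
      {"name": "Year", "type": "date"},
      {"name": "Total", "type": "number", "unit": "Mt", "unitSystem": "SI"}
    ]
  }
}
""")


def units_by_field(descriptor):
    """Return {field name: (unit, unitSystem)} for fields that declare a unit."""
    result = {}
    for field in descriptor["schema"]["fields"]:
        if "unit" in field:
            # unitSystem is optional in this sketch, so fall back to None.
            result[field["name"]] = (field["unit"], field.get("unitSystem"))
    return result


print(units_by_field(DESCRIPTOR))
```

Fields without a `unit` are simply skipped, so existing descriptors would keep working unchanged.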
@rgieseke yes - that is correct. Your `unitSystem` is an addition by you I assume? And is `unit` a reference to the dataprotocols units spec or a different one?
@pwalsh next steps here would be:
`factor` and `unit`
@rgrp Yes, sorry I mis-remembered `units` and `unitsSemantic`. Why would it be `units` as plural though?
@rgieseke `units` was a typo which I have corrected - should be `unit`.
We are planning to use Table Schema for describing the inner structure of our resources, but we definitely need to store the unit of measurement. We would therefore very much welcome it if the Table Schema spec supported this, so that we would not have to work with custom add-ons.
@muehlenpfordt et al at Open Power System Data seem to have produced Data Packages with a `unit` attribute at the field level with a string value (e.g. "MW"). I'd be curious to learn what use case that supports in that project.
I also went with `unit` for each table column: https://github.com/openclimatedata/global-carbon-budget/blob/master/datapackage.json#L59
I think the main use case is to easily read in a data set and apply a unit transformation, e.g. for comparison with another dataset.
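To make that use case concrete, here is a minimal Python sketch of a unit transformation driven by a field's `unit` value. The conversion table, the helper `convert_column`, and the column values are all assumptions made for illustration; a real tool would more likely delegate to a units library such as Pint, mentioned at the top of this thread.

```python
# Tonnes per unit for a few mass units. This hand-written table is an
# assumption for the sketch; a units library would normally supply it.
TONNES_PER_UNIT = {"t": 1.0, "kt": 1e3, "Mt": 1e6, "Gt": 1e9}


def convert_column(values, from_unit, to_unit):
    """Convert a column of numbers from one mass unit to another."""
    factor = TONNES_PER_UNIT[from_unit] / TONNES_PER_UNIT[to_unit]
    return [value * factor for value in values]


# Field descriptor using the proposed (non-standard) `unit` property.
field = {"name": "Total", "type": "number", "unit": "Mt"}
column = [1500.0, 2000.0]  # placeholder values, not real emissions data

# Express the column in Gt so it can be compared with a dataset that uses Gt.
print(convert_column(column, field["unit"], "Gt"))
```

The point is only that a machine-readable `unit` lets the source unit come from the descriptor rather than from a human reading the column description.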
For us, the use case is to clarify the unit of measurement, i.e. whether the numbers in the columns should be read as Megawatts `MW`, Kilowatts `kW`, Megawatthours `MWh`, etc. For the case of currencies, we will put `EURO` or `DKK` etc.
@muehlenpfordt how do you indicate a currency unit? Do you have a specific prefix or ...?
I haven't implemented it yet, but I had thought putting "unit": "EUR" (using the ISO 4217 currency code) would make it sufficiently clear that the column contains currency data. Now I saw there was the suggestion to have an additional attribute `currency`. Would that mean I put
"unit": "currency",
"currency": "EUR"
?
That seems a bit redundant. On the other hand it might help considering the amount of different currencies.
What do you think? I am open to your suggestions.
How about something like
"unit": "EUR",
"unitSystem": "ISO-4217"
@rgieseke really like that approach.
I think this is mature enough to become a pattern.
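One way a tool could use the `unit` plus `unitSystem` pairing suggested above is as a light consistency check. The following Python sketch is only an assumption about how that might look: the function name `check_unit` is hypothetical, and the ISO 4217 check tests only the three-uppercase-letter shape of a code, not membership in the official list.

```python
import re

# ISO 4217 alphabetic codes are three uppercase letters. This checks only
# the shape of the code, not whether it is an officially assigned currency.
ISO_4217_SHAPE = re.compile(r"[A-Z]{3}")


def check_unit(field):
    """Return a list of problems with a field's proposed unit properties."""
    problems = []
    unit = field.get("unit")
    system = field.get("unitSystem")
    if system is not None and unit is None:
        problems.append("unitSystem given without unit")
    if (
        system == "ISO-4217"
        and unit is not None
        and not ISO_4217_SHAPE.fullmatch(unit)
    ):
        problems.append(f"{unit!r} does not look like an ISO 4217 code")
    return problems


print(check_unit({"name": "cost", "unit": "EUR", "unitSystem": "ISO-4217"}))
print(check_unit({"name": "cost", "unit": "EURO", "unitSystem": "ISO-4217"}))
```

Under this sketch "EUR" passes while "EURO" is flagged, which is one argument for naming the code list explicitly rather than relying on the unit string alone.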
I also think we may want to move the units draft spec http://specs.okfnlabs.org/units/ back to FD specs /cc @danfowler @pwalsh - now #537
We have a use case for this in biotracks, see https://github.com/CellMigStandOrg/biotracks/issues/9
I have a question. Will there be any specified way of converting measurements from one unit to another? Say celsius to kelvin or fahrenheit. Or is this outside the scope of the spec?
@Kenji-K this would be outside of the spec - it would be something a tool would implement (but the spec could form the basis for that tool's API)
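To illustrate what such a tool might do with the metadata, here is a small Python sketch that converts temperatures by routing through kelvin. The unit strings `K`, `degC`, and `degF` and the function `convert_temperature` are assumptions for this example; the thread does not fix a notation for temperature units.

```python
# Conversions go through kelvin so each unit needs only two functions.
# The unit strings are assumptions; the spec discussion does not define them.
TO_KELVIN = {
    "K": lambda v: v,
    "degC": lambda v: v + 273.15,
    "degF": lambda v: (v - 32.0) * 5.0 / 9.0 + 273.15,
}
FROM_KELVIN = {
    "K": lambda k: k,
    "degC": lambda k: k - 273.15,
    "degF": lambda k: (k - 273.15) * 9.0 / 5.0 + 32.0,
}


def convert_temperature(value, from_unit, to_unit):
    """Convert a single temperature value between the units listed above."""
    return FROM_KELVIN[to_unit](TO_KELVIN[from_unit](value))


# A field descriptor carrying the proposed `unit` property tells the tool
# which source unit to assume for every value in the column.
field = {"name": "temperature", "type": "number", "unit": "degC"}
print(convert_temperature(25.0, field["unit"], "K"))
print(convert_temperature(25.0, field["unit"], "degF"))
```

Temperature is a useful test case because the conversion is an offset plus a scale, so a single multiplicative `factor` would not be enough for it.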
I also think we may want to move the units draft spec http://specs.okfnlabs.org/units/ back to FD specs /cc @danfowler @pwalsh - now #537
@pwalsh @roll
Hey, is there still interest in this feature? We (for the French administration) would use it (basically, tools consuming Table Schema would infer some behavior according to the unit, when defined). Is there any way we could help it land in the spec?
Yes, also interested, to use it for Camtrap DP. Although one can of course expand the Frictionless Table Schema as they want (e.g. adding a `unit` property for each field), I’d also rather have this as part of the core Table Schema itself.
@yohanboniface yes, a lot of interest. A first step would be a detailed pattern. Note @Stephen-Gates had a go at that in https://github.com/frictionlessdata/specs/pull/607 - we are really open to getting a pattern and then turning that into part of the spec.
Working with scientific data, we are very interested in having units implemented in the schema.
We started implementing a frictionless schema to describe tabular data containing measured quantities and added a unit key to the fields. In principle, the field would be a `QuantitiyField`. Another property of the field would be the `dimension`. A suggested schema can be found here.
For the string notation of the units, we use that from astropy. This allows simple conversion of tabular quantity data into other units. I hope these aspects provide some useful information to improve the specs or even for the validation of scientific data in general.
STATUS:
Excellent discussion with @dr-shorthair today led me to consider importance of units and scales (and currency) in JSON Table Schema.
Suggest we could specify at `MAY` level:
References