datacommonsorg / data

Apache License 2.0

[un energy] scaling issues with energy generation data #524

Open beets opened 3 years ago

beets commented 3 years ago

It looks more obvious in this chart:

image

It looks like the values for the other countries are not scaled to GWh.

https://autopush.datacommons.org/tools/timeline#&place=country/USA,country/IND,country/FRA,country/RUS,country/CHN,country/BRA&statsVar=Annual_Generation_Electricity

shifucun commented 3 years ago

Would it be nice to have a "preferred" unit for energy-related imports and check it in the import tool? @pradh

pradh commented 3 years ago

I agree it is nicer to have the same unit for a given StatVar.

But we probably shouldn't plot different units in a single bar/line chart? Also, in this case, are we able to prefer UNEnergy as the source, given that the US has both: https://screenshot.googleplex.com/Aj6PeKqb4ovWHv3 ?

beets commented 3 years ago

Ah, I missed that there are two datasources plotted together! Sorry @ajaits.

We should be able to do this conversion easily if we understand these units. But we can prefer UNEnergy for now.

pradh commented 3 years ago

Bo, to generalize your suggestion to have some check in the import tool: instead of picking a specific unit, we can perhaps look up the "StatVarSummary" cache for a stat-var and warn if a new unit (not part of the existing series) gets added for a pre-existing stat-var.

WDYT?
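
A minimal sketch of that check, assuming the cached summary can be reduced to a map from stat-var to the set of units already in use. The function name, the data layout, and the unit names below are illustrative assumptions, not the real StatVarSummary format or import tool API:

```python
def find_new_unit_warnings(existing_units, incoming_units):
    """Return one warning per unit that is new for a pre-existing stat-var.

    existing_units: dict of stat-var -> set of units already in the KG
                    (hypothetical stand-in for the StatVarSummary cache).
    incoming_units: dict of stat-var -> set of units in the new import.
    """
    warnings = []
    for stat_var in sorted(incoming_units):
        known = existing_units.get(stat_var)
        if not known:
            # Brand-new stat-var: there is no existing series to compare with.
            continue
        for unit in sorted(set(incoming_units[stat_var]) - set(known)):
            warnings.append(
                f"{stat_var}: new unit {unit!r}, existing units {sorted(known)}"
            )
    return warnings


# Illustrative unit names only.
existing = {"Annual_Generation_Electricity": {"GigawattHour"}}
incoming = {
    "Annual_Generation_Electricity": {"GigawattHour", "KilowattHour"},
    "Some_New_StatVar": {"Kilogram"},
}
for warning in find_new_unit_warnings(existing, incoming):
    print("WARNING:", warning)
```

This only warns; it does not pick or enforce a unit, so the importing user still decides how to unify.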

shifucun commented 3 years ago

From an API client's point of view (in this case the website is a client), it would be nice if we could convert data to a unified common unit where possible.

If there are multiple source series with different units, then mixer would ideally pick the most used unit, and some places would be dropped. Or the website would need the logic to do the number conversion. In either case, I think conversion in the source is better, as users have to do the conversion anyway.
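
As a rough sketch of the "pick the most used unit" option, assuming each place has one series tagged with a unit. The function name and data layout are hypothetical, not mixer's actual implementation:

```python
from collections import Counter


def pick_most_used_unit(series_by_place):
    """Keep only the series that use the most common unit.

    series_by_place: dict of place -> {"unit": str, "values": dict}
                     (hypothetical layout).
    Returns (unit, kept_series, dropped_places).
    """
    if not series_by_place:
        return None, {}, []
    counts = Counter(s["unit"] for s in series_by_place.values())
    # On a tie, most_common() returns the unit that was seen first.
    unit = counts.most_common(1)[0][0]
    kept = {p: s for p, s in series_by_place.items() if s["unit"] == unit}
    dropped = sorted(p for p in series_by_place if p not in kept)
    return unit, kept, dropped


# Made-up values and illustrative unit names.
series = {
    "country/USA": {"unit": "GigawattHour", "values": {"2019": 4.0}},
    "country/IND": {"unit": "KilowattHour", "values": {"2019": 1.5e9}},
    "country/FRA": {"unit": "KilowattHour", "values": {"2019": 5.0e8}},
}
unit, kept, dropped = pick_most_used_unit(series)
print(unit, sorted(kept), dropped)
```

The dropped list is the cost of this approach: those places vanish from the chart unless a conversion is applied instead.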

shifucun commented 3 years ago

> Bo, to generalize your suggestion to have some check in the import tool: instead of picking a specific unit, we can perhaps look up the "StatVarSummary" cache for a stat-var and warn if a new unit (not part of the existing series) gets added for a pre-existing stat-var.
>
> WDYT?

That works, but I still feel a unified unit is better. It should happen somewhere before the data is given back to users. Common data (with common unit) :)

pradh commented 3 years ago

> > Bo, to generalize your suggestion to have some check in the import tool: instead of picking a specific unit, we can perhaps look up the "StatVarSummary" cache for a stat-var and warn if a new unit (not part of the existing series) gets added for a pre-existing stat-var. WDYT?
>
> That works, but I still feel a unified unit is better. It should happen somewhere before the data is given back to users. Common data (with common unit) :)

Unless we support conversion capability in the schema/API, we can only notify the importing user to unify it. And that's what this does?

pradh commented 3 years ago

> If there are multiple source series with different units, then mixer would ideally pick the most used unit, and some places would be dropped. Or the website would need the logic to do the number conversion. In either case, I think conversion in the source is better, as users have to do the conversion anyway.

Conversion in the source is better, but we probably have this and other diverging cases in the KG now. For these, it might be nice to include conversion support in the API layer. For now the logic can perhaps be hard-coded/config-driven, but eventually it could read the schema to do conversions (b/197723426 talks about the schema for this).
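
A config-driven version could look roughly like the sketch below. The table, the unit names, and the function are assumptions for illustration; they are not Data Commons schema identifiers or an existing API:

```python
# Hypothetical config: size of each unit expressed in kilowatt-hours.
ENERGY_UNIT_IN_KWH = {
    "KilowattHour": 1.0,
    "MegawattHour": 1_000.0,
    "GigawattHour": 1_000_000.0,
}


def convert_series(values, from_unit, to_unit, factors=ENERGY_UNIT_IN_KWH):
    """Convert every value of a date -> value series from from_unit to to_unit."""
    for unit in (from_unit, to_unit):
        if unit not in factors:
            raise ValueError(f"no conversion configured for unit {unit!r}")
    # Go through the base unit (kWh), then scale down to the target unit.
    scale = factors[from_unit] / factors[to_unit]
    return {date: value * scale for date, value in values.items()}


# Made-up value: 2,000,000 kWh is 2 GWh.
converted = convert_series({"2019": 2_000_000.0}, "KilowattHour", "GigawattHour")
print(converted)
```

Unknown units raise instead of passing through unconverted, so a missing config entry cannot silently produce a mixed-unit chart.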

shifucun commented 2 years ago

I will do this in the API for now: https://github.com/datacommonsorg/mixer/issues/631. When a conversion schema is available, we can move the conversion to import time.