IndEcol / IE_data_commons

Code and documentation for a commons of structured industrial ecology data
MIT License

Changes in input files (list type) #10

Closed (nheeren closed 5 years ago)

nheeren commented 6 years ago

Just for documentation purposes (in case another set of files needs to be corrected too). Close whenever you see fit.

Work in progress -- still updating the issue with more changes. Will remove this message once done

I made the following changes to the list-type input files in order to make them work with the parser & uploader scripts. Ideally they were errors 😆:

The points in bold are breaking issues!

stefanpauliuk commented 6 years ago

engineering_material was created by Graedel/Fishman to specify that we are talking about 'typical' engineering materials, like steel, Al, and plastics, as opposed to the general notion of 'materials' as 'goods or substances'. Since both the 'engineering_material' and 'material' aspects link to the same dimension (5: material), this change does not matter much. But 'engineering_material' should be kept if the materials clearly are engineering materials.

"Data" to "Values_Master": I planned to use cell H11 to indicate the sheet where the data are. That is why the data should be on the sheet whose name is given in H11. But I was not consistent with that rule: many datasets are still on "Values_Master", which is an ODYM template legacy.

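The sheet-lookup rule described above could be sketched as follows. This is only an illustration, not the parser's actual code: the function name `resolve_data_sheet` and the sheet name "Cover" in the usage lines are hypothetical, and the caller is assumed to have already read cell H11 and the workbook's sheet names.

```python
from typing import Optional, Sequence

LEGACY_SHEET = "Values_Master"  # ODYM template legacy sheet name, as mentioned in this thread


def resolve_data_sheet(h11_value: Optional[str], sheet_names: Sequence[str]) -> str:
    """Return the name of the sheet that holds the data (hypothetical helper).

    Prefer the sheet named in cell H11; fall back to the legacy
    'Values_Master' sheet if H11 is empty or names a sheet that does not exist.
    """
    name = (h11_value or "").strip()
    if name and name in sheet_names:
        return name
    if LEGACY_SHEET in sheet_names:
        return LEGACY_SHEET
    raise ValueError(
        f"No data sheet found: H11={h11_value!r}, sheets={list(sheet_names)}"
    )


# Usage with made-up sheet names:
print(resolve_data_sheet("Data", ["Cover", "Data"]))           # Data
print(resolve_data_sheet("Data", ["Cover", "Values_Master"]))  # Values_Master
```

Raising an error when neither sheet exists, rather than guessing, would make inconsistent files show up at parse time.
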
nheeren commented 5 years ago

> But 'engineering_material' should be kept if the materials clearly are engineering materials.

In that case, I suppose it should either be indicated as custom or be added to the classifications.

nheeren commented 5 years ago

Added a couple of new changes. Some breaking issues in bold.

stefanpauliuk commented 5 years ago

3_MC_Buildings_Gustavsson_2006

4_PY_WorldSteel_EoL_RR_SteelScrap

3_LT_AluCycle_LIU_2012.xlsx, 3_LT_AluCycle_LIU_2013.xlsx, 3_LT_IAI_GARC_2011.xlsx, 3_LT_MetalDemand_DEETMAN_2018.xlsx, 3_LT_SteelCycle_PAULIUK_2013.xlsx, 3_MC_SteelDemand_HU_2010.xlsx

3_MC_Buildings_Kleemann_2014.xlsx

6_PCS_Buildings_Indonesia_1985.xlsx, 6_PCS_Buildings_Indonesia_BPS_2008_2015.xlsx, 6_PCS_Buildings_USA_MOURA_2015.xlsx

Issue that remains: "apect_4 is not well defined as custom" -> This I did not understand!

3_MC_MetalDemand_DEETMAN_2018.xlsx, 5_CAP_PowerGenCapacity_Germany_2018.xlsx

3_MC_Vehicles_Hawkins_2012.xlsx

nheeren commented 5 years ago

Thanks!

> changed time classification from 9 to 3, please update dataset table entry!

You mean manually change aspect2_classification from 9 to 3? I assume that is just a quick fix for now? It is curious that my script worked. There should probably be another safety check implemented...
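One possible form of such a safety check, sketched under assumptions: the function name `unknown_items` and all example values are hypothetical and not taken from the parser or the database. The idea is simply to report every item in a dataset aspect that is missing from the classification the file declares.

```python
from typing import Iterable, Set


def unknown_items(dataset_items: Iterable[str], classification_items: Set[str]) -> Set[str]:
    """Return the dataset items that are absent from the declared classification."""
    return {item for item in dataset_items if item not in classification_items}


# Made-up example: years used in a file vs. the items of two classifications.
time_classification = {"2010", "2011", "2012"}
other_classification = {"steel", "aluminium"}
dataset_years = ["2010", "2012"]

print(sorted(unknown_items(dataset_years, time_classification)))   # []
print(sorted(unknown_items(dataset_years, other_classification)))  # ['2010', '2012']
```

An empty result would mean the declared classification covers the data; a non-empty one would flag a wrong classification id before upload.
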

> needs discussion: in the main database, I dropped the NOT NULL constraint for data.value to allow us to distinguish between '0' and 'no data'. Does that make sense?

Good question. I think it makes sense for now. In our material intensity data project I was extremely explicit about missing values: https://github.com/nheeren/material_intensity_db/blob/master/codebook.md#special-values (Stefan has access; for everybody else: it will be made public after the paper review). In short: I differentiate between "no observation" and "no information". I would suggest drafting such a codebook here as well and defining exactly what a NULL implies.

> Need to add missing classification items to db. will do so during weekend.

Thanks. Please notify me. This is the last file I cannot process right now.

stefanpauliuk commented 5 years ago

> You mean manually change aspect2_classification from 9 to 3? I assume that is just a quick fix for now?

Actually, 3 was the right classification, which is why it worked. It is unlikely that two classifications will contain the same items, so I don't think checks are necessary at this point.

OK, let's allow NULL values, and specify in the comment field whether it is "no observation" or "no information".
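A small sketch of what this convention could look like in practice. It uses an in-memory SQLite database purely as a stand-in for the main database, and the table layout (`id`, `value`, `comment`) is an assumption for illustration, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'value' is deliberately nullable so that NULL ("no data") differs from 0.
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, value REAL, comment TEXT)")
conn.executemany(
    "INSERT INTO data (value, comment) VALUES (?, ?)",
    [
        (0.0, None),               # a genuine zero
        (None, "no observation"),  # missing, reason recorded in the comment field
        (None, "no information"),
        (12.5, None),
    ],
)

# A true zero and a missing value are now distinguishable in queries.
zeros = conn.execute("SELECT COUNT(*) FROM data WHERE value = 0").fetchone()[0]
missing = conn.execute(
    "SELECT comment FROM data WHERE value IS NULL ORDER BY id"
).fetchall()

print(zeros)    # 1
print(missing)  # [('no observation',), ('no information',)]
```

Note that `value = 0` never matches NULL rows; missing values have to be selected with `IS NULL`.
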

nheeren commented 5 years ago

It seems we have solved all file issues related to list-type data.