nheeren closed this issue 5 years ago
After revising the data model, the regional and process aspects can be 'unspecified'. I will update the templates now and add 'unspecified' to the relevant classifications where it is not yet present.
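As a rough illustration of "add 'unspecified' if not present yet", here is a minimal sketch. It assumes a classification can be treated as a plain list of item names. The function name `ensure_unspecified` is hypothetical and is not part of IEDC_tools or the database schema.

```python
from typing import List, Sequence


def ensure_unspecified(items: Sequence[str], label: str = "unspecified") -> List[str]:
    """Return a copy of the classification items with `label` appended if missing.

    Hypothetical helper: classifications are modelled as simple lists of names.
    """
    if label in items:
        return list(items)  # already present, leave unchanged
    return list(items) + [label]


regions = ["DE", "FR", "NL"]
print(ensure_unspecified(regions))                # ['DE', 'FR', 'NL', 'unspecified']
print(ensure_unspecified(["World", "unspecified"]))  # ['World', 'unspecified']
```

Running it twice on the same classification is harmless: the second call finds the item and adds nothing.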
Thanks!
Thanks for fixing!
7_CT_EXIOBASEv3_200Products_To_163Products.xlsx: Exactly, just a rename.
1_F_LiquidMetalFlows_SteelScrapAge_Pauliuk_2013.xlsx, 1_F_MetalDemand_DEETMAN_2018.xlsx: These contain table data, which have no headers, only classification items.
My bad about 1_F_LiquidMetalFlows_SteelScrapAge_Pauliuk_2013.xlsx, 1_F_MetalDemand_DEETMAN_2018.xlsx.
3_IUP_Vehicles_9Countries_Dhaniati_2012.xlsx uses several different values to encode NULL. We really need a definition.
Good point! For this particular example, an empty cell in the template means 'no data available in this dataset', with the emphasis on "this dataset". In iedc.data, the numbers will be stored in a list format, and when exported again (as a list), only the non-zero values would be provided.
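A minimal sketch of what storing a table template "in a list format" could look like, assuming the table is held as nested dicts (row item, then column item) and empty cells arrive as `None`. The names `table_to_list` and `vehicles` are hypothetical, and the values are made up. The sketch drops only empty cells and keeps explicit zeros; also dropping zeros on export would be a one-line change to the `if` test.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory form of a table template: row item -> column item -> value.
Table = Dict[str, Dict[str, Optional[float]]]


def table_to_list(table: Table) -> List[Tuple[str, str, float]]:
    """Flatten a table into (row, column, value) records, skipping empty cells."""
    records = []
    for row_key, row in table.items():
        for col_key, value in row.items():
            if value is None:  # empty cell: no data in *this* dataset, so no record
                continue
            records.append((row_key, col_key, value))
    return records


vehicles = {
    "DE": {"2000": 1.2, "2005": None},
    "FR": {"2000": None, "2005": 0.9},
}
print(table_to_list(vehicles))  # [('DE', '2000', 1.2), ('FR', '2005', 0.9)]
```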
The main question here is: should empty cells get a data table entry or not?
To specify whether or not to enter data, I suggest distinguishing the following cases and putting a corresponding string into the cell:
1) No information available: string "N.I.A.", leads to a NULL entry in the database. Further details (number lacking, not readable, not applicable, etc.) should be provided in the comment field or sheet and moved to iedc.data (e.g., the example cases you used in the building data paper).
2) No data: string "N.D.", is ignored by the parser and does not lead to an entry in the database.
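The two proposed cases could be handled by a parser roughly like this. It is a sketch, not IEDC_tools code: the function name `parse_cells` and the (key, raw value) input shape are assumptions. It treats a still-empty cell as an error, so that template authors have to choose "N.I.A." or "N.D." explicitly.

```python
from typing import Iterable, List, Optional, Tuple

NO_INFO = "N.I.A."  # proposed: no information available -> NULL entry
NO_DATA = "N.D."    # proposed: no data -> ignored, no entry at all


def parse_cells(cells: Iterable[Tuple[str, object]]) -> List[Tuple[str, Optional[float]]]:
    """Turn (key, raw cell) pairs into database entries under the proposed scheme."""
    entries = []
    for key, raw in cells:
        text = raw.strip() if isinstance(raw, str) else raw
        if text is None or text == "":
            # Hypothetical design choice: empty cells must be marked explicitly.
            raise ValueError(f"empty cell {key!r}: mark it as {NO_INFO} or {NO_DATA}")
        if text == NO_DATA:
            continue  # no data: the parser skips the cell entirely
        if text == NO_INFO:
            entries.append((key, None))  # no information available: NULL value
            continue
        entries.append((key, float(text)))
    return entries


cells = [("2000", 3.5), ("2005", "N.I.A."), ("2010", "N.D."), ("2015", "4")]
print(parse_cells(cells))  # [('2000', 3.5), ('2005', None), ('2015', 4.0)]
```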
For 3_IUP_Vehicles_9Countries_Dhaniati_2012.xlsx, all empty cells should be "N.D.", hence no data table entries.
PS: 1) The no-data string "N.D." is important for table data: it allows datasets with different scopes (e.g., spanning different years) to still be combined in one table, with the unused columns filled with "N.D.".
2) The "N.I.A." and "N.D." strings are only suggestions from my side; please replace them if you have better ideas!
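To illustrate point 1) of the PS, here is a minimal sketch of combining two datasets with different year coverage into one table, padding the unused columns with "N.D.". The dict-based layout and the name `combine` are assumptions for illustration, and the values are made up.

```python
from typing import Dict


def combine(datasets: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, object]]:
    """Put datasets with different year coverage into one table, padding with 'N.D.'."""
    # Union of all years across datasets, as sorted column headers.
    years = sorted({year for data in datasets.values() for year in data})
    return {
        name: {year: data.get(year, "N.D.") for year in years}
        for name, data in datasets.items()
    }


a = {"2000": 1.0, "2005": 2.0}
b = {"2005": 3.0, "2010": 4.0}
table = combine({"A": a, "B": b})
print(table["A"])  # {'2000': 1.0, '2005': 2.0, '2010': 'N.D.'}
print(table["B"])  # {'2000': 'N.D.', '2005': 3.0, '2010': 4.0}
```

Because the parser ignores "N.D.", reading this combined table back yields the same entries as reading the two datasets separately.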
This is a tricky question (which would deserve its own issue). Since we chose DOUBLE as the data type, we can only encode missing, no data, null, NA, etc. as NULL or 0. As described in our codebook for the material intensity project, there can be different types of missing data.
For now, I will not change any of those values in the data (Excel) files but will have IEDC_tools replace them with NULL values.
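A minimal sketch of that replacement step, assuming the raw cell values arrive as strings, numbers, `None`, or float NaN (spreadsheet readers such as pandas typically return NaN for empty cells). The token set `NULL_TOKENS` and the function `to_double_or_null` are hypothetical; the actual spellings used in the Excel files would have to be collected from the templates themselves.

```python
import math
from typing import Optional, Union

# Hypothetical spellings treated as missing; extend from the actual templates.
NULL_TOKENS = {"", "na", "n/a", "nan", "null", "none", "-"}


def to_double_or_null(raw: Union[str, float, int, None]) -> Optional[float]:
    """Map a raw cell to a float (DOUBLE column) or None (stored as SQL NULL)."""
    if raw is None:
        return None
    if isinstance(raw, float) and math.isnan(raw):
        return None  # empty cell as returned by some spreadsheet readers
    if isinstance(raw, str):
        token = raw.strip()
        if token.lower() in NULL_TOKENS:
            return None
        return float(token)  # raises ValueError for unknown non-numeric text
    return float(raw)


print([to_double_or_null(v) for v in [1, "2.5", "NA", " ", None, float("nan")]])
# [1.0, 2.5, None, None, None, None]
```

Unknown text raises an error instead of being silently mapped to NULL, so new NULL spellings surface and can be added to the list deliberately.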
Should we create a new issue "Data encoding guidelines" or "missing values"? Maybe others are willing to contribute.
Let's keep it simple! The first question is: Should 'no data' in an Excel template be inserted or not?
Here, see my previous comment. In the case of 3_IUP_Vehicles_9Countries_Dhaniati_2012.xlsx, the empty cells should NOT be inserted, as the table format was chosen for convenience only; in a LIST template, the blank cells would not have had a corresponding 'no data' entry. Hence, the empty cells in this template should be filled with a string reserved for this case, e.g., "N.D." as suggested above.
If the cells are inserted, I agree with you that we can differentiate between cases: these will lead to NULL for data.value plus a comment explaining why the value is NULL, e.g., based on the scheme in the material intensity codebook.
Closing issue as it has become too broad in this discussion. We might come back to parts of it at a later stage.
@stefanpauliuk could you please look into the following issues (Somehow I feel like we solved them before. Did we overwrite the files??):
More to follow...