Closed nheeren closed 5 years ago
In the templates, the cover sheet contains a 1:1 dataset table entry, but the actual data are NOT formatted in the IEDC format. Instead, the data can come in two format types: LIST (what you parsed) and TABLE. While LIST is easy to parse, for TABLE the index structure of row and columns has to be taken into account. This is why there is this funny blue box on the 'cover' sheet. The whole concept is meant to increase usefulness, as I don't want to force data suppliers to use either LIST or TABLE. Instead, they do whatever is easy for them, choose a table index structure that is convenient, and let the parser do the rest. The reading of such parameters is already implemented in ODYM: https://github.com/IndEcol/ODYM/blob/master/odym/modules/ODYM_Functions.py , function ReadParameter()
The parser then has to take apart the different pieces of information (stats_array_string, units), see examples in https://github.com/IndEcol/IE_data_commons/tree/master/IEDC_content_fill/Dataset_Upload
Does that mean IEDC_tools should be able to parse LIST and TABLE type data?
So how is the parser supposed to map e.g. stats_array string
to stats_array_1
?
Yes, both LIST and TABLE. To make it worse, for TABLE data users can specify to report units, uncertainty, and comments on the cover sheet (GLOBAL option, one common value for all data items, applies to most datasets so far), or on a separate sheet (TABLE option cf. \Dropbox\G7 RECC\Data\IE_Data_Commons_Prototype\Data_ODYM_Format\1_F_WIO_Japan_Nakamura_Kondo_2002.xlsx for an example)
The stats_array string is a shortcut to avoid having 4 columns each time. Parsing is done with str.split, see https://github.com/IndEcol/IE_data_commons/blob/master/IEDC_content_fill/Dataset_Upload/3_MC_NACEv2_4000Groups_Upload.py for an example:
# parse stats_array_string
UncString = DataSheet.cell_value(currentrow ,currentcolumn)
UncParts = S.split(';')
U1 = int(UncParts[0])
U2 = np.float(UncParts[1])
U3 = np.float(UncParts[2])
U4 = np.float(UncParts[3])
# U1 ... U4 is then used in SQL statement to fill stats_array_1 to stats_array_4.
resolved
The database's
data
table has the following columnsThe Excel template uses its aspects and then the following columns: value | unit nominator | unit denominator | stats_array string
That means they are spelled differently and will therefore not automatically match.
Probably I still haven't fully understood the concept. There are also Excel files in the template folder that use completely different structure in Values_Master an I wouldn't know how to parse them at all.