IndEcol / IE_data_commons

Code and documentation for a commons of structured industrial ecology data
MIT License

Custom classifications have all same value in datasets table #19

Closed. nheeren closed this issue 5 years ago.

nheeren commented 5 years ago

Quite likely I am just a bit confused right now. However, the datasets table already exists when IEDC_tools comes into play. Take the entry with dataset_name = 1_F_WIO_Japan_Nakamura_Kondo_2002 and dataset_id = 60: it has aspect_1 = 11 (i.e. commodity) and aspect_1_classification = 15 (i.e. custom). The latter seems to be a dummy classification that you, @stefanpauliuk, created at some point.

Is this technically wrong, then? IEDC_tools creates a new classification with id = 63 and classification_name = commodity__1_F_WIO_Japan_Nakamura_Kondo_2002. So iedc_review.datasets.aspect_1_classification should probably be 63, not 15?

So should IEDC_tools modify the datasets entry after creating a new classification, or is the intended behaviour that IEDC_tools creates the entry?

If I am mistaken here, where would the correct link between 1_F_WIO_Japan_Nakamura_Kondo_2002 and commodity__1_F_WIO_Japan_Nakamura_Kondo_2002 be visible?

Note: I am talking about the iedc_review database, not iedc.

stefanpauliuk commented 5 years ago

The reason for this behavior is somewhat historical: to find out whether the database structure works, I first created the different entries in the datasets table, then identified the common and custom classifications, and then the data templates. That means that wherever 'custom' is the classification of an aspect, we need to create a new classification for the system dimension of that aspect and then replace the 15 with the id of the newly created classification (here: 63) in the datasets table. The same goes for the main iedc database.
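A minimal sketch of that replacement step for a single aspect, assuming a hypothetical schema: a `datasets` table with the columns named in this issue and a classification table called `classification_definition(id, classification_name)`. The table name, the helper `replace_custom_classification` and the use of `sqlite3` as a stand-in for the real database server are all illustrative assumptions, not IEDC_tools' actual code.

```python
import sqlite3
from typing import Optional

# Id of the dummy 'custom' classification mentioned in this issue.
CUSTOM_PLACEHOLDER_ID = 15


def replace_custom_classification(
    conn: sqlite3.Connection, dataset_name: str, aspect_name: str
) -> Optional[int]:
    """Give one dataset its own aspect_1 classification instead of the placeholder.

    Hypothetical schema (assumption, for illustration only):
      datasets(dataset_id, dataset_name, aspect_1, aspect_1_classification)
      classification_definition(id, classification_name)
    Returns the classification id the dataset points to afterwards,
    or None if no such dataset exists.
    """
    row = conn.execute(
        "SELECT aspect_1_classification FROM datasets WHERE dataset_name = ?",
        (dataset_name,),
    ).fetchone()
    if row is None:
        return None
    if row[0] != CUSTOM_PLACEHOLDER_ID:
        return row[0]  # already linked to a real classification; nothing to do

    # Naming convention seen in this issue: <aspect>__<dataset_name>
    cur = conn.execute(
        "INSERT INTO classification_definition (classification_name) VALUES (?)",
        (f"{aspect_name}__{dataset_name}",),
    )
    new_id = cur.lastrowid
    # Replace the placeholder 15 with the id of the newly created classification.
    conn.execute(
        "UPDATE datasets SET aspect_1_classification = ? WHERE dataset_name = ?",
        (new_id, dataset_name),
    )
    conn.commit()
    return new_id


# In-memory toy database reproducing the starting state described in the issue.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE classification_definition (
        id INTEGER PRIMARY KEY, classification_name TEXT);
    CREATE TABLE datasets (
        dataset_id INTEGER PRIMARY KEY, dataset_name TEXT,
        aspect_1 INTEGER, aspect_1_classification INTEGER);
    INSERT INTO classification_definition VALUES (15, 'custom');
    INSERT INTO datasets VALUES (60, '1_F_WIO_Japan_Nakamura_Kondo_2002', 11, 15);
    """
)
new_id = replace_custom_classification(
    conn, "1_F_WIO_Japan_Nakamura_Kondo_2002", "commodity"
)
```

In the review database the new classification got id 63; in this toy database it simply receives the next free id. Checking for the placeholder first makes the step safe to re-run: a second call returns the existing id instead of creating an orphan classification.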

Yes, IEDC_tools needs to modify the datasets table: update to correct classification ids for datasets that are already in there, and create completely new datasets table entry for datasets that are defined by template entirely.

The correct link between 1_F_WIO_Japan_Nakamura_Kondo_2002 and commodity__1_F_WIO_Japan_Nakamura_Kondo_2002 is stored in the datasets table once IEDC_tools has created the custom classification; this holds for both the review and the live database.
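To see that link, one can follow `datasets.aspect_1_classification` to the classification row with a join. The sketch below assumes the same hypothetical `classification_definition(id, classification_name)` table as a stand-in for the real classification table, and uses an in-memory `sqlite3` database seeded with the ids from this issue (dataset 60, aspect 11, classification 63). The helper name `aspect_1_classification_of` is illustrative.

```python
import sqlite3
from typing import Optional, Tuple

# Toy database in the state after IEDC_tools has run: dataset 60 points at
# its own classification 63. Table names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE classification_definition (
        id INTEGER PRIMARY KEY, classification_name TEXT);
    CREATE TABLE datasets (
        dataset_id INTEGER PRIMARY KEY, dataset_name TEXT,
        aspect_1 INTEGER, aspect_1_classification INTEGER);
    INSERT INTO classification_definition
        VALUES (63, 'commodity__1_F_WIO_Japan_Nakamura_Kondo_2002');
    INSERT INTO datasets
        VALUES (60, '1_F_WIO_Japan_Nakamura_Kondo_2002', 11, 63);
    """
)


def aspect_1_classification_of(
    conn: sqlite3.Connection, dataset_name: str
) -> Optional[Tuple[int, str]]:
    """Follow datasets.aspect_1_classification to the classification's id and name."""
    return conn.execute(
        "SELECT c.id, c.classification_name "
        "FROM datasets AS d "
        "JOIN classification_definition AS c ON c.id = d.aspect_1_classification "
        "WHERE d.dataset_name = ?",
        (dataset_name,),
    ).fetchone()


print(aspect_1_classification_of(conn, "1_F_WIO_Japan_Nakamura_Kondo_2002"))
# (63, 'commodity__1_F_WIO_Japan_Nakamura_Kondo_2002')
```

Before the update, the same query would return the placeholder row with id 15 instead.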

nheeren commented 5 years ago

Ok. Makes sense. Thanks for clarifying.

Yes, IEDC_tools needs to modify the datasets table: update to correct classification ids for datasets that are already in there,

OK. Will implement.

...and create completely new datasets table entry for datasets that are defined by template entirely.

So you mean whenever it does not exist, create it? Something like:

if dataset not in datasets_table:
    create_new_dataset_entry(dataset)

Or is there another rule deciding when it should be created?

stefanpauliuk commented 5 years ago
if dataset not in datasets_table:
    create_new_dataset_entry(dataset)

Exactly!
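A runnable version of the agreed rule, as a sketch only: it assumes the hypothetical `datasets` columns used in this issue, uses `sqlite3` in place of the real server, and the helper `ensure_dataset_entry` as well as the template-only dataset name and its ids are made up for illustration.

```python
import sqlite3


def ensure_dataset_entry(
    conn: sqlite3.Connection,
    dataset_name: str,
    aspect_1: int,
    aspect_1_classification: int,
) -> bool:
    """Insert a datasets row only if none with this dataset_name exists yet.

    Returns True if a new row was created, False if one already existed.
    """
    already_there = conn.execute(
        "SELECT 1 FROM datasets WHERE dataset_name = ?", (dataset_name,)
    ).fetchone() is not None
    if already_there:
        return False
    conn.execute(
        "INSERT INTO datasets (dataset_name, aspect_1, aspect_1_classification) "
        "VALUES (?, ?, ?)",
        (dataset_name, aspect_1, aspect_1_classification),
    )
    conn.commit()
    return True


# Hypothetical schema; UNIQUE guards against duplicate names at the database level.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE datasets (
        dataset_id INTEGER PRIMARY KEY, dataset_name TEXT UNIQUE,
        aspect_1 INTEGER, aspect_1_classification INTEGER);
    INSERT INTO datasets VALUES (60, '1_F_WIO_Japan_Nakamura_Kondo_2002', 11, 63);
    """
)
# Existing dataset: left untouched. Template-only dataset (illustrative name/ids): created.
existing = ensure_dataset_entry(conn, "1_F_WIO_Japan_Nakamura_Kondo_2002", 11, 63)
created = ensure_dataset_entry(conn, "hypothetical_template_only_dataset", 11, 99)
```

The check-then-insert pattern mirrors the pseudocode but is not atomic. With a UNIQUE constraint on dataset_name, a single `INSERT OR IGNORE` (SQLite) or `INSERT IGNORE` (MySQL) would close that gap.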
