IndEcol / IE_data_commons

Code and documentation for a commons of structured industrial ecology data
MIT License

Duplicate region 3-digit code 780 for Kosovo (wrong!) and Trinidad and Tobago (correct!) #25

Closed: stefanpauliuk closed this issue 3 years ago

stefanpauliuk commented 3 years ago

This duplication was previously overlooked and led to a number of dataset mismatches, which need to be checked and corrected manually; see the reports below. The error is now fixed: Kosovo does not yet have an official numerical ISO code, so it was assigned 10018, a running number used as a custom code in our database; see https://github.com/IndEcol/IE_data_commons/blob/master/IEDC_Classification_fill/regions_iso_iedc_data.csv
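A duplicate like this one can be caught automatically before region codes are matched to data. The following is a minimal sketch, assuming the region classification has been loaded as (name, numeric code) pairs. The rows are illustrative only and are not read from regions_iso_iedc_data.csv, whose column layout is not assumed here; the function name `find_duplicate_codes` is hypothetical.

```python
from collections import defaultdict

def find_duplicate_codes(rows):
    """Hypothetical helper: return {code: [region names]} for every code used by more than one region."""
    by_code = defaultdict(list)
    for name, code in rows:
        by_code[code].append(name)
    return {code: names for code, names in by_code.items() if len(names) > 1}

# Illustrative rows only, NOT the actual contents of regions_iso_iedc_data.csv.
rows_before_fix = [
    ("Austria", 40),
    ("Germany", 276),
    ("Trinidad and Tobago", 780),
    ("Kosovo", 780),    # the erroneous duplicate
]
rows_after_fix = [
    ("Austria", 40),
    ("Germany", 276),
    ("Trinidad and Tobago", 780),
    ("Kosovo", 10018),  # custom running number, no official ISO numeric code
]

print(find_duplicate_codes(rows_before_fix))  # {780: ['Trinidad and Tobago', 'Kosovo']}
print(find_duplicate_codes(rows_after_fix))   # {}
```

Running such a check whenever the classification file changes would flag a reused code before any data are linked through it.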

stefanpauliuk commented 3 years ago

Bug fix report archived for Aug. 3, 2021 in https://github.com/IndEcol/IE_data_commons/blob/master/IEDC_content_fill/IEDC_Prototype_Datasets_Correct.py

Short summary: all global steel cycle (Pauliuk 2013) data are affected, and only these data. Scan all aspect columns for the wrong entry (aspect 3 shown here):

# 5987 is Kosovo's classification id, possibly matched wrongly via the duplicate code 780
cur.execute("SELECT DISTINCT dataset_id FROM data WHERE aspect3 = 5987")  # datasets that may hold a wrong Kosovo entry
for row in cur:
    print(row)
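The query above checks only aspect3. A minimal sketch of the full scan over all aspect columns follows, assuming the data table has integer columns aspect1 to aspectN (N = 3 in the toy table; the real count is not stated here). It uses Python's built-in sqlite3 with `?` placeholders so that it runs standalone; with the MySQL-style driver used above, the placeholder would be `%s`. The helper `find_datasets_with_value` and the toy rows are hypothetical.

```python
import sqlite3

KOSOVO_OLD_ID = 5987  # classification id possibly matched wrongly via code 780 (from the issue)

def find_datasets_with_value(cur, value, n_aspects):
    """Hypothetical helper: map each aspectN column to the sorted dataset_ids containing value."""
    hits = {}
    for i in range(1, n_aspects + 1):
        col = f"aspect{i}"  # built from an integer, so safe to interpolate (column names can't be bound)
        cur.execute(f"SELECT DISTINCT dataset_id FROM data WHERE {col} = ?", (value,))
        ids = sorted(row[0] for row in cur.fetchall())
        if ids:
            hits[col] = ids
    return hits

# Toy in-memory stand-in for the IEDC data table, reduced to three aspect columns.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE data (dataset_id INTEGER, aspect1 INTEGER, aspect2 INTEGER, aspect3 INTEGER)"
)
cur.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [(59, 1, 2, 5987), (59, 1, 2, 5987), (66, 5987, 2, 3), (100, 1, 2, 3)],
)
print(find_datasets_with_value(cur, KOSOVO_OLD_ID, n_aspects=3))
# {'aspect1': [66], 'aspect3': [59]}
```

Collecting the hits per column shows at once which aspect holds the region dimension in each affected dataset, which is needed before writing the corrective UPDATE.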

The global steel cycle (Pauliuk 2013) data contain only Trinidad and Tobago and no Kosovo data, so change classification id 5987 to 6096 (Kosovo -> Trinidad and Tobago) with:

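for m in [59, 66, 67, 68, 69]:  # all global steel (Pauliuk 2013) datasets; they have T&T but no Kosovo, so 5987 must be wrong and is changed
    cur.execute("UPDATE data SET aspect5 = 6096 WHERE aspect5 = 5987 AND dataset_id = %s", (m,))  # (m,) is a one-element tuple; (m) is just m
# with autocommit off, commit on the connection afterwards (e.g. conn.commit()) so the changes persist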
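The UPDATE loop above can also be wrapped in a single transaction, followed by a check that no wrong entries remain in the targeted datasets. A minimal sketch, again on a toy in-memory SQLite table rather than the real IEDC database: the ids 5987 and 6096 and the dataset ids come from the issue, while `remap_region` and the sample rows (including a hypothetical dataset 100 that legitimately refers to Kosovo and must stay untouched) are illustrative assumptions.

```python
import re
import sqlite3

KOSOVO_OLD_ID = 5987  # classification id wrongly used in the steel datasets (from the issue)
TRINIDAD_ID = 6096    # correct classification id for Trinidad and Tobago (from the issue)
STEEL_DATASETS = [59, 66, 67, 68, 69]  # global steel cycle (Pauliuk 2013) datasets

def remap_region(conn, column, old_id, new_id, dataset_ids):
    """Hypothetical helper: set column = new_id where column = old_id in the given datasets.

    Returns the total number of rows changed.
    """
    if not re.fullmatch(r"aspect\d+", column):  # column names cannot be bound as parameters
        raise ValueError(f"unexpected column name: {column}")
    changed = 0
    with conn:  # commits if every UPDATE succeeds, rolls back otherwise
        for m in dataset_ids:
            cur = conn.execute(
                f"UPDATE data SET {column} = ? WHERE {column} = ? AND dataset_id = ?",
                (new_id, old_id, m),
            )
            changed += cur.rowcount
    return changed

# Toy stand-in for the IEDC data table; dataset 100 is a hypothetical genuine Kosovo dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (dataset_id INTEGER, aspect5 INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(59, 5987), (59, 5987), (66, 5987), (67, 6096), (100, 5987)],
)

n = remap_region(conn, "aspect5", KOSOVO_OLD_ID, TRINIDAD_ID, STEEL_DATASETS)
placeholders = ",".join("?" * len(STEEL_DATASETS))
left = conn.execute(
    f"SELECT COUNT(*) FROM data WHERE aspect5 = ? AND dataset_id IN ({placeholders})",
    (KOSOVO_OLD_ID, *STEEL_DATASETS),
).fetchone()[0]
print(n, left)  # 3 0
```

Restricting the WHERE clause to both the old id and the listed dataset ids leaves genuine Kosovo entries in other datasets alone, and the `with conn:` block rolls back all updates if any single one fails.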