Closed by stefanpauliuk 3 years ago
Bug fix report for Aug. 3, 2021, archived in https://github.com/IndEcol/IE_data_commons/blob/master/IEDC_content_fill/IEDC_Prototype_Datasets_Correct.py
Short summary: All global steel cycle (Pauliuk 2013) data are affected, and only these data. The scan was run for all aspects (aspect 3 shown here):
cur.execute("SELECT DISTINCT dataset_id FROM data WHERE aspect3 = 5987")  # search for possibly wrong entry for Kosovo 3 digit code
for row in cur:
    print(row)
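The scan over all aspects could be wrapped in a small loop, sketched below. This is an illustration only, not the script from the archived report: the helper name `datasets_using_code`, the assumption that the columns are named `aspect1` to `aspect5`, and the in-memory SQLite table with made-up rows are all hypothetical. SQLite uses `?` as placeholder, whereas the database driver used above takes `%s`.

```python
import sqlite3


def datasets_using_code(cur, code, n_aspects=5):
    """Return {aspect column: sorted dataset ids} for every column that contains `code`."""
    hits = {}
    for i in range(1, n_aspects + 1):
        column = f"aspect{i}"  # assumed naming; built here, never taken from user input
        cur.execute(f"SELECT DISTINCT dataset_id FROM data WHERE {column} = ?", (code,))
        ids = sorted(row[0] for row in cur.fetchall())
        if ids:
            hits[column] = ids
    return hits


# Tiny in-memory stand-in for the real table (made-up rows, for illustration only)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE data (dataset_id INTEGER, aspect1 INTEGER, aspect2 INTEGER, "
    "aspect3 INTEGER, aspect4 INTEGER, aspect5 INTEGER)"
)
cur.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?, ?, ?)",
    [
        (59, 1, 2, 5987, 4, 5),
        (66, 1, 2, 3, 4, 5987),
        (70, 1, 2, 3, 4, 5),
    ],
)

result = datasets_using_code(cur, 5987)
print(result)
```

Collecting the hits per column makes it easy to see which aspect of which dataset references the suspect classification id before anything is changed.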
The global steel cycle (Pauliuk 2013) data contain Trinidad and Tobago but no Kosovo data, so classification id 5987 is changed to 6096 (Kosovo -> Trinidad) with:
for m in [59, 66, 67, 68, 69]:  # all are global steel (Pauliuk 2013) datasets; they have T&T but no Kosovo, so 5987 must be wrong and is changed
    # parameters are passed as a one-element tuple: (m,) rather than (m)
    cur.execute("UPDATE data SET aspect5 = 6096 WHERE aspect5 = 5987 AND dataset_id = %s", (m,))
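A slightly more defensive variant of this update is sketched below: it records how many rows changed per dataset and commits once at the end. The helper name `reassign`, the in-memory SQLite table, and its rows are hypothetical and only serve to make the example runnable; SQLite uses `?` placeholders where the driver above uses `%s`.

```python
import sqlite3

WRONG_ID, RIGHT_ID = 5987, 6096        # Kosovo -> Trinidad and Tobago, as in the fix above
STEEL_DATASETS = [59, 66, 67, 68, 69]  # dataset ids from the fix above


def reassign(conn, column, old, new, dataset_ids):
    """Replace `old` with `new` in `column` for the given datasets.

    Returns {dataset_id: number of rows changed}.
    """
    changed = {}
    cur = conn.cursor()
    for ds in dataset_ids:
        # `column` comes from our own code, never from user input
        cur.execute(
            f"UPDATE data SET {column} = ? WHERE {column} = ? AND dataset_id = ?",
            (new, old, ds),
        )
        changed[ds] = cur.rowcount
    conn.commit()
    return changed


# Made-up rows standing in for the real table (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (dataset_id INTEGER, aspect5 INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(59, 5987), (59, 5987), (66, 5987), (67, 6096), (70, 5987)],
)

changed = reassign(conn, "aspect5", WRONG_ID, RIGHT_ID, STEEL_DATASETS)
print(changed)

# Rows of datasets outside the list keep the old id
remaining = conn.execute(
    "SELECT dataset_id FROM data WHERE aspect5 = ?", (WRONG_ID,)
).fetchall()
print(remaining)
```

Restricting the update to an explicit dataset list, as in the fix above, leaves entries in other datasets that reference 5987 untouched.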
This duplication was previously overlooked and led to a number of dataset mismatches, which need to be checked and corrected manually; see the reports below. The error is now fixed: Kosovo does not yet have an official numerical ISO code, so it was assigned 10018, a running number used as a custom code in our database; see https://github.com/IndEcol/IE_data_commons/blob/master/IEDC_Classification_fill/regions_iso_iedc_data.csv