Closed nheeren closed 5 years ago
Lowercase is the default but for abbreviations and proper names we need upper cases (USA, PVC, etc.).
The substance list in 1_UPI_USLCI_Aluminum_cold_rolling_at_plant.xlsx was extracted from the US LCI database, and here as well as in other datasets, it is the responsibility of the data providers to make sure that the classification and classification items used and provided are correct (i.e., all of the same dimension and no redundancies or spelling mistakes).
In the case of 1_UPI_USLCI_Aluminum_cold_rolling_at_plant.xlsx, I was not careful enough when extracting the list. Hence, you just reject those data templates and I will need to fix the classifications in them ;)
(Also referring to your email).
The problem is that we will keep running in annoying issues. Pandas, Python, and SQL use functions that are sometimes case sensitive and sometimes not.
For instance, it took me almost an hour to figure out why my check for existing classification_items would pass, but the database would reject the upload since it did not find the corresponding foreign key. To make things worse, normal SQL SELECT
statements are not case-sensitive.
Therefore, I was thinking it might be simplest to force everything lowercase. Then there is a clear policy. I don't think there can ever be cases where the uppercase really matters. Would be good to have a feedback from someone with a lot of database experience. @konstantinstadler maybe?
Not much I can add here. From a user perspective I agree with @stefanpauliuk that if sometimes feels strange to add everything lowercase (e.g. ISO3 country abbreviations). Are you planning that users (data uploaders) directly interact with the database in SQL or do they always have to go through some script. In the latter I would recommend a "translating" function which converts everything to lowercase. This also gives you the possibility to further extend that functionality to other problems: U$ -> USD, convert countrynames to ISO3 or whatever, ... The translation could also be a standalone testing script which user are recommended to run and it logs all changes to the users - (s)he can than decide if the changes are reasonable
We kept case-sensitivity intact. This will require attention by data providers... May need to reopen later...
The input tables contain often redundant entries like "carbon-17" and "Carbon-17".
How do you want to deal with that? mySQL seems to be case sensitive, but you have to kind of force it to: https://stackoverflow.com/a/5629121/2075003
I think I would suggest to always force all comparisons to lowercase. However, that would require basically a redo of the database. What do you think?
For now I will manually fix all data input tables... -- see notes in #10 ðŸ˜