Closed by Nelly-Barret 1 week ago
A good test would be to list all non-int/float/boolean values to see "what remains as strings"
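A minimal sketch of such a check, assuming the cast values are available as a plain Python list. The helper name `remaining_strings` and the sample values are hypothetical, not taken from the ETL script:

```python
from collections import Counter


def remaining_strings(values):
    """Count the values that are still strings after the casting step."""
    return Counter(v for v in values if isinstance(v, str))


# Hypothetical output of the casting step: numbers and booleans were
# converted, everything else was left as a string.
cast_values = [12, 3.5, True, "1.234,56", "n/a", "n/a"]

leftovers = remaining_strings(cast_values)
for text, count in leftovers.most_common():
    print(f"{count:>4}  {text!r}")
```

Sorting by frequency puts recurring placeholders first, which makes unconverted numbers such as "1.234,56" easier to spot in the rest of the list.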
To cast string to int/float values, we cannot simply do:
try:
    return float(my_value)
except Exception:
    return my_value
because it does not correctly process numbers that are not written using the 🇬🇧 convention, i.e., with a "." to separate decimals and a "," to separate thousands.
Instead, we need to use a locale, set to the country of origin of the data, e.g., 🇮🇹 for Buzzi, 🇪🇸 for lafe, etc.
I have added the locale setting to the ETL script. I also defined the locale of each medical center; this can be overridden to use the 🇬🇧 convention with the parameter --use_en_locale=True
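For illustration, locale-aware casting could look roughly like the sketch below, built on Python's standard `locale` module. The function name `cast_with_locale` is hypothetical and this is not the code from the ETL script. Locale names such as `it_IT.UTF-8` are platform-dependent and must be installed on the machine, otherwise `locale.setlocale` raises `locale.Error`.

```python
import locale


def cast_with_locale(value, locale_name):
    """Try to parse a string as int, then float, using the given locale.

    Returns the original string if it is not a number in that locale.
    """
    previous = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    locale.setlocale(locale.LC_NUMERIC, locale_name)
    try:
        for parse in (locale.atoi, locale.atof):
            try:
                return parse(value)
            except ValueError:
                continue
        return value
    finally:
        # Always restore the previous numeric locale.
        locale.setlocale(locale.LC_NUMERIC, previous)


# The "C" locale always exists and follows the "." decimal convention.
print(cast_with_locale("3.5", "C"))

# Assumption: an Italian locale is installed under this name.
try:
    print(cast_with_locale("1.234,56", "it_IT.UTF-8"))
except locale.Error:
    print("it_IT.UTF-8 is not available on this system")
```

Note that `locale.setlocale` changes process-wide state, so switching locales per value is not thread-safe; setting the locale once at the start of the ETL run, as described above, avoids that.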
Seen today while investigating #41:
Values with commas are quoted to be correctly read from the CSV. However, they should be converted to int/float values when inserted in the database.
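The behaviour can be reproduced with a small sketch using the standard `csv` module. The column names and sample rows are made up, and `to_number` is a simplified stand-in that only swaps the decimal separator (it ignores thousands separators); it is not the conversion used in the ETL script:

```python
import csv
import io

# Hypothetical CSV content: the decimal comma forces the value to be quoted.
raw = 'patient_id,weight\n1,"72,5"\n2,80\n'
rows = list(csv.DictReader(io.StringIO(raw)))


def to_number(text, decimal_sep=","):
    """Convert a string to int or float, or return it unchanged."""
    normalized = text.replace(decimal_sep, ".")
    try:
        return int(normalized)
    except ValueError:
        try:
            return float(normalized)
        except ValueError:
            return text


# Values come out of the CSV reader as strings and are converted before insertion.
converted = [{key: to_number(val) for key, val in row.items()} for row in rows]
print(converted)
```

The quoting only protects the value while the CSV is parsed; the reader still returns "72,5" as a string, so the numeric conversion has to happen as a separate step before the database insert.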