jaeltan closed 1 year ago
Maybe we should create a function that standardises titles and updates treatyIDs and manyIDs at the database level, to avoid running into the issues we have had so far. That would also make it easier to keep all datasets in a database (usually added at different times) consistent.
@jaeltan what do you think?
We could also implement this at the data export level. Any time a new dataset is added to a database, we would verify whether IDs and titles are up to date and, if not, update them in each dataset.
I think implementing it at the export_data step is a good idea. So if we add this step into export_data, we can leave the treaty titles as they are in the dataset and only improve how they are handled when creating the IDs in the final step?
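To make the proposal concrete, here is a rough sketch of what such an export-time check could look like. It is written in Python purely for illustration and is not the package's code: `standardise_title`, `make_id`, `refresh_ids`, the column names, and the ID rule (initials plus year) are all hypothetical stand-ins. The point is only that titles stay untouched in each dataset while the IDs are recomputed from a standardised copy.

```python
import re

def standardise_title(title):
    """Hypothetical normaliser: lower-case, drop punctuation, collapse spaces."""
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return re.sub(r"\s+", " ", cleaned).strip()

def make_id(title, year):
    """Hypothetical ID rule: initials of the standardised title plus the year."""
    words = standardise_title(title).split()
    return "".join(w[0].upper() for w in words) + "_" + str(year)

def refresh_ids(database):
    """Recompute IDs in every dataset and report which rows were out of date.

    `database` maps dataset names to lists of row dicts. The "Title" field is
    read but never modified; only "treatyID" is rewritten when it is stale.
    """
    changed = []
    for name, rows in database.items():
        for i, row in enumerate(rows):
            new_id = make_id(row["Title"], row["Year"])
            if row.get("treatyID") != new_id:
                row["treatyID"] = new_id
                changed.append((name, i))
    return changed

# Two datasets added at different times, with differently formatted titles.
database = {
    "A": [{"Title": "Treaty of Peace, Paris", "Year": 1951, "treatyID": "TOPP_1951"}],
    "B": [{"Title": "treaty of  peace (Paris)", "Year": 1951, "treatyID": "old"}],
}
changed = refresh_ids(database)
```

Running the check on every export would bring datasets added earlier into line with the current ID rule without anyone having to edit their titles by hand.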
Add matches for EC and EFTA to countryregex and improve standardise_titles() so that state names in treaty titles are standardised and matched consistently, reducing errors in manyID generation
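As an illustration of the idea (not the actual countryregex table or the real standardise_titles() implementation), the sketch below uses Python with a small, hypothetical pattern list. The patterns, the codes, and the choice to map EEC onto EC are assumptions made for the example only.

```python
import re

# Hypothetical pattern table: (code, regex). The real countryregex may differ.
COUNTRY_PATTERNS = [
    ("EFTA", r"\befta\b|european free trade association"),
    # Assumption: EEC and "European (Economic) Community/Communities" map to EC.
    ("EC", r"\bec\b|\beec\b|european (economic )?communit(y|ies)"),
    ("GBR", r"united kingdom|great britain|\buk\b"),
    ("USA", r"united states( of america)?|\busa\b"),
]

def standardise_states(title):
    """Replace every known name variant in a title with one canonical code."""
    out = title.lower()
    for code, pattern in COUNTRY_PATTERNS:
        # Codes are upper-case and patterns lower-case, so an inserted code
        # is never matched again by a later pattern.
        out = re.sub(pattern, code, out)
    return out

long_form = standardise_states(
    "Agreement between the European Economic Community and the United Kingdom"
)
short_form = standardise_states("Agreement between the EC and the UK")
efta_long = standardise_states("Convention establishing the European Free Trade Association")
efta_short = standardise_states("Convention establishing the EFTA")
```

With variants collapsed to the same code, two spellings of the same treaty title produce the same string, which is what an ID generator needs in order to assign them the same manyID.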