Big-Life-Lab / PHES-ODM

The Public Health Environmental Surveillance Open Data Model (PHES-ODM, or ODM). A data model, dictionary and support tools for environmental surveillance.
Creative Commons Attribution Share Alike 4.0 International
54 stars 18 forks source link

ISO Code mistakes in the zones table of version 2.2; Repeating partIDs #267

Closed mathew-thomson closed 4 months ago

mathew-thomson commented 4 months ago

This issue was raised by @sorinsion via email.

The issue is two-fold:

1) The is a mistake in the zones GitHub CSVs. The isoCode field should be 2 characters wide and contain only the country code, no zones codes. We have instances where zones are nested (to some extent) and so a zones code is then used in the isoCode column.

For example, for Belgium we have:

isoCode isoZone zoneName
BE BE-BRU Brussels Hoofdstedelijk Gewest
BE BE-VLG Vlaams Gewest
BE BE-WAL Waals Gewest
BE-VLG BE-VAN Antwerpen
BE-VLG BE-VBR Vlaams-Brabant
BE-VLG BE-VLI Limburg
BE-VLG BE-VOV Oost-Vlaanderen
BE-VLG BE-VWV West-Vlaanderen
BE-WAL BE-WBR Brabant wallon
BE-WAL BE-WHT Hainaut
BE-WAL BE-WLG Liège
BE-WAL BE-WLX Luxembourg
BE-WAL BE-WNA Namur

BE-VLG and BE-WAL are larger areas that are subdivided in several zones. Brussels is its own zone. Because of this, however, the countries and zones tables cannot be properly linked. The error is:

“Column 'countries.isoCode' is not the same length or scale as referencing column 'zones.isoCode' in foreign key 'FK_zones.isoCode'. Columns participating in a foreign key relationship must be defined with the same length and scale.”.

The error is justified, since there are no BE-VLG or BE-WAL records in the countries table, isoCode column.

The solution will be just to truncate the isoCode column entries that cite zones in the zones CSV to just the 2 character country code.

2) Another issue is that there are duplicate partIDs in the parts table. This isn’t allowed by databases as partID is a primary key.

The solution here is to rename one of the duplicate entries in all cases, while not accidentally creating new duplicates.

The current culprits are:

partID partLabel
conc Sample concentrate
conc Concentration measure
frna F-Specific RNA bacteriophages
fRNA F+ RNA coliphage
n211i n211i omicron-variant gene target
n211i Omicron Variant n211i mutation
n679k n679k omicron-variant gene target
n679k Omicron Variant n679k mutation
n856k n856k omicron-variant gene target
n856k Omicron Variant n856k mutation
n969k n969k omicron-variant gene target
n969k Omicron Variant n969k mutation
p100l p100l delta-variant gene target
p100l Delta Variant p100l mutation
p2046l p2046l delta-variant gene target
p2046l Delta Variant p2046l mutation
p2287s p2287s delta-variant gene target
p2287s Delta Variant p2287s mutation
p3395h p3395h omicron-variant gene target
p3395h Omicron Variant p3395h mutation
p681r p681r delta-variant gene target
p681r Delta Variant p681r
phone Contact phone
phone Country national phone prefix

I'll turn this issue into a PR and try to merge this patch fix by EOD, and launch a v2.2.1 release.

mathew-thomson commented 4 months ago

The issue has now been corrected and resolved, and version 2.2.1 is now live.