Closed davidorme closed 1 year ago
Merging #74 (7a78564) into release/3.0.0 (5702281) will not change coverage. The diff coverage is 100.00%.
@@ Coverage Diff @@
## release/3.0.0 #74 +/- ##
==============================================
Coverage 69.00% 69.00%
==============================================
Files 12 12
Lines 3639 3639
==============================================
Hits 2511 2511
Misses 1128 1128
Impacted Files | Coverage Δ | |
---|---|---|
safedata_validator/taxa.py | 87.65% <ø> (ø) | |
safedata_validator/field.py | 93.48% <100.00%> (ø) | |
The data for the GBIF taxon backbone from 2016 differs from more recent simple backbone dumps in using empty strings rather than the postgres default `\N` for null values. For taxon hierarchy keys (`genus_key` etc.) this is a problem because the import function only converts `\N` to SQLite `null` values. When a given taxon is processed, the taxon keys for inapplicable, more nested taxonomic levels therefore come in as an empty string rather than `None`. The higher taxon validation then includes a bunch of entries like `["species", ""]` for those inapplicable levels rather than filtering them out, and the id lookup raises an Exception.

We could fix this by creating a special case within the GBIF database building code. It isn't ok to simply substitute all empty strings with `None` when building GBIF backbone databases, because that substitution should only be applied to the taxon key fields (and not to actual string fields). So, we would have to detect that the 2016 dataset is being built and apply updates to a named set of fields.
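
A minimal sketch of the failure mode described above may help. It is not the code in `safedata_validator`: the `RANKS` list, the `higher_taxa` function, the row layout and the key values are all hypothetical, chosen only to show how an empty-string key survives a `None`-only filter.

```python
# Hypothetical rank list and row layout, for illustration only.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def higher_taxa(row, inapplicable=(None,)):
    """Return [rank, key] pairs for the ranks whose key is applicable."""
    return [
        [rank, row[f"{rank}_key"]]
        for rank in RANKS
        if row[f"{rank}_key"] not in inapplicable
    ]


# A genus-level taxon as it might be read from a 2016 backbone database:
# the species key does not apply and arrives as "" rather than None.
# The integer keys are made-up values, not real GBIF ids.
genus_row = {
    "kingdom_key": 1,
    "phylum_key": 2,
    "class_key": 3,
    "order_key": 4,
    "family_key": 5,
    "genus_key": 6,
    "species_key": "",
}

# Filtering on None alone lets the empty string through.
none_only = higher_taxa(genus_row)

# Treating "" as inapplicable as well removes the spurious entry.
none_or_empty = higher_taxa(genus_row, inapplicable=(None, ""))

print(none_only[-1])
print(none_or_empty[-1])
```

With the `None`-only filter the last entry is the `["species", ""]` pair that a later id lookup cannot resolve. With both conditions the list stops at the genus.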
I've solved this more simply here by treating an empty string, as well as `None`, as a condition marking an inapplicable taxon key. The PR also adds a `devtool` directory with an example script for debugging an entry point function, with a mechanism for passing command line arguments in to `argparse`.
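
For readers unfamiliar with the pattern, here is one common way to debug an `argparse`-based entry point with fixed arguments. It is a sketch only and not necessarily the mechanism used in the `devtool` script: the `entry_point` function and its `filename` and `--strict` arguments are hypothetical stand-ins.

```python
import argparse


def entry_point(args_list=None):
    """Hypothetical stand-in for a console entry point function."""
    parser = argparse.ArgumentParser(description="Hypothetical entry point")
    parser.add_argument("filename")
    parser.add_argument("--strict", action="store_true")
    # When args_list is None, argparse falls back to sys.argv[1:], so the
    # function behaves normally when installed as a console script.
    args = parser.parse_args(args_list)
    return args


if __name__ == "__main__":
    # Run this file under a debugger: the argument list is supplied
    # directly instead of being typed on the command line.
    parsed = entry_point(["example.xlsx", "--strict"])
    print(parsed)
```

Because the argument list is an optional parameter, the same function can serve both as the installed entry point and as a target for breakpoints in an IDE.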