ImperialCollegeLondon / safedata_validator

Python tools to validate and publish datasets using the safedata metadata format.
https://safedata-validator.readthedocs.io/
MIT License

Added taxon header validation #62

Closed · jacobcook1995 closed this 1 year ago

jacobcook1995 commented 1 year ago

This PR adds checks that the headers in a Taxa worksheet (GBIF or NCBI) match a set of allowed header names. This prevents misspelt names from causing havoc downstream.
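The check described above can be sketched as a set-membership test. Where a header is not recognised, the sketch also suggests the closest allowed name, which helps when the cause is a typo. This is not the code in the PR. `ALLOWED_GBIF_HEADERS` and `check_headers` are hypothetical names, and the header names in the set are placeholders rather than the ones safedata_validator actually uses. `difflib.get_close_matches` is from the Python standard library.

```python
import difflib

# Hypothetical allowed headers for a GBIF Taxa worksheet; the real names
# used by safedata_validator may differ.
ALLOWED_GBIF_HEADERS = frozenset(
    {"name", "taxon name", "taxon type", "taxon id",
     "parent name", "parent type", "parent id", "comments"}
)


def check_headers(headers, allowed):
    """Map each unrecognised header to a list of close allowed matches.

    Headers are compared after stripping whitespace and lower-casing, so
    'Taxon Name ' is accepted as 'taxon name'. The suggestion list is empty
    when no allowed name is similar enough (difflib's default cutoff is 0.6).
    """
    problems = {}
    for header in headers:
        key = header.strip().lower()
        if key not in allowed:
            # Keep the original spelling as the key so the error message can quote it.
            problems[header] = difflib.get_close_matches(key, sorted(allowed), n=1)
    return problems
```

For example, `check_headers(["Name", "taxon nmae", "parent id"], ALLOWED_GBIF_HEADERS)` returns `{"taxon nmae": ["taxon name"]}`. The misspelt header is flagged with a likely correction, and the other two headers pass.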

For NCBI worksheets, the allowed header names are any taxonomic rank that NCBI recognises. This is currently stored as a long list at the top of taxa.py, but there might be a more elegant way to handle it.
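The NCBI case reduces to checking each header against the set of rank names. The sketch below makes that assumption, following the description above. One option for the module-level list is to hold it as a `frozenset`: it cannot be modified by accident, and membership tests do not scan the whole list. `NCBI_RANKS` and `invalid_ncbi_headers` are hypothetical names. The ranks shown are an illustrative subset of those NCBI uses, not the list in taxa.py.

```python
# Illustrative subset of NCBI taxonomic rank names; the full set NCBI
# recognises is longer, and this is not the list held in taxa.py.
NCBI_RANKS = frozenset({
    "kingdom", "phylum", "subphylum", "class", "subclass",
    "order", "suborder", "family", "subfamily", "tribe",
    "genus", "subgenus", "species", "subspecies", "strain",
})


def invalid_ncbi_headers(headers):
    """Return the headers that are not a recognised NCBI rank, in input order.

    Comparison ignores surrounding whitespace and letter case.
    """
    return [h for h in headers if h.strip().lower() not in NCBI_RANKS]
```

For example, `invalid_ncbi_headers(["Genus", "Specise", "family"])` returns `["Specise"]`.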

Fixes #53

codecov-commenter commented 1 year ago

Codecov Report

Merging #62 (55ad358) into release/3.0.0 (01f97b2) will increase coverage by 0.62%. The diff coverage is 100.00%.

@@                Coverage Diff                @@
##           release/3.0.0      #62      +/-   ##
=================================================
+ Coverage          67.65%   68.28%   +0.62%     
=================================================
  Files                 12       12              
  Lines               3580     3591      +11     
=================================================
+ Hits                2422     2452      +30     
+ Misses              1158     1139      -19     
Impacted Files                Coverage Δ
safedata_validator/taxa.py    87.59% <100.00%> (+2.32%)


jacobcook1995 commented 1 year ago

Yeah, if an elegant solution hasn't immediately jumped out at you, I think it can stay for now.