ImperialCollegeLondon / safedata_validator

Python tools to validate and publish datasets using the safedata metadata format.
https://safedata-validator.readthedocs.io/
MIT License

Remove remote taxon validation completely #31

davidorme closed this issue 2 years ago

davidorme commented 2 years ago

The original version used remote taxon validation to give users a way to:

  1. Avoid the file space needed for the largish local taxon database (1GB+)
  2. Avoid having to build that database

However, the codebase now:

  1. Supports database building with well-scripted endpoints.
  2. Has moved to clear timestamping of taxon DB versions (less of an issue for GBIF, which has stepped updates, but a big issue for NCBI, which has rolling updates).
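
To illustrate why a timestamped local database helps, here is a minimal sketch using Python's standard `sqlite3` module. The table names, columns, function names and timestamp value are all hypothetical and are not the actual safedata_validator schema or API; the point is only that every validation result can be tied to a fixed database version, which a rolling remote service cannot guarantee.

```python
import sqlite3


def build_demo_db(conn, timestamp):
    # Hypothetical schema for illustration only: NOT the real
    # safedata_validator database layout.
    conn.execute("CREATE TABLE db_version (ts TEXT)")
    conn.execute("CREATE TABLE taxa (name TEXT PRIMARY KEY, taxon_rank TEXT)")
    conn.execute("INSERT INTO db_version VALUES (?)", (timestamp,))
    conn.executemany(
        "INSERT INTO taxa VALUES (?, ?)",
        [("Panthera leo", "species"), ("Panthera", "genus")],
    )
    conn.commit()


def validate_taxon(conn, name):
    """Return (found, taxon_rank, db_timestamp) for a taxon name."""
    ts = conn.execute("SELECT ts FROM db_version").fetchone()[0]
    row = conn.execute(
        "SELECT taxon_rank FROM taxa WHERE name = ?", (name,)
    ).fetchone()
    return (row is not None, row[0] if row else None, ts)


# An in-memory database stands in for the local taxon database file.
conn = sqlite3.connect(":memory:")
build_demo_db(conn, "2022-01-01")
result_known = validate_taxon(conn, "Panthera leo")
result_unknown = validate_taxon(conn, "Not a taxon")
```

Because the timestamp travels with each result, two users validating against the same database version should get the same answer, regardless of later upstream updates.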

The file sizes are reasonably big, but not overwhelming for a local user (currently ~2 GB for both NCBI and GBIF). There are also serious downsides to maintaining the remote validators:

  1. Continual tweaking of the test database to ensure that the local and remote test data are in sync.
  2. Significant runtime overhead for local testing and CI.
  3. Code complexity in both the testing and main codebase.

Removing remote validation significantly reduces the maintenance burden at little cost, since it only supports extremely niche use cases.