CDCgov / seqsender

Automated Pipeline to Generate FTP Files and Manage Submission of Sequence Data to Public Repositories
https://cdcgov.github.io/seqsender/
Apache License 2.0

Pandera metadata validation #36

Closed. dthoward96 closed this issue 5 months ago.

dthoward96 commented 6 months ago

User metadata can be validated with pandera, which checks each metadata field against the requirements defined in a schema file. This lets seqsender detect problems in user metadata automatically. Pandera is a better option than hardcoding field validation into seqsender, because a separate schema can be written for each virus, with multiple valid options allowed for each field. Restrictions can then easily be extended to other viruses or rolled back.

Pandera metadata schema files:

dthoward96 commented 5 months ago

Changed schema validation to work per database. This adds many more schema files, but lets each database have its own field requirements.