CDCgov / seqsender

Automated Pipeline to Generate FTP Files and Manage Submission of Sequence Data to Public Repositories
https://cdcgov.github.io/seqsender/
Apache License 2.0

User defined date specificity #46

Closed mikeyweigand closed 1 month ago

mikeyweigand commented 6 months ago

Is your feature request related to a problem? Please describe.
Hard-coded date formatting at YYYY-MM-DD creates challenges for generalizing to other microbial pathogens, the majority of which must be submitted to BioSample with only YYYY or YYYY-MM to ensure privacy.

Describe the solution you'd like
Consider letting users define their own date specificity, perhaps in the *_config.yaml. That would preserve the current default requirements for SC2 and Flu. A more advanced option would be to allow setting a minimum (or maximum) specificity instead of a fixed requirement, which would give flexibility during submission (e.g. [1] YYYY or YYYY-MM, but not YYYY-MM-DD vs. [2] YYYY-MM or YYYY-MM-DD, but not YYYY).
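To make the minimum/maximum idea concrete, here is a minimal sketch in plain Python. It is not seqsender code: the names `DATE_PATTERNS`, `date_specificity`, and `is_allowed` are hypothetical, and the check only looks at the shape of the string, not whether the date exists on the calendar.

```python
import re

# Hypothetical specificity levels, ordered from coarsest to finest.
# These patterns check shape only; "2023-02-31" would still pass.
DATE_PATTERNS = {
    "YYYY": re.compile(r"\d{4}"),
    "YYYY-MM": re.compile(r"\d{4}-(0[1-9]|1[0-2])"),
    "YYYY-MM-DD": re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"),
}
LEVELS = list(DATE_PATTERNS)  # dicts preserve insertion order


def date_specificity(value):
    """Return the specificity label of a date string, or None if unrecognized."""
    for label, pattern in DATE_PATTERNS.items():
        if pattern.fullmatch(value):
            return label
    return None


def is_allowed(value, minimum="YYYY", maximum="YYYY-MM-DD"):
    """Check that a date's specificity lies between minimum and maximum, inclusive."""
    label = date_specificity(value)
    if label is None:
        return False
    return LEVELS.index(minimum) <= LEVELS.index(label) <= LEVELS.index(maximum)
```

With this sketch, case [1] above would be `is_allowed(date, maximum="YYYY-MM")` and case [2] would be `is_allowed(date, minimum="YYYY-MM")`. Setting both bounds to the same level reproduces a fixed requirement such as the current YYYY-MM-DD default.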

Describe alternatives you've considered
Maybe this also gets covered in your solution to #43, but BioSample itself does not impose strict requirements for date specificity and it's generally up to submitters to determine what is appropriate.


dthoward96 commented 6 months ago

Hey,

Yes, I've noticed this and am working on resolving it. You are correct that the issue you tagged will resolve this problem, as the pandera schemas will be used to validate the other database fields as well. I've moved that feature into the next version update to resolve other issues, and it will resolve this one too. I'm expecting to have it live on the version update branch later this week. Once the changes are live and testing has been done, that version should go live on master shortly thereafter.

dthoward96 commented 1 month ago

V1.2.0 now supports the date formats "YYYY", "YYYY-MM", and "YYYY-MM-DD".