cidgoh / DataHarmonizer

A standardized browser-based spreadsheet editor and validator that can be run offline and locally, and which includes templates for SARS-CoV-2 and Monkeypox sampling data. This project, created by the Centre for Infectious Disease Genomics and One Health (CIDGOH), at Simon Fraser University, is now an open-source collaboration with contributions from the National Microbiome Data Collaborative (NMDC), the LinkML development team, and others.
MIT License

Accession Prefix Validation #60

Closed. cmrn-rhi closed this issue 10 months ago

cmrn-rhi commented 4 years ago

Implement validation on accessions that have controlled prefixes.

BioProject Accession - Prefix: PRJNA
BioSample Accession - Prefix: SAMN
SRA (run) Accession - Prefix: SRR
GISAID Accession - Prefix: EPI_ISL_

GenBank Accessions - Allowable prefixes for nucleotide direct submissions: U, AF, AY, DQ, EF, EU, FJ, GQ, GU, HM, HQ, JF, JN, JQ, JX, KC, KF, KJ, KM, KP, KR, KT, KU, KX, KY, MF, MG, MH, MK, MN, MT (source: https://www.ncbi.nlm.nih.gov/Sequin/acc.html)
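
For the single-prefix fields, a check could look roughly like the sketch below. It is written in Python purely for illustration and is not DataHarmonizer code. The field names in the dictionary and the `has_valid_prefix` helper are hypothetical, the rule that only digits follow the prefix is an assumption, and the GISAID prefix is written as `EPI_ISL_`, the form GISAID identifiers normally take.

```python
import re

# Prefixes as listed in this issue. The dictionary keys are hypothetical
# field names, not identifiers taken from DataHarmonizer.
FIXED_PREFIXES = {
    "BioProject Accession": "PRJNA",
    "BioSample Accession": "SAMN",
    "SRA Accession": "SRR",
    "GISAID Accession": "EPI_ISL_",
}


def has_valid_prefix(field: str, value: str) -> bool:
    """Return True if value is the field's prefix followed by digits only.

    Assumption: the part after the prefix is purely numeric.
    """
    prefix = FIXED_PREFIXES[field]
    pattern = re.escape(prefix) + r"\d+"
    return re.fullmatch(pattern, value.strip()) is not None
```

With this sketch, `has_valid_prefix("BioProject Accession", "PRJNA123456")` passes, while a value such as `SAMEA123456` in the BioSample field is rejected because it does not start with `SAMN` followed by digits.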

ivansg44 commented 4 years ago

Seems like a good idea to me. @griffie Any problems with us adding this?

griffie commented 4 years ago

I think that would be good.

ddooley commented 10 months ago

This can be done with a generic regular expression match anchored at both ends, e.g. ^(U|AF|AY|DQ|EF|EU etc.)\d+$
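
A minimal sketch of that regular expression approach, in Python for illustration only (not the project's implementation). The prefix list is the one quoted earlier in this issue. The `is_valid_genbank_accession` name is hypothetical, and the sketch assumes the accession is the prefix followed by digits only, so a version suffix such as `.3` is rejected.

```python
import re

# Allowable GenBank prefixes for nucleotide direct submissions,
# copied from the list in this issue.
GENBANK_PREFIXES = [
    "U", "AF", "AY", "DQ", "EF", "EU", "FJ", "GQ", "GU", "HM", "HQ",
    "JF", "JN", "JQ", "JX", "KC", "KF", "KJ", "KM", "KP", "KR", "KT",
    "KU", "KX", "KY", "MF", "MG", "MH", "MK", "MN", "MT",
]

# Build the alternation (U|AF|AY|...) from the list, followed by digits.
GENBANK_PATTERN = re.compile("(" + "|".join(GENBANK_PREFIXES) + r")\d+")


def is_valid_genbank_accession(value: str) -> bool:
    """Return True if value is an allowed prefix followed by digits only.

    fullmatch requires the whole string to match, so trailing
    characters after the digits are rejected.
    """
    return GENBANK_PATTERN.fullmatch(value.strip()) is not None
```

Building the pattern from a list keeps the allowed prefixes in one place, so adding a prefix later does not require editing the expression by hand.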