Reorganising external accession IDs from INSDC database

malloryfreeberg commented 7 years ago

As per the "EBI v3 metadata feedback" Google doc:

The current accession field names seem to imply a larger set of namespaces than 
actually exist. All the INSDC databases (ENA, GenBank/SRA and DDBJ) share a 
namespace, and their accessions should be tracked in the same fields.

The fields affected by this are:

Project.ddjb_trace
Project.ncbi_bioproject
Project.sra_project

Assay.ena_experiment
Assay.ena_run
Assay.sra_experiment
Assay.sra_run

Sample.ena_sample
Sample.ncbi_biosample

Suggested alternatives

Project.insdc_project ^[DES]RP\d+
Project.insdc_study ^PRJ[END]\w\d+

Assay.insdc_experiment ^[DES]RX\d+
Assay.insdc_run ^[DES]RR\d+

Sample.insdc_sample ^[DES]RS\d+
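
A minimal sketch of how the suggested patterns could be checked in Python. The character classes are written in plain form, such as `[DES]`, rather than with `|` inside the brackets. The dictionary and function names are hypothetical and not part of the schema, and requiring the whole string to match (via `fullmatch`) is an assumption, since the patterns above only anchor the start.

```python
import re

# Suggested field patterns, written with plain character classes.
# INSDC_PATTERNS and is_valid_accession are hypothetical names for illustration.
INSDC_PATTERNS = {
    "Project.insdc_project": re.compile(r"^[DES]RP\d+"),
    "Project.insdc_study": re.compile(r"^PRJ[END]\w\d+"),
    "Assay.insdc_experiment": re.compile(r"^[DES]RX\d+"),
    "Assay.insdc_run": re.compile(r"^[DES]RR\d+"),
    "Sample.insdc_sample": re.compile(r"^[DES]RS\d+"),
}


def is_valid_accession(field, value):
    """Return True if value matches the pattern suggested for field."""
    pattern = INSDC_PATTERNS.get(field)
    if pattern is None:
        raise KeyError(f"no pattern defined for field {field!r}")
    # Assumption: the entire value must match, not just a prefix.
    return pattern.fullmatch(value) is not None
```

For example, `is_valid_accession("Assay.insdc_run", "SRR1234567")` accepts a run accession from any of the three archives, because the leading letter is the only part that differs between them.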

The NCBI and EBI biosample databases are also peer archives which share a namespace.

Biosd_id should be able to capture both SAMN and SAME identifiers for samples 
from either the NCBI or the EBI biosamples database.
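
As a rough sketch, a single pattern could accept both prefixes in Biosd_id and report which archive issued the identifier. The pattern and the `biosample_archive` helper are assumptions for illustration, not part of the schema. The optional letter after the prefix is there so that EBI accessions of the form SAMEA followed by digits also match.

```python
import re

# Assumed pattern: SAMN or SAME, an optional extra letter, then digits.
BIOSD_ID = re.compile(r"SAM([EN])[A-Z]?\d+")

# Maps the distinguishing prefix letter to the issuing archive.
ARCHIVE_BY_PREFIX = {"N": "NCBI", "E": "EBI"}


def biosample_archive(biosd_id):
    """Return "NCBI" or "EBI" for a matching identifier, else None."""
    match = BIOSD_ID.fullmatch(biosd_id)
    if match is None:
        return None
    return ARCHIVE_BY_PREFIX[match.group(1)]
```

Because both archives share one namespace, a single field with one pattern is enough. The prefix letter alone identifies the source.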

This should be implemented in v4 and supersedes issue #47.

malloryfreeberg commented 7 years ago

More info here on INSDC.

malloryfreeberg commented 7 years ago

Addressed by #50

HumanCellAtlas / metadata-schema

Reorganising external accession IDs from INSDC database #48