biocodellc / geome-ui

MIT License

SRA required fields #384

Closed by jdeck88 4 years ago

jdeck88 commented 4 years ago

SRA asks for a specific set of fields, but users may not be aware of what these are:

From Kim Andrews: I'm wondering if it might be helpful if the list of attributes on the Geome website includes some information for each attribute as to whether it will be included in the BioSample-attributes.tsv file, so that researchers can make sure they include the attributes they want to go into SRA. Otherwise people might not end up taking advantage of Geome's functionality and just load everything into the SRA manually... or else they might just never notice that the BioSample-attributes.tsv file doesn't contain all the info they want in the SRA.

To solve this, we should add a feature that identifies the fields used for SRA submission. For example, during project creation we could offer an SRA attribute set that identifies the fields that will go into SRA, which are:

sample_name
sample_title
organism
collection_date
geo_loc_name
tissue
biomaterial_provider
collected_by
depth
dev_stage
identified_by
lat_lon
sex
breed
host
age
bcid (supplied only by geome)

The other option here is to just supply all the field names generated by GEOME regardless of whether they are in the SRA suggested set.
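
As a rough illustration of the first option, the sketch below flags which of a project's attributes fall in the SRA set listed above. The function name `flag_sra_fields` is hypothetical, and it assumes the project's attribute names already match the SRA field names, which may not hold for real GEOME projects.

```python
# Field names copied from the list in this issue; everything else is illustrative.
SRA_SUGGESTED_FIELDS = {
    "sample_name", "sample_title", "organism", "collection_date",
    "geo_loc_name", "tissue", "biomaterial_provider", "collected_by",
    "depth", "dev_stage", "identified_by", "lat_lon", "sex", "breed",
    "host", "age", "bcid",
}


def flag_sra_fields(project_attributes):
    """Return (attribute, in_sra_set) pairs, preserving the input order.

    Assumption: attribute names are already spelled like the SRA fields.
    """
    return [(name, name in SRA_SUGGESTED_FIELDS) for name in project_attributes]
```

A UI could use the boolean in each pair to mark the attributes that will end up in BioSample-attributes.tsv.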

jdeck88 commented 4 years ago

I think the 2nd option will be easier to implement....

From the BioSample Metadata FAQ at https://www.ncbi.nlm.nih.gov/biosample/docs/submission/faq/

....In addition to recognized package attributes, you can provide any number of custom attributes to fully describe your samples. Provide comprehensive information that will allow users to fully interpret your study.

Here is what I recommend including in the file:

sample_name
organism
isolate (insert BCID link pointing to materialSampleID)
dev_stage (map to lifestage)
sex
tissue (tissueType)
geo_loc_name
lat_lon
specimen_voucher (institutioncode:collectioncode:catalognumber) (if provided)
bcid
All other fields just use GEOME column names as keys
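
The recommendation above could be sketched roughly as follows. This is not the code from the geome-db commit; the function name `to_biosample_attributes` is hypothetical, the column spellings are taken from the issue text (real GEOME casing may differ), and it assumes each record carries a `bcid` value that is the resolvable link for the sample.

```python
# Renames stated in the issue; column spellings follow the issue text.
RENAMES = {"lifestage": "dev_stage", "tissueType": "tissue"}
VOUCHER_PARTS = ("institutioncode", "collectioncode", "catalognumber")


def to_biosample_attributes(record):
    """Map one GEOME record (dict of column -> value) to BioSample attributes."""
    row = {}

    # Build specimen_voucher only if all three parts are provided.
    parts = [record.get(p) for p in VOUCHER_PARTS]
    has_voucher = all(parts)
    if has_voucher:
        row["specimen_voucher"] = ":".join(str(p) for p in parts)

    for column, value in record.items():
        if has_voucher and column in VOUCHER_PARTS:
            continue  # already folded into specimen_voucher
        # Unmapped columns keep their GEOME column name as the key.
        row[RENAMES.get(column, column)] = value

    # Assumption: record["bcid"] is the BCID link for the material sample.
    if "bcid" in record:
        row["isolate"] = record["bcid"]
    return row
```

Columns with no stated mapping pass through unchanged, which matches the "all other fields" rule, and incomplete voucher parts are left as ordinary columns rather than dropped.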

ewingrj commented 4 years ago

Added all data in biocodellc/geome-db@1a1d7f83360ee5a03a8193d223d33c44aebae3fe