CINERGI / UseCases

Build Use Cases in the WIKI
Apache License 2.0

Use Case: ECOGEO: Identify (meta)genomic datasets by sampling environmental conditions #11

Open hsu000001 opened 9 years ago

hsu000001 commented 9 years ago

Identify all (meta)genomic datasets that meet certain environmental measurement thresholds (e.g. all samples at X°C, with Y nitrate concentration and Z phosphate concentration).
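A minimal sketch of the threshold query described above, assuming each sample is already available as a flat record. The `Sample` fields, the units, and the dataset IDs are hypothetical and only illustrate the shape of the filter:

```python
from dataclasses import dataclass


@dataclass
class Sample:
    dataset_id: str        # hypothetical identifier of the parent dataset
    temperature_c: float   # water temperature, degrees Celsius (assumed unit)
    nitrate_um: float      # nitrate concentration, umol/L (assumed unit)
    phosphate_um: float    # phosphate concentration, umol/L (assumed unit)


def within(value, bounds):
    """Return True if value lies inside the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


def matching_datasets(samples, temp_range, nitrate_range, phosphate_range):
    """Return sorted IDs of datasets with at least one sample meeting all thresholds."""
    hits = set()
    for s in samples:
        if (within(s.temperature_c, temp_range)
                and within(s.nitrate_um, nitrate_range)
                and within(s.phosphate_um, phosphate_range)):
            hits.add(s.dataset_id)
    return sorted(hits)


# Made-up example records, not real measurements.
samples = [
    Sample("ds-A", 12.0, 5.0, 0.4),
    Sample("ds-A", 20.0, 1.0, 0.1),
    Sample("ds-B", 25.0, 0.2, 0.05),
    Sample("ds-C", 11.5, 6.5, 0.5),
]

hits = matching_datasets(
    samples,
    temp_range=(10, 15),
    nitrate_range=(4, 8),
    phosphate_range=(0.3, 0.6),
)
print(hits)
```

Ranges are used instead of exact values because measured concentrations rarely match a single number exactly.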

smrgeoinfo commented 9 years ago

First, figure out which datasets have metagenomic samples with associated nitrate and phosphate concentrations.

For processing, use the sites that comply with the GSC standards:

- IMG (JGI Integrated Microbial Genomes), <http://img.jgi.doe.gov/>
- CAMERA DDC (Data Distribution Center), <http://camera.crbs.ucsd.edu/ddc/>
- MG-RAST, <http://metagenomics.anl.gov>

We need to use the ENVO ontology for concept mapping of the Genomic Standards Consortium (GSC) metadata elements. Assume the simple case to start: the metagenomic analysis metadata includes nitrate and phosphate values of some sort. Later we will have to deal with joining sample descriptions, geochemistry, and metagenomic datasets that may be stand-alone resources, with the sample ID as the basis for joining. GSC metadata has phosphate and many nitrogen-related fields, so those fields need concept mapping.
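The two steps mentioned here (mapping source field names to shared concepts, then joining stand-alone resources on sample ID) could look roughly like the sketch below. The field names in `CONCEPT_MAP` and the records are hypothetical; a real mapping would point at ENVO or GSC terms rather than plain strings:

```python
# Hypothetical source field names mapped to a shared concept label.
CONCEPT_MAP = {
    "NO3": "nitrate",
    "nitrate_conc": "nitrate",
    "PO4": "phosphate",
    "phosphate_conc": "phosphate",
}


def normalize(record):
    """Rename known source fields to their concept label; keep other fields as-is."""
    return {CONCEPT_MAP.get(key, key): value for key, value in record.items()}


def join_on_sample_id(sample_descriptions, geochemistry):
    """Inner-join two lists of records on their 'sample_id' field."""
    geo_by_id = {}
    for record in geochemistry:
        norm = normalize(record)
        geo_by_id[norm["sample_id"]] = norm
    joined = []
    for sample in sample_descriptions:
        geo = geo_by_id.get(sample["sample_id"])
        if geo is not None:
            # Merge the two records; geochemistry values win on key clashes.
            joined.append({**sample, **geo})
    return joined


# Made-up example records.
sample_descriptions = [
    {"sample_id": "S1", "depth_m": 5},
    {"sample_id": "S2", "depth_m": 50},
]
geochemistry = [
    {"sample_id": "S1", "NO3": 4.2, "PO4": 0.3},
    {"sample_id": "S3", "nitrate_conc": 1.0},
]

joined = join_on_sample_id(sample_descriptions, geochemistry)
print(joined)
```

Only S1 appears in both inputs, so it is the only joined record; samples without a geochemistry match are dropped in this sketch.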

Do the conceptual mapping from GSC metadata and map the CAMERA metadata files, then load them into the staging DB. The search index for the SOLR interface has to index the entity and attribute parts of the metadata, so that it can find datasets that have instance-level information about 'nitrate' and 'phosphate' and return those datasets.
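As a rough illustration of the search step, the snippet below builds a Solr select URL using the standard `q`, `fl`, `wt`, and `rows` parameters. The base URL, the core name `cinergi`, and the field names `attribute_name` and `dataset_id` are assumptions, not the actual index schema, and no request is sent:

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; the real host and core name are not given in this thread.
SOLR_SELECT = "http://localhost:8983/solr/cinergi/select"


def build_attribute_query(attributes, rows=10):
    """Build a select URL for datasets whose indexed attributes include all given names.

    Assumes a multivalued 'attribute_name' field and a 'dataset_id' field,
    both hypothetical.
    """
    q = " AND ".join(f"attribute_name:{name}" for name in attributes)
    params = {"q": q, "fl": "dataset_id", "wt": "json", "rows": rows}
    return f"{SOLR_SELECT}?{urlencode(params)}"


url = build_attribute_query(["nitrate", "phosphate"])
print(url)
```

Whether attributes end up in one multivalued field or in separate fields depends on how the staging DB is indexed, so the query string would need adjusting to the real schema.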

The World Ocean Atlas provides access to various datasets that might be used to get proxies (not necessarily for phosphate or nitrate).