cmungall / schemas

Work on data models and APIs for Genomic data.
Apache License 2.0

draft of db schema for G2P associations #2

Open heckerma opened 9 years ago

heckerma commented 9 years ago

Here's a rough draft of a db schema for G2P associations (in normal form). Currently, it only handles univariate associations.

Study
- id (unique global id)
- Date of entry
- PUBMEDID or link to paper
- Organism
- Cohort(s)
- Primary or meta analysis?
- sample size
- number of tests performed
- All tests reported?
- model used (linear regression, logistic regression, linear mixed model, etc.)
- phenotype transformation used
- test statistic used
- list of primary studies (if meta analysis)

Variant
- id (unique global id)
- Variant type (SNP, methylation, CNV, SNP set, etc.)
- Risk Allele(s)
- Genome locus
- Genome position
- Genome position reference
- Platform used to measure

Phenotype
- id (unique global id)
- Phenotype type (disease, drug response, gene expression, etc.)
- Platform used to measure

Association stats
- p-value
- effect size
- effect size std error
- Bayes factor

Association
- Study id
- Variant id(s)
- Phenotype id(s)
- Association stats (one for initial test, then one for each validation)
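To make the draft easier to discuss, here is a minimal sketch of the five entities as Python dataclasses. It is only one possible reading of the draft: the class and field names are assumptions chosen for illustration, several Study fields (cohorts, date of entry, transformations) are omitted for brevity, and nothing here is an agreed schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Study:
    study_id: str                       # unique global id
    organism: str
    is_meta_analysis: bool              # "Primary or meta analysis?"
    sample_size: int
    num_tests: int                      # number of tests performed
    model: str                          # e.g. "linear mixed model"
    pubmed_id: Optional[str] = None     # or a link to the paper
    # Only populated for meta-analyses.
    primary_study_ids: List[str] = field(default_factory=list)


@dataclass
class Variant:
    variant_id: str                     # unique global id
    variant_type: str                   # e.g. "SNP", "CNV"
    locus: str
    position: int
    position_reference: str             # which reference the position refers to
    risk_alleles: List[str] = field(default_factory=list)
    platform: Optional[str] = None


@dataclass
class Phenotype:
    phenotype_id: str                   # unique global id
    phenotype_type: str                 # e.g. "disease", "drug response"
    platform: Optional[str] = None


@dataclass
class AssociationStats:
    # All optional, since a study may report only some of these.
    p_value: Optional[float] = None
    effect_size: Optional[float] = None
    effect_size_std_error: Optional[float] = None
    bayes_factor: Optional[float] = None


@dataclass
class Association:
    study_id: str
    variant_ids: List[str]
    phenotype_ids: List[str]
    stats: AssociationStats
```

Keeping every statistic optional lets one record carry a p-value, a Bayes factor, or both.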

cmungall commented 9 years ago

Is "phenotype type" a free text field?

We need a way to get stats into the current schema draft. Presumably different methods will report different stats, so we may need a generic tag-value metadata system.
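As a rough illustration of what a generic tag-value system could look like, the sketch below stores each statistic as a (tag, value) pair. The names `StatEntry` and `stats_as_dict` are hypothetical, and the numbers are made-up placeholders rather than results from any study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StatEntry:
    tag: str      # name of the statistic, e.g. "p_value"
    value: float


def stats_as_dict(entries: List[StatEntry]) -> Dict[str, float]:
    """Collapse tag-value entries into a dict, rejecting duplicate tags."""
    result: Dict[str, float] = {}
    for entry in entries:
        if entry.tag in result:
            raise ValueError(f"duplicate stat tag: {entry.tag}")
        result[entry.tag] = entry.value
    return result


# Two methods reporting different statistics through the same structure.
frequentist = [StatEntry("p_value", 3e-8), StatEntry("effect_size", 0.12)]
bayesian = [StatEntry("bayes_factor", 150.0)]
```

The trade-off is that tags are free-form strings, so a controlled vocabulary of tag names would probably still be needed for queries to work across studies.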

heckerma commented 9 years ago

Is "phenotype type" a free text field?

It would be useful to have at least some predefined types, but “other” (with free text) will likely always be useful as it will be tough to keep up with new types coming online.

Presumably different methods will have different stats

Yes, for example, we allow for both p-value and bayes factor.

From: Chris Mungall [mailto:notifications@github.com] Sent: Thursday, November 20, 2014 5:10 PM To: cmungall/schemas Cc: David Heckerman Subject: Re: [schemas] draft of db schema for G2P associations (#2)


heckerma commented 9 years ago

After a bit more thought, we don't think "Association stats" should include validations. Instead, validations should be recorded in separate studies. Whether an association is validated can be assessed via query.

David, Chris, and Christoph
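A sketch of what "assessed via query" might mean, using an in-memory SQLite table. The table layout, the ids, the p-values, and the rule "significant in at least two distinct studies" are all assumptions made for illustration; the actual validation criterion would need to be agreed separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE association ("
    " study_id TEXT, variant_id TEXT, phenotype_id TEXT, p_value REAL)"
)
# Placeholder rows: one association per (study, variant, phenotype).
rows = [
    ("study1", "var1", "phen1", 1e-9),
    ("study2", "var1", "phen1", 2e-4),
    ("study1", "var2", "phen1", 5e-8),
]
conn.executemany("INSERT INTO association VALUES (?, ?, ?, ?)", rows)


def validated_pairs(conn, alpha):
    """Return (variant, phenotype) pairs significant at alpha in 2+ distinct studies."""
    cur = conn.execute(
        "SELECT variant_id, phenotype_id FROM association"
        " WHERE p_value < ?"
        " GROUP BY variant_id, phenotype_id"
        " HAVING COUNT(DISTINCT study_id) >= 2"
        " ORDER BY variant_id, phenotype_id",
        (alpha,),
    )
    return cur.fetchall()
```

With this layout a validation is just another row from another study, so the Association record itself never needs a list of validation stats.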

kellrott commented 9 years ago

The 'Variant id(s)' and 'Phenotype id(s)' (plural) raise the question of whether we want the topology to represent a regular graph or a multigraph. It's probably better to have plural concepts on both sides and connect them as a regular graph. There is already precedent on the Variant side (the VariantSet structure). Do we need a similar concept on the Phenotype side? Something like a PhenotypeSet, which lets you combine multiple phenotypes into a single concept (drug resistance AND proliferative)?
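To make the regular-graph option concrete, here is a minimal sketch in which each association is a single edge between one set node on each side. The classes below are hypothetical stand-ins written for this example; in particular, `VariantSet` here is not the GA4GH VariantSet definition, only a placeholder with the same name.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class VariantSet:
    variant_ids: FrozenSet[str]


@dataclass(frozen=True)
class PhenotypeSet:
    phenotype_ids: FrozenSet[str]


@dataclass(frozen=True)
class AssociationEdge:
    # Exactly one node on each side, so the structure stays a regular graph
    # even when an association involves several variants or phenotypes.
    study_id: str
    variants: VariantSet
    phenotypes: PhenotypeSet


edge = AssociationEdge(
    study_id="study1",
    variants=VariantSet(frozenset({"var1"})),
    phenotypes=PhenotypeSet(frozenset({"drug resistance", "proliferative"})),
)
```

Because the sets are frozen, two associations over the same phenotypes share an equal `PhenotypeSet` value, which makes the composite concept reusable as a node.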

heckerma commented 9 years ago

yes, there's lots of interest in PhenotypeSet work, e.g., http://biorxiv.org/content/early/2014/05/22/003905 and http://www.nature.com/nmeth/journal/v11/n4/full/nmeth.2848.html

kellrott commented 9 years ago

I've posted notes about our schema discussions in the main GA4GH issue board (https://github.com/ga4gh/schemas/issues/196). We should move our conversations over there, so the larger group can see what we're working on.

heckerma commented 9 years ago

Thanks Kyle!

From: Kyle Ellrott [mailto:notifications@github.com] Sent: Tuesday, December 02, 2014 11:00 PM To: cmungall/schemas Cc: David Heckerman Subject: Re: [schemas] draft of db schema for G2P associations (#2)


cmungall commented 9 years ago

PhenotypeSets:

heckerma commented 9 years ago

Actually, in practice, phenotypes are mainly considered in disjunction (to increase power).
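A small sketch of the difference, assuming each sample is annotated with a set of phenotype ids (the sample ids, phenotype names, and function names are made up for illustration). Under disjunction a sample counts as a case if it has any phenotype in the set, so the case group can only grow relative to conjunction.

```python
from typing import Dict, Iterable, Set


def disjunctive_cases(
    sample_phenotypes: Dict[str, Set[str]], phenotype_set: Iterable[str]
) -> Set[str]:
    """Samples with at least one phenotype in phenotype_set (OR)."""
    wanted = set(phenotype_set)
    return {s for s, phens in sample_phenotypes.items() if phens & wanted}


def conjunctive_cases(
    sample_phenotypes: Dict[str, Set[str]], phenotype_set: Iterable[str]
) -> Set[str]:
    """Samples with every phenotype in phenotype_set (AND)."""
    wanted = set(phenotype_set)
    return {s for s, phens in sample_phenotypes.items() if wanted <= phens}


samples = {
    "sampleA": {"phen1"},
    "sampleB": {"phen2"},
    "sampleC": {"phen1", "phen2"},
    "sampleD": set(),
}
```

If a PhenotypeSet is adopted, it may therefore need to record whether its members are combined by OR or by AND.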