Open heckerma opened 10 years ago
Is "phenotype type" a free text field?
We need a way to get stats into the current schema draft. Presumably different methods will have different stats, so we may need a generic tag-value metadata system.
Is "phenotype type" a free text field? It would be useful to have at least some predefined types, but “other” (with free text) will likely always be useful as it will be tough to keep up with new types coming online.
Presumably different methods will have different stats
Yes. For example, we allow for both a p-value and a Bayes factor.
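To make the tag-value idea concrete, here is a minimal sketch in Python. The class name `AssociationStats`, the tag names, and the numbers are all hypothetical illustrations, not part of the draft; the point is only that each method stores whichever statistics it produces.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AssociationStats:
    """Hypothetical container: statistics stored as free-form tag -> value pairs."""
    values: Dict[str, float] = field(default_factory=dict)

    def get(self, tag: str) -> Optional[float]:
        # Return None when a method did not report this statistic.
        return self.values.get(tag)


# Made-up example values: a frequentist method might report a p-value
# and an effect size...
gwas_stats = AssociationStats({"p_value": 3e-8, "effect_size": 0.12})
# ...while a Bayesian method might report a Bayes factor instead.
bayes_stats = AssociationStats({"bayes_factor": 150.0})
```

A fixed set of columns would be easier to query, so a mix of a few predefined fields plus a tag-value overflow may be a reasonable compromise.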
From: Chris Mungall [mailto:notifications@github.com] Sent: Thursday, November 20, 2014 5:10 PM To: cmungall/schemas Cc: David Heckerman Subject: Re: [schemas] draft of db schema for G2P associations (#2)
After a bit more thought, we don't think "Association stats" should include validations. Instead, validations should be recorded in separate studies. Whether an association is validated can be assessed via query. David, Chris, and Christoph
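As a rough illustration of assessing validation by query, the sketch below uses Python's built-in `sqlite3` with a single simplified `association` table. The table layout, the sample rows, the `alpha` threshold, and the rule "supported by at least two distinct studies" are assumptions made for the example, not something the draft specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE association (
    study_id     TEXT NOT NULL,
    variant_id   TEXT NOT NULL,
    phenotype_id TEXT NOT NULL,
    p_value      REAL
);
""")
# Made-up rows: each validation is recorded as its own study.
conn.executemany(
    "INSERT INTO association VALUES (?, ?, ?, ?)",
    [
        ("study1", "var1", "phen1", 1e-9),  # initial report
        ("study2", "var1", "phen1", 2e-4),  # separate validation study
        ("study1", "var2", "phen1", 5e-8),  # reported only once
    ],
)


def supporting_studies(conn, variant_id, phenotype_id, alpha=0.05):
    """Count distinct studies reporting this pair with p_value < alpha."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT study_id) FROM association "
        "WHERE variant_id = ? AND phenotype_id = ? AND p_value < ?",
        (variant_id, phenotype_id, alpha),
    ).fetchone()
    return row[0]


def is_validated(conn, variant_id, phenotype_id):
    # Assumption: "validated" means at least two distinct supporting studies.
    return supporting_studies(conn, variant_id, phenotype_id) >= 2
```

With this layout, `is_validated(conn, "var1", "phen1")` is true and `is_validated(conn, "var2", "phen1")` is false, without any validation flag stored on the association itself.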
The 'Variant id(s)' and 'Phenotype id(s)' (plural) raise the question of whether we want the topology to represent a regular graph or a multigraph. It's probably better to have plural concepts on both sides and connect them as a regular graph. There is already precedent on the Variant side (the VariantSet structure). Do we need a similar concept on the Phenotype side? Something like a PhenotypeSet that lets you combine multiple phenotypes into a single concept (drug resistance AND proliferative)?
Yes, there's lots of interest in PhenotypeSet work, e.g., http://biorxiv.org/content/early/2014/05/22/003905 and http://www.nature.com/nmeth/journal/v11/n4/full/nmeth.2848.html
I've posted notes about our schema discussions in the main GA4GH issue board (https://github.com/ga4gh/schemas/issues/196). We should move our conversations over there, so the larger group can see what we're working on.
Thanks Kyle!
From: Kyle Ellrott [mailto:notifications@github.com] Sent: Tuesday, December 02, 2014 11:00 PM To: cmungall/schemas Cc: David Heckerman Subject: Re: [schemas] draft of db schema for G2P associations (#2)
PhenotypeSets:
Actually, in practice, phenotypes are mainly considered in disjunction (to increase power).
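A PhenotypeSet that supports both readings (conjunction, as in "drug resistance AND proliferative", and disjunction) could be sketched as follows. The class, its `mode` field, and the phenotype ids are hypothetical; nothing here is an existing GA4GH structure.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class PhenotypeSet:
    """Hypothetical analogue of VariantSet: several phenotype ids as one concept."""
    phenotype_ids: FrozenSet[str]
    mode: str = "any"  # "any" = disjunction (OR), "all" = conjunction (AND)

    def matches(self, observed: Set[str]) -> bool:
        if self.mode == "any":
            # At least one member phenotype is observed.
            return bool(self.phenotype_ids & observed)
        if self.mode == "all":
            # Every member phenotype is observed.
            return self.phenotype_ids <= observed
        raise ValueError(f"unknown mode: {self.mode!r}")


ids = frozenset({"drug_resistance", "proliferative"})
either = PhenotypeSet(ids, mode="any")
both = PhenotypeSet(ids, mode="all")
sample = {"drug_resistance"}  # made-up observation for one sample
```

Defaulting `mode` to `"any"` reflects the disjunctive use described above, while still allowing the conjunctive case to be expressed.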
Here's a rough draft of a db schema for G2P associations (in normal form). Currently, it only handles univariate associations.
Study
- id (unique global id)
- Date of entry
- PUBMEDID or link to paper
- Organism
- Cohort(s)
- Primary or meta analysis?
- sample size
- number of tests performed
- All tests reported?
- model used (linear regression, logistic regression, linear mixed model, etc.)
- phenotype transformation used
- test statistic used
- list of primary studies (if meta analysis)

Variant
- id (unique global id)
- Variant type (SNP, methylation, CNV, SNP set, etc.)
- Risk Allele(s)
- Genome locus
- Genome position
- Genome position reference
- Platform used to measure

Phenotype
- id (unique global id)
- Phenotype type (disease, drug response, gene expression, etc.)
- Platform used to measure

Association stats
- p-value
- effect size
- effect size std error
- Bayes factor

Association
- Study id
- Variant id(s)
- Phenotype id(s)
- Association stats (one for initial test, then one for each validation)
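For readers who prefer code to field lists, the draft can be sketched as Python dataclasses. Only a subset of the fields is shown, and the attribute names, types, and example values are assumptions chosen for illustration rather than a proposed final schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Study:
    id: str
    organism: str
    sample_size: int
    model_used: str  # e.g. "linear mixed model"
    is_meta_analysis: bool = False
    primary_study_ids: List[str] = field(default_factory=list)  # meta-analysis only


@dataclass
class Variant:
    id: str
    variant_type: str  # e.g. "SNP", "CNV"
    genome_position: Optional[int] = None


@dataclass
class Phenotype:
    id: str
    phenotype_type: str  # e.g. "disease", "drug response"


@dataclass
class AssociationStats:
    # Every statistic is optional because methods report different ones.
    p_value: Optional[float] = None
    effect_size: Optional[float] = None
    effect_size_std_error: Optional[float] = None
    bayes_factor: Optional[float] = None


@dataclass
class Association:
    study_id: str
    variant_ids: List[str]
    phenotype_ids: List[str]
    # As in the draft: one entry for the initial test, then one per validation.
    stats: List[AssociationStats] = field(default_factory=list)


# Made-up example record, referring to other records by id only (normal form).
assoc = Association(
    study_id="study1",
    variant_ids=["var1"],
    phenotype_ids=["phen1"],
    stats=[AssociationStats(p_value=1e-9, effect_size=0.3)],
)
```

Keeping only ids in `Association` mirrors the normal-form layout: study, variant, and phenotype details live in their own records and are joined at query time.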