BaderLab / netDx

R package with netDx software and data for examples

Incorporating genetic data into netDx #23

Closed shraddhapai closed 5 years ago

shraddhapai commented 7 years ago

Question: How do we incorporate genetic variants into netDx to improve predictive power and capture pathways of relevance to the disorder?

Considerations:

  1. How do you decide which variants are used to create features?
     a) Prune by p-value from GWAS.
     b) Filter by presence in gene domain, including protein.
     c) Keep nonsynonymous SNPs.
     How many SNPs do you need for this strategy to be feasible?
  2. How do you quantify genetic similarity?
     a) Correlation of SNP vectors in a pathway? Shirley's approach with breast cancer dataset.
        • Shirley: I have tried the above approach, but it doesn't result in very good performance. However, we can use the results I obtained as a baseline or for comparison with other, hopefully better, methods.
     b) Jennifer Listgarten's method of quantifying genetic similarity (uses PCs from ethnicity as covariates)?
     c) Binary network for co-occurrence of mutations?
  3. Can we have gene-level features? (probably not without GM upgrade)
  4. How do you integrate feature-selected networks into the PSN?
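
A minimal sketch of options 1a, 2a and 2c above, in base R on toy data. Everything here is an assumption made for illustration: the genotype matrix, the p-values, the pathway definitions, and the helper names `filterSnpsByP`, `pathwaySimilarity` and `cooccurrenceNet` are hypothetical and are not part of netDx. The co-occurrence network is only one possible reading of 2c (two patients are linked if they share at least one mutated gene).

```r
# Toy genotypes (0/1/2 minor-allele counts): patients in rows, SNPs in columns.
geno <- matrix(c(0, 1, 2, 0, 1, 2,
                 0, 1, 2, 1, 1, 2,
                 2, 1, 0, 0, 1, 0,
                 1, 1, 1, 2, 0, 0),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("P", 1:4), paste0("rs", 1:6)))

# Hypothetical GWAS p-values and SNP-to-pathway mapping.
gwasP <- setNames(c(1e-6, 0.2, 3e-4, 0.8, 1e-3, 0.04), colnames(geno))
pathways <- list(pathA = c("rs1", "rs3", "rs6"),
                 pathB = c("rs2", "rs4", "rs5"))

# 1a) Keep SNPs whose GWAS p-value is below a cutoff.
filterSnpsByP <- function(pvals, cutoff = 0.05) names(pvals)[pvals < cutoff]

# 2a) Patient-by-patient Pearson correlation of SNP vectors within one pathway.
# Returns NULL when too few SNPs survive filtering to define a feature.
pathwaySimilarity <- function(geno, snps, minSnps = 2) {
  snps <- intersect(snps, colnames(geno))
  if (length(snps) < minSnps) return(NULL)
  sim <- suppressWarnings(cor(t(geno[, snps, drop = FALSE])))
  sim[is.na(sim)] <- 0   # constant genotype vectors have undefined correlation
  diag(sim) <- 1
  sim
}

# 2c) Binary network: edge if two patients share at least one mutated gene.
cooccurrenceNet <- function(mut) {
  shared <- mut %*% t(mut)      # number of shared mutated genes per patient pair
  net <- (shared > 0) * 1
  diag(net) <- 0
  net
}

keep <- filterSnpsByP(gwasP)
simNets <- lapply(pathways, function(p) pathwaySimilarity(geno, intersect(p, keep)))

# Toy binary mutation matrix: patients in rows, genes in columns.
mut <- matrix(c(1, 0, 1,
                1, 0, 0,
                0, 1, 0,
                0, 0, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("P", 1:4), paste0("gene", 1:3)))
coNet <- cooccurrenceNet(mut)
```

In this toy run `pathB` keeps only one SNP after filtering, so it yields no network. That is the feasibility question in consideration 1: the filter has to leave enough SNPs per pathway for a correlation to be meaningful.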

Plan

Code

The "..." indicates predictor design variables that should be held constant across all methods tried.

rngNum=25 # num train/test splits. Scale to 100 once done
pctTrain=0.8 # 80% train, 20% test
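
One way the two settings above could drive the resampling is sketched below. This assumes class-stratified random splits, which the issue does not specify; `makeSplits` and the toy `status` vector are hypothetical names used only for illustration and are not netDx functions.

```r
rngNum <- 25     # number of train/test splits
pctTrain <- 0.8  # fraction of each class assigned to training

# Returns a list of rngNum character vectors labelling each patient TRAIN or TEST.
# Sampling is done within each class so the class ratio is preserved (an assumption).
makeSplits <- function(status, rngNum, pctTrain, seed = 1) {
  set.seed(seed)
  lapply(seq_len(rngNum), function(i) {
    isTrain <- logical(length(status))
    for (cl in unique(status)) {
      idx <- which(status == cl)
      nTrain <- round(pctTrain * length(idx))
      isTrain[idx[sample.int(length(idx), nTrain)]] <- TRUE
    }
    ifelse(isTrain, "TRAIN", "TEST")
  })
}

# Toy labels with a 1:2 class ratio.
status <- rep(c("case", "control"), times = c(10, 20))
splits <- makeSplits(status, rngNum, pctTrain)
table(status, splits[[1]])
```

Fixing the seed keeps the same splits across every feature design being compared, so differences in performance reflect the design and not the resampling.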

Test datasets

Criteria:

Possible candidates:

• WTCCC Crohn's disease data (~3000 samples, 1:2 class ratio). Have GWAS signal + GSEA results available.

• RA SNP data from Siminovitch Lab (small data set, ~150-200 subjects and 200K SNPs?)

• Rheumatoid arthritis dataset from DREAM?

• Could try Mittal lab Tetralogy of Fallot dataset or Anthracycline toxicity prediction dataset

• Shirley's breast cancer genetics dataset

Other (genetic signal not previously quantified by us):

• TCGA PanCancer (has somatic mutations, has pathway analysis)

• PNC dataset


  1. Apply one of the filters above or design features in various ways as suggested above.
  2. Compare the predictor checklist output for all the settings.

shraddhapai commented 7 years ago

Somewhat cleaner and better structured.

shraddhapai commented 7 years ago

Code spec for testing genetic data incorporation.

shraddhapai commented 5 years ago

Separate project that has exceeded the design described here.