Google-Health / genomics-research

Using deepnull with multiple phenotypes #3

Closed ASLeonard closed 2 years ago

ASLeonard commented 2 years ago

Hi, some GWAS tools like plink2 recommend analyzing multiple phenotypes in a single run rather than running each phenotype separately:

If you have multiple quantitative phenotypes with either no missing values, or missing values for the same samples, analyze them all in a single --glm run!

The deepnull paper also tests multiple phenotypes (ALP, ALT, AST, etc.). It looks like the target flag only accepts a single value, so was this achieved by running a separate instance of deepnull for each phenotype? If so, is it possible to then concatenate the "phenoN_deepnull" covariates so that e.g. plink2 can run with all of them at once, or would that introduce some bias?

The idea for, say, phenotypes pheno1, pheno2, pheno3 would be something like

for N in {1..3}
do
python -m deepnull.main \
  --input_tsv=/input/YOUR_PHENO${N}COVAR_TSV \
  --output_tsv=/output/YOUR_OUTPUT${N}_TSV \
  --target=pheno${N} \
  --covariates="age,sex,genotyping_array"
done
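# Append the last column (the deepnull prediction) of each output file.
# Assumes every file lists the samples in the same row order.
paste /input/ALL_COVAR_TSV \
  <(awk -F'\t' '{print $NF}' /output/YOUR_OUTPUT1_TSV) \
  <(awk -F'\t' '{print $NF}' /output/YOUR_OUTPUT2_TSV) \
  <(awk -F'\t' '{print $NF}' /output/YOUR_OUTPUT3_TSV) \
  > COVARS_PLUS_DEEPNULL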
plink2 ... --covar COVARS_PLUS_DEEPNULL

Best, Alex

fhormoz commented 2 years ago

Hi Alex,

It is true that in our DeepNull paper we applied DeepNull to each phenotype separately and then ran GWAS. It is also true that PLINK2 recommends combining multiple phenotypes to speed up the process. However, PLINK2 assumes that all the covariates in the covar file are used, so if you have multiple DeepNull predictions in your COVARS_PLUS_DEEPNULL, your GWAS will include additional covariates that you may not need.

For example, when you run GWAS on pheno1 you want to use pheno1_deepnull_pred as a covariate, and not pheno2_deepnull_pred or pheno3_deepnull_pred. However, your command will perform GWAS via PLINK2 for pheno1 with pheno1_deepnull_pred, pheno2_deepnull_pred, and pheno3_deepnull_pred all used as covariates.
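
One way to keep each phenotype paired with only its own prediction is to run PLINK2 once per phenotype and name the covariate columns explicitly with --covar-name. The following is a minimal sketch under stated assumptions, not the exact commands used in the paper: the file names, the YOUR_BFILE prefix, the "<phenotype>_deepnull" column name, and the run_gwas helper are all placeholders. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: one PLINK2 run per phenotype, each using only its own DeepNull column.
# Hypothetical placeholders: BFILE, PHENO_FILE, COVAR_FILE, and the
# "<phenotype>_deepnull" column naming.
set -eu

BFILE="${BFILE:-YOUR_BFILE}"
PHENO_FILE="${PHENO_FILE:-YOUR_PHENO_TSV}"
COVAR_FILE="${COVAR_FILE:-COVARS_PLUS_DEEPNULL}"
BASE_COVARS="age sex genotyping_array"

# Build the PLINK2 command for one phenotype; print it when DRY_RUN=1
# (the default), execute it otherwise.
run_gwas() {
  pheno="$1"
  # $BASE_COVARS is intentionally unquoted so each name becomes its own argument.
  set -- plink2 --bfile "$BFILE" \
    --pheno "$PHENO_FILE" --pheno-name "$pheno" \
    --covar "$COVAR_FILE" \
    --covar-name $BASE_COVARS "${pheno}_deepnull" \
    --glm hide-covar --out "gwas_${pheno}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

for N in 1 2 3; do
  run_gwas "pheno${N}"
done
```

Set DRY_RUN=0 to actually execute the commands. With this layout the combined covariate file can still be shared across runs, since each run selects only the columns it names.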

Please let us know if this makes sense.

Best, Farhad

ASLeonard commented 2 years ago

Ah thanks Farhad, that example is perfect. I didn't consider that the other j!=i deepnull covariates would not apply to the ith phenotype, so a single run would be problematic.

I'll apply them separately and be glad to have increased power rather than compute speed 😄