Open gaow opened 2 years ago
RE RSS version of workflows: until we make some progress on #153, perhaps the best way to run SuSiE RSS is through `polyfun::finemapper.py`, which wraps `susie_rss`, after merging the summary stats with the reference genotype (in PLINK format) used for LD. `mvsusie_rss` is trickier. I don't think we need to run it at this point anyway, since it is still an unpublished method. We'll run `mvsusie` though, using the input format I posted earlier above. Once we make progress on #153, we can get per-gene summary stats easily with VCF-related tools and compute LD on the fly for a given genomic region, then feed that to `mvsusie_rss`.
For the per-gene phenotype, I think it is best to save the per-gene res_Y for each study and then merge them as the need arises. I think this is more flexible with respect to which studies get analyzed together down the road.
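To make the "save per study, merge as needed" idea concrete, here is a minimal sketch of merging per-study residual phenotypes for one gene into a samples-by-studies table. The function name `merge_res_y`, the dict-based input layout, and the study and sample IDs are all hypothetical; the real pipeline would read these from the saved per-gene files.

```python
def merge_res_y(per_study):
    """Merge per-study residual phenotypes for one gene.

    per_study maps a study name to {sample_id: residual value}.
    Returns (samples, studies, rows), where rows[i][j] is the value for
    samples[i] in studies[j], or None if that sample is not in that study.
    """
    studies = sorted(per_study)
    # Union of sample IDs over all studies, in a stable order.
    samples = sorted({s for values in per_study.values() for s in values})
    rows = [[per_study[study].get(sample) for study in studies]
            for sample in samples]
    return samples, studies, rows


# Hypothetical residuals for one gene in two studies with partly
# overlapping samples.
res_y = {
    "study_A": {"s1": 0.12, "s2": -0.40},
    "study_B": {"s2": 0.75, "s3": 1.10},
}
samples, studies, Y = merge_res_y(res_y)
for sample, row in zip(samples, Y):
    print(sample, row)
```

Because the merge is done at analysis time, choosing a different set of studies only changes the keys passed in, not the stored files.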
We need to save per gene data for workflows with SuSiE, mvSuSiE, MR-MASH and FUSION.
Currently we have two ways to save it:

1. `X`, multi-condition phenotype `Y`, residual phenotype `Y-res`, summary stats `b` and `s`, and LD matrix `R`
2. `.bed` format

I propose to merge them and adjust the SuSiE etc. pipelines accordingly.
Per-gene data storage
1. `X` in PLINK format
2. `Y` as a matrix, in gz format, with rows being the samples in `X`. There will be missing data in this `Y` matrix per condition
3. `Y-res` the same way as 2 above. We may already have some functions in the susieR package to do that; if not, I'll create one based on our current `residual_Y` module. Save it in gz format

FUSION analysis
Take 1 and 3 from above. We need to modify the FAM file on the fly to put in the Y-res from 3.
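As a rough illustration of the FAM modification, the sketch below replaces the sixth (phenotype) column of each FAM row with the matching Y-res value. The function name `fam_with_phenotype`, the use of the IID column as the lookup key, and writing `-9` for samples without a value are assumptions made for the example; how FUSION should treat missing phenotypes would need to be checked separately.

```python
def fam_with_phenotype(fam_lines, y_res, missing="-9"):
    """Return FAM rows with the phenotype column replaced by Y-res values.

    fam_lines: iterable of whitespace-delimited FAM rows with six columns
               (family ID, individual ID, father, mother, sex, phenotype).
    y_res:     dict mapping individual ID to the residual phenotype.
    Samples without a Y-res value get the `missing` code (assumed "-9").
    """
    out = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) != 6:
            raise ValueError(
                f"expected 6 FAM columns, got {len(fields)}: {line!r}")
        iid = fields[1]
        fields[5] = str(y_res[iid]) if iid in y_res else missing
        out.append(" ".join(fields))
    return out


# Hypothetical FAM rows and residuals for one gene.
fam = ["F1 s1 0 0 1 -9", "F2 s2 0 0 2 -9", "F3 s3 0 0 1 -9"]
y_res = {"s1": 0.12, "s3": -1.5}
new_fam = fam_with_phenotype(fam, y_res)
print("\n".join(new_fam))
```

The returned rows could be written to a temporary `.fam` next to the unchanged `.bed` and `.bim` files, so the genotype data in 1 is never rewritten.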
SuSiE/mvSuSiE
1. `read_plink` in R to get genotype X
2. `susieR` to fill missing data in X by mean imputation, and to center and standardize it
3. `susieR` to take covariates and regress them out of X
4. `Y-res` for analysis

Summary statistics and LD
For integrative analysis we should not attempt to merge summary statistics (sumstats) and LD in any individual module; that is non-trivial. I proposed this in #153. We should not save LD per analysis unit if there is a good way to generate it on the fly.
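For a sense of what generating LD on the fly could look like, here is a minimal sketch that selects the variants in a genomic region and computes their pairwise Pearson correlation from reference genotype dosages. The function names `ld_matrix` and `ld_for_region`, the in-memory list layout, and the assumption of no missing genotypes are all illustrative; this is not the approach proposed in #153, and a real implementation would read from PLINK or VCF files.

```python
import math


def ld_matrix(genotypes):
    """Pairwise Pearson correlation between variants.

    genotypes: list of samples, each a list of dosages (0, 1, 2) with no
               missing values (an assumption of this sketch).
    Monomorphic variants have zero variance, so their entries are NaN.
    """
    n = len(genotypes)
    p = len(genotypes[0])
    centered = []
    for j in range(p):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    R = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            if norms[i] == 0 or norms[j] == 0:
                r = float("nan")
            else:
                dot = sum(a * b for a, b in zip(centered[i], centered[j]))
                r = dot / (norms[i] * norms[j])
            R[i][j] = R[j][i] = r
    return R


def ld_for_region(genotypes, positions, start, end):
    """Compute LD only for variants with start <= position <= end."""
    idx = [j for j, pos in enumerate(positions) if start <= pos <= end]
    subset = [[row[j] for j in idx] for row in genotypes]
    return idx, ld_matrix(subset)


# Hypothetical reference panel: 4 samples, 3 variants.
X = [[0, 0, 2], [1, 1, 1], [2, 2, 0], [1, 1, 1]]
positions = [100, 200, 900]
idx, R = ld_for_region(X, positions, 50, 500)
print(idx, R)
```

Only the reference genotypes and variant positions need to be stored; the LD matrix for any analysis unit is recomputed from them when requested.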
For sumstats itself, `gwas_vcf` R package?