Merge individual TWAS NAc gene weights

lcolladotor commented 4 years ago

This involves https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/twas/compute_weights.R and https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/twas/compute_weights.sh.

compute_weights.sh

Make similar edits to this script like you did for build_bims.sh in #3.

compute_weights.R

Might involve some similar edits to the ones from build_bims.R in #3 like at https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/1e1d2bb6627b06c22decdd642178bd25f9aeb42e/twas/compute_weights.R#L30-L36 and https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/1e1d2bb6627b06c22decdd642178bd25f9aeb42e/twas/compute_weights.R#L39. Editing this script in theory should require the least amount of effort since it's a "reduction" (map reduce) step, though maybe the RangedSummarizedExperiment object has the Gencode gene IDs in a different location than what is expected in https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/1e1d2bb6627b06c22decdd642178bd25f9aeb42e/twas/compute_weights.R#L91-L100.

This step is faster to run and like in #3 and #4, feel free to test it even if the expression data hasn't been cleaned by qSVA yet.

aseyedia commented 4 years ago

Can't execute the script because it relies on an .Rdata files generated in #3, which itself requires PC files.

https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/1e1d2bb6627b06c22decdd642178bd25f9aeb42e/twas/compute_weights.R#L42-L44

https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/1e1d2bb6627b06c22decdd642178bd25f9aeb42e/twas/compute_weights.R#L53

https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/1e1d2bb6627b06c22decdd642178bd25f9aeb42e/twas/build_bims.R#L263-L265

https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/1e1d2bb6627b06c22decdd642178bd25f9aeb42e/twas/build_bims.R#L284-L287

I can advance on this script once we resolve the issues in build_bims.R.

aseyedia commented 4 years ago

Now that the build_bims.R workflow is finally complete (thanks to Leo), I was able to spend some time with this script today.

Immediately I noticed that the parallel function output_status returns only a list of ~57k FALSE vector elements:

https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/86ceecbd22252e3021136dc5884e3235de6086d7/twas/compute_weights.R#L50-L60

This is because it checks whether there are weight files for each of the genes in the out_files/ directory, which doesn't exist and wasn't created by build_bims.R or any previous step. Additionally, it calls for an object named filt_bim to represent a directory which is to be deleted in the event that the parameter clean_bim is set to TRUE when no such object exists

A problem is then encountered when a list of the weights weights is written to file and then submitted to FUSION.profile_wgt.R.

https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/86ceecbd22252e3021136dc5884e3235de6086d7/twas/compute_weights.R#L75-L81

I will keep looking into this but upon first glance, it appears that the script is incomplete.

aseyedia commented 4 years ago

I'm now comparing the bsp2 compute_weights.R to this one, and they seem considerably different.

https://github.com/LieberInstitute/brainseq_phase2/blob/master/twas/compute_weights.R

Maybe this will answer some of my questions.

lcolladotor commented 4 years ago

https://github.com/LieberInstitute/twas/blob/master/bsp2/compute_weights.R is the most relevant script for comparing compute_weights.R. The one you linked to https://github.com/LieberInstitute/brainseq_phase2/blob/master/twas/compute_weights.R is much older.

lcolladotor commented 4 years ago

https://github.com/LieberInstitute/twas/blob/master/bsp2/compute_weights_indv.sh#L28 is where out_files gets created. Under the twas repo (and the one we are adapting to the NAc data), the order is:

build_bims
compute_weights_indv
compute_weights

lcolladotor commented 4 years ago

That is, before this issue, https://github.com/LieberInstitute/10xPilot_snRNAseq-human/issues/4 comes up first (adapting compute_weights_indv)

aseyedia commented 4 years ago

I see, I must have missed that step falsely believing that I had already covered it.

aseyedia commented 4 years ago

Error-free run. About 40% of genes were included in the computation.

> table(output_status)
output_status
FALSE  TRUE
33702 23186

Output from FUSION.profile_wgt.R can be found here: /dcl01/lieber/ajaffe/Matt/MNT_thesis/snRNAseq/10x_pilot_FINAL/twas/NAc_gene/NAc_genes.profile*

Other files output: NAc_genes.list NAc_genes.pos pos_info.Rdata output_status.Rdata

LieberInstitute / 10xPilot_snRNAseq-human

Merge individual TWAS NAc gene weights #5

compute_weights.sh

compute_weights.R