Closed lcolladotor closed 4 years ago
Can't execute the script because it relies on an .Rdata
files generated in #3, which itself requires PC files.
I can advance on this script once we resolve the issues in build_bims.R
.
Now that the build_bims.R
workflow is finally complete (thanks to Leo), I was able to spend some time with this script today.
Immediately I noticed that the parallel function output_status
returns only a list of ~57k FALSE
vector elements:
This is because it checks whether there are weight files for each of the genes in the out_files/
directory, which doesn't exist and wasn't created by build_bims.R
or any previous step. Additionally, it calls for an object named filt_bim
to represent a directory which is to be deleted in the event that the parameter clean_bim
is set to TRUE
when no such object exists
A problem is then encountered when a list of the weights weights is written to file and then submitted to FUSION.profile_wgt.R
.
I will keep looking into this but upon first glance, it appears that the script is incomplete.
I'm now comparing the bsp2 compute_weights.R
to this one, and they seem considerably different.
https://github.com/LieberInstitute/brainseq_phase2/blob/master/twas/compute_weights.R
Maybe this will answer some of my questions.
https://github.com/LieberInstitute/twas/blob/master/bsp2/compute_weights.R is the most relevant script for comparing compute_weights.R
. The one you linked to https://github.com/LieberInstitute/brainseq_phase2/blob/master/twas/compute_weights.R is much older.
https://github.com/LieberInstitute/twas/blob/master/bsp2/compute_weights_indv.sh#L28 is where out_files
gets created. Under the twas
repo (and the one we are adapting to the NAc data), the order is:
That is, before this issue, https://github.com/LieberInstitute/10xPilot_snRNAseq-human/issues/4 comes up first (adapting compute_weights_indv
)
I see, I must have missed that step falsely believing that I had already covered it.
Error-free run. About 40% of genes were included in the computation.
> table(output_status)
output_status
FALSE TRUE
33702 23186
Output from FUSION.profile_wgt.R
can be found here: /dcl01/lieber/ajaffe/Matt/MNT_thesis/snRNAseq/10x_pilot_FINAL/twas/NAc_gene/NAc_genes.profile*
Other files output:
NAc_genes.list
NAc_genes.pos
pos_info.Rdata
output_status.Rdata
This involves https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/twas/compute_weights.R and https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/twas/compute_weights.sh.
compute_weights.sh
Make similar edits to this script like you did for build_bims.sh in #3.
compute_weights.R
Might involve some similar edits to the ones from build_bims.R in #3 like at https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/1e1d2bb6627b06c22decdd642178bd25f9aeb42e/twas/compute_weights.R#L30-L36 and https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/1e1d2bb6627b06c22decdd642178bd25f9aeb42e/twas/compute_weights.R#L39. Editing this script in theory should require the least amount of effort since it's a "reduction" (map reduce) step, though maybe the RangedSummarizedExperiment object has the Gencode gene IDs in a different location than what is expected in https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/1e1d2bb6627b06c22decdd642178bd25f9aeb42e/twas/compute_weights.R#L91-L100.
This step is faster to run and like in #3 and #4, feel free to test it even if the expression data hasn't been cleaned by qSVA yet.