bioinform / somaticseq

An ensemble approach to accurately detect somatic mutations using SomaticSeq
http://bioinform.github.io/somaticseq/
BSD 2-Clause "Simplified" License

submit_callers_multiThreads.sh missing "split" in call to submit_JointSNVMix2.sh #86

Closed: gianfilippo closed this issue 4 years ago

gianfilippo commented 4 years ago

Hi, I believe the call to submit_JointSNVMix2.sh within the submit_callers_multiThreads.sh script is missing the "split" option (the same is probably true of submit_callers_singleThread.sh). Best

gianfilippo commented 4 years ago

I tried to add the "split" section, copying it from the SomaticSniper call, since the two appear to be identical in both submit_XX.sh scripts. Unfortunately, it crashes, and I am not sure what the problem is. For now, I am removing the JointSNVMix2 call.

litaifang commented 4 years ago

I kept SomaticSniper and JointSNVMix2 (which is fairly old software now) outside the split/parallelization routine because these two tools do not take BED files as input: they can only process the whole BAM file and cannot restrict mutation calling to the regions specified in a BED file. That is how we parallelize mutation calling: we split the genome into small regions, each represented by a BED file, and all the callers (except those two) call variants in parallel according to those BED files. Then we combine the results after they finish. What did you modify?
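The region-splitting step described above can be sketched roughly as follows. This is not SomaticSeq's actual code; it assumes the target regions are already available as BED-style (chrom, start, end) tuples, and the helper names `split_regions` and `write_bed_files` are hypothetical. It cuts the regions into at most N chunks of roughly equal total length and writes them as 1.bed, ..., N.bed.

```python
from pathlib import Path
from typing import List, Tuple

# A genomic region as in a BED file: (chrom, start, end), 0-based, half-open.
Region = Tuple[str, int, int]


def split_regions(regions: List[Region], n: int) -> List[List[Region]]:
    """Split regions into at most n chunks with roughly equal total length.

    Intervals are cut at chunk boundaries, so one chromosome may be spread
    across several chunks. (Hypothetical helper, not part of SomaticSeq.)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    total = sum(end - start for _, start, end in regions)
    target = -(-total // n)  # ceiling division: bases per chunk
    chunks: List[List[Region]] = [[]]
    filled = 0  # bases already assigned to the current chunk
    for chrom, start, end in regions:
        while start < end:
            # Open a new chunk once the current one is full;
            # the last chunk absorbs any remainder.
            if filled >= target and len(chunks) < n:
                chunks.append([])
                filled = 0
            if len(chunks) < n:
                take = min(end - start, target - filled)
            else:
                take = end - start
            chunks[-1].append((chrom, start, start + take))
            filled += take
            start += take
    return chunks


def write_bed_files(chunks: List[List[Region]], outdir: str) -> List[Path]:
    """Write chunk i (1-based) to <outdir>/<i>.bed and return the paths."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunks, start=1):
        path = out / f"{i}.bed"
        path.write_text("".join(f"{c}\t{s}\t{e}\n" for c, s, e in chunk))
        paths.append(path)
    return paths
```

Each caller that accepts a BED file can then be launched once per chunk; tools such as SomaticSniper and JointSNVMix2, which read the whole BAM, cannot use these chunks directly.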

gianfilippo commented 4 years ago

Hi, sorry, I did not explain it well. I meant the 'split' flag in submit_JointSNVMix2.sh and submit_JointSomaticSniper.sh, which takes the final VCF and splits it consistently with 1.bed, ..., N.bed into each subDIR created for each THREAD. This piece of code sits at the end of your submit_JointSNVMix2.sh script, after the JointSNVMix2 processing.
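The "split" step described above (distributing a whole-genome VCF among the per-thread BED regions) might look roughly like this. It is a hypothetical sketch, not the code in submit_JointSNVMix2.sh; it assumes the VCF is already read into a list of lines and the regions are given as lists of BED-style tuples, one list per thread. Because VCF POS is 1-based and BED intervals are 0-based half-open, a record belongs to a region when start < POS <= end.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # BED-style (chrom, start, end), 0-based half-open


def in_region(chrom: str, pos: int, region: Region) -> bool:
    """True if a 1-based VCF position falls inside a 0-based half-open BED region."""
    r_chrom, start, end = region
    return chrom == r_chrom and start < pos <= end


def split_vcf_by_regions(vcf_lines: List[str],
                         chunks: List[List[Region]]) -> List[List[str]]:
    """Distribute VCF records among chunks; every output keeps the full header.

    Each record goes to the first chunk that contains it; records outside
    every region are dropped. (Hypothetical helper, not part of SomaticSeq.)
    """
    header = [line for line in vcf_lines if line.startswith("#")]
    outputs = [list(header) for _ in chunks]
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split("\t")
        chrom, pos = fields[0], int(fields[1])
        for i, chunk in enumerate(chunks):
            if any(in_region(chrom, pos, region) for region in chunk):
                outputs[i].append(line)
                break
    return outputs
```

The linear scan over regions keeps the sketch short; for many regions, an interval index per chromosome would be faster.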

gianfilippo commented 4 years ago

This seems to be an issue only when running the generated .cmd script that splits the variants into the various subDIRs created by the multi-threaded run. I am not entirely sure why this somaticseq.cmd script is necessary at this point, since for training, the VCFs in each subDIR need to be combined into a single VCF (one for each caller used). Anyway, unless I missed something or solving this issue is important, please feel free to close this issue. Thanks!
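Combining the per-subDIR VCFs back into one VCF per caller, as mentioned above, can be sketched as a simple concatenation. This is an illustrative assumption, not SomaticSeq's merge logic; the function names are hypothetical, and the sketch assumes each per-thread VCF has already been read into a list of lines and that the threads cover consecutive, non-overlapping regions in genome order.

```python
from typing import Dict, List


def merge_vcf_parts(parts: List[List[str]]) -> List[str]:
    """Concatenate per-thread VCFs for one caller into a single VCF.

    Keeps the header of the first part and appends the records of every part
    in thread order. Genomic order is preserved only if the parts cover
    consecutive, non-overlapping regions. (Hypothetical helper.)
    """
    if not parts:
        return []
    merged = [line for line in parts[0] if line.startswith("#")]
    for part in parts:
        merged.extend(line for line in part if not line.startswith("#"))
    return merged


def merge_by_caller(per_thread: Dict[str, List[List[str]]]) -> Dict[str, List[str]]:
    """Produce one merged VCF per caller name."""
    return {caller: merge_vcf_parts(parts) for caller, parts in per_thread.items()}
```

In practice, `bcftools concat` performs this kind of concatenation for VCF files that share the same samples.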

litaifang commented 4 years ago

I deprecated those .sh scripts and replaced them with the makeSomaticScripts.py command. For now, it only works with the tumor-normal workflow. To get Singularity scripts, run makeSomaticScripts.py paired -tech singularity .... Run makeSomaticScripts.py paired -h to see all the options.

gianfilippo commented 4 years ago

Thanks! Gianfilippo

On Tue, Jul 28, 2020 at 3:21 AM Li Tai Fang notifications@github.com wrote:

litaifang commented 4 years ago

makeSomaticScripts.py now supports local execution of all the .cmd scripts in the right order via --run-workflow-locally, though for now this is only supported in tumor-normal mode.