cancerit / cgpCaVEManWrapper

Reference implementation of CGP workflow for CaVEMan SNV analysis
http://cancerit.github.io/cgpCaVEManWrapper/
GNU Affero General Public License v3.0

if -index 1 is passed with -p flag only one split file is flagged #55

Open jsmedmar opened 3 years ago

jsmedmar commented 3 years ago

If -index 1 is passed with -p flag, only one of the split files is flagged:

$ caveman.pl --version
VERSION: 1.16.0

$ ls -lah /tmpCaveman/
Aug 11 20:12 .
Aug 11 20:12 ..
Aug 11 20:11 alg_bean
Aug 11 20:11 caveman.cfg.ini
Aug 11 20:11 cov_arr
Aug 11 20:12 logs
Aug 11 20:11 prob_arr
Aug 11 20:12 progress
Aug 11 20:11 readpos.2
Aug 11 20:11 results
Aug 11 20:11 splitList
Aug 11 20:11 splitList.2
Aug 11 20:12 tumor_vs_normal.flagged.muts.vcf
Aug 11 20:11 tumor_vs_normal.flagged.muts.vcf.1
Aug 11 20:12 tumor_vs_normal.flagged.muts.vcf.gz
Aug 11 20:12 tumor_vs_normal.flagged.muts.vcf.gz.tbi
Aug 11 20:11 tumor_vs_normal.muts.ids.vcf
Aug 11 20:11 tumor_vs_normal.muts.ids.vcfsplit.1
Aug 11 20:11 tumor_vs_normal.muts.ids.vcfsplit.2
Aug 11 20:11 tumor_vs_normal.muts.ids.vcfsplit.3
Aug 11 20:11 tumor_vs_normal.muts.ids.vcfsplit.4
Aug 11 20:11 tumor_vs_normal.muts.ids.vcfsplit.5
Aug 11 20:11 tumor_vs_normal.muts.ids.vcfsplit.6
Aug 11 20:11 tumor_vs_normal.muts.vcf
Aug 11 20:11 tumor_vs_normal.no_analysis.bed
Aug 11 20:11 tumor_vs_normal.snps.ids.vcf
Aug 11 20:11 tumor_vs_normal.snps.vcf

I think -p flag should flag all split files regardless of -index, since you can't pass multiple -i values anyway.

keiranmraine commented 3 years ago

That is expected behaviour; it works the same as mstep/estep. The intent is that you submit one job per index so that they can run in parallel.

If you want to spread the load over a known number of jobs, you can execute them with the -limit option, the same as for mstep/estep. Executing the following 4 commands in parallel will process all flagging indexes regardless of the number of split files. Should one fail because of run time or memory, it will resume from the last incomplete split element.

caveman.pl -p flag -l 4 -i 1
caveman.pl -p flag -l 4 -i 2
caveman.pl -p flag -l 4 -i 3
caveman.pl -p flag -l 4 -i 4

If you want to run them all in a single thread, you don't declare -i, but you still retain the resume functionality.
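The four commands above could be launched from a single script, for example as the minimal sketch below. It assumes a caveman.pl where -i is accepted for the flag step as described; the function name flag_jobs and the DRY_RUN switch are illustrative and not part of the wrapper, and the remaining caveman.pl options are left out, as in the commands above.

```shell
#!/usr/bin/env bash
# Sketch only: start one flag job per index, all sharing the same -l value.
# flag_jobs and DRY_RUN are hypothetical helpers, not caveman.pl features.

flag_jobs() {
  local limit=$1
  local i
  for i in $(seq 1 "$limit"); do
    if [ "${DRY_RUN:-1}" = 1 ]; then
      # Default: only print the command that would be run.
      echo "caveman.pl -p flag -l $limit -i $i"
    else
      # Run each index in the background so the jobs overlap.
      caveman.pl -p flag -l "$limit" -i "$i" &
    fi
  done
  # Block until every background job has finished (no-op in dry-run mode).
  wait
}

flag_jobs 4
```

With DRY_RUN left at its default the script only prints the commands; setting DRY_RUN=0 would actually start the jobs. On a cluster, each printed line would more likely be submitted as its own job than run in the background on one host.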

jsmedmar commented 3 years ago

If I do this, I get the following error: "ERROR: based on reference and exclude option index must be between 1 and 1". These are the commands:

caveman.pl \
    -process flag \
    -threads 1 \
    -index 1 \
    -limit 2
    ...

caveman.pl \
    -process flag \
    -threads 1 \
    -index 2 \
    -limit 2
    ...

In this case only the first command works.

I think these are relevant lines:

https://github.com/cancerit/cgpCaVEManWrapper/blob/84f952a78ab9adb23d04a187e093ef8669f147fa/bin/caveman.pl#L75-L83

https://github.com/cancerit/cgpCaVEManWrapper/blob/84f952a78ab9adb23d04a187e093ef8669f147fa/bin/caveman.pl#L474-L488

It looks like the maximum number of indices for flag is set to 1, so it does not accept an index greater than 1. I know that if you don't specify -limit and -index, it does process the split files in parallel using multiple threads. But I think submitting separate jobs with -index is not working at the moment.

keiranmraine commented 3 years ago

Thanks for clarifying the issue. That is a bug. We don't use it like this internally so it's been missed.