PacificBiosciences / FALCON_unzip

Making diploid assembly becomes common practice for genomic study
BSD 3-Clause Clear License

specify custom nproc for blasr #115

Open leleory opened 6 years ago

leleory commented 6 years ago

Dear Developers, when I run fc_unzip, blasr runs on 24 processor cores in the 0-phasing stage. Can you tell me how I can change the --nproc parameter for blasr? When I submit the jobs to SGE, each job runs on a single core, but once it reaches blasr it uses 24 cores. This overthreading slows down the system, as the jobs start to compete with other people's jobs.

The snippet below is from all.log, where --nproc is set to 24. That is what I would like to change. Thank you, Lel

2018-04-12 08:22:11,948 - pypeflow.tasks - INFO - gen_task(

BLASR

ctg_aln_out='blasr/{params.ctg_id}_sorted.bam'
mkdir -p blasr
time blasr {input.read_fasta} {input.ref_fasta} --noSplitSubreads --clipping subread --hitPolicy randombest --randomSeed 42 --bestn 1 --minPctIdentity 70.0 --minMatch 12 --nproc 24 --bam --out tmp_aln.bam

samtools view -bS tmp_aln.sam | samtools sort - {params.ctg_id}_sorted

samtools sort tmp_aln.bam -o ${{ctg_aln_out}}
samtools index ${{ctg_aln_out}}
rm tmp_aln.bam

bam_fn=${{ctg_aln_out}}
fasta_fn={input.ref_fasta}

mroach-awri commented 6 years ago

I've had this issue as well. It's hard-coded in run_quiver.py and unzip.py; you can just modify it with a text editor.

leleory commented 6 years ago

As you indicate, this is also a problem with quiver.

Your suggestion would be a useful hack. Unfortunately, I have neither unzip.py nor run_quiver.py in the version I installed via install_unzip.sh.

In the version I have, there are two .pyc files where nproc 24 is hard-coded:

lib/python2.7/site-packages/falcon_unzip/tasks/unzip.pyc
lib/python2.7/site-packages/falcon_unzip/tasks/quiver.pyc

I do not have the Python source code, and I am not sure how wise it is to modify the bytecode directly. If possible, I would like to avoid it.

I would expect the number of cores specified at job submission to be the upper limit for both blasr and quiver. If these values cannot be parsed from the submission options, would it be possible to set them directly in the config file?
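To illustrate what "taking the limit from the submission" could look like, here is a minimal sketch. It assumes the job runs under Grid Engine, which exports the granted slot count in the NSLOTS environment variable. The helper names `allowed_threads` and `blasr_nproc_args` are hypothetical and not part of FALCON_unzip.

```python
import os


def allowed_threads(default=1, environ=None):
    """Return the slot count granted by the scheduler, or `default`.

    Assumption: the scheduler exports NSLOTS (Grid Engine does this for
    submitted jobs). Missing, non-numeric or non-positive values fall
    back to `default`.
    """
    env = os.environ if environ is None else environ
    try:
        slots = int(env.get("NSLOTS", ""))
    except ValueError:
        return default
    return slots if slots > 0 else default


def blasr_nproc_args(environ=None):
    """Build the --nproc argument pair for a blasr command line."""
    return ["--nproc", str(allowed_threads(environ=environ))]
```

With this approach a job submitted without a parallel environment would run blasr single-threaded, while a job that requests, say, 8 slots would pass `--nproc 8`.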

Without being able to set this parameter, I end up in situations where I have only 5 processes submitted to a 12-core node, yet the total load on the node is 140.
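The numbers above are consistent with the hard-coded thread count: 5 jobs at 24 threads each is 120 runnable threads on 12 cores. A small sketch of that arithmetic follows, together with a way to check a node directly. The function names are hypothetical, and `os.getloadavg()` is only available on Unix-like systems.

```python
import multiprocessing
import os


def requested_threads(jobs, nproc):
    """Total threads started when each of `jobs` processes uses `nproc` threads."""
    return jobs * nproc


def load_per_core(load, cores):
    """Load average divided by core count; values well above 1.0 mean queueing."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return float(load) / cores


def node_report():
    """Return (1-minute load, core count, load per core) for the current node."""
    load = os.getloadavg()[0]
    cores = multiprocessing.cpu_count()
    return load, cores, load_per_core(load, cores)
```

For the case described, `requested_threads(5, 24)` gives 120, and `load_per_core(140, 12)` is roughly 11.7, so each core has more than eleven threads waiting for it.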

mroach-awri commented 6 years ago

Hmm, in my version there are both the source and bytecode files. Your best bet might be to modify the source code and reinstall. EDIT: Actually, you may be able to modify the source unzip.py/run_quiver.py scripts and drop them into the install directory. You might need to delete the old .pyc files and compile new ones, but it could save you from having to reinstall.
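The manual edit could also be scripted. The sketch below is only an illustration: it assumes the value appears in the source as `--nproc 24` (as in the log excerpt) or `--nproc=24`, and that the stale bytecode sits next to the source as `<name>.pyc`, which is where Python 2 writes it. The function names are hypothetical, and other thread options in those files (if any) are not touched.

```python
import io
import os
import re

# Matches "--nproc <digits>" and "--nproc=<digits>" (assumed spellings).
NPROC_RE = re.compile(r"(--nproc[ =])\d+")


def set_nproc(text, nproc):
    """Return `text` with every hard-coded --nproc value replaced by `nproc`."""
    return NPROC_RE.sub(lambda m: m.group(1) + str(int(nproc)), text)


def patch_file(path, nproc):
    """Rewrite one source file in place and delete its stale .pyc, if present.

    Returns True when the file was changed, False when nothing matched.
    """
    with io.open(path, encoding="utf-8") as handle:
        original = handle.read()
    patched = set_nproc(original, nproc)
    if patched == original:
        return False
    # Keep a backup so the change is easy to revert.
    with io.open(path + ".bak", "w", encoding="utf-8") as handle:
        handle.write(original)
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(patched)
    stale = path + "c"  # Python 2 layout: foo.pyc next to foo.py
    if os.path.exists(stale):
        os.remove(stale)
    return True
```

After patching, for example with `patch_file("unzip.py", 4)` on a copy of the source, the bytecode can be rebuilt by running `python -m compileall` on the package directory with the same interpreter the install uses (Python 2.7 here). Python also regenerates a missing .pyc on the next import if the directory is writable.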