bcgsc / transabyss

de novo assembly of RNA-seq data using ABySS

disable running mpi #18

Closed: lsterck closed this issue 5 years ago.

lsterck commented 5 years ago

Hi,

I want to run transabyss v2.0.1 without MPI, but I can't seem to do so. Even when I add the option --mpi 0, it still starts ABYSS-P with MPI.

Is there perhaps some other trick to get the desired behaviour? I do want to run it multi-threaded, just not with MPI.

thx

kmnip commented 5 years ago

Hi @lsterck, when I run transabyss like this, ABYSS is run instead of ABYSS-P (i.e. MPI is not used), e.g.:

transabyss --pe reads_1.fq.gz reads_2.fq.gz --threads 12 -k 25 -c 1 ...

Are you running transabyss on a computing cluster?

lsterck commented 5 years ago

Hi @kmnip,

Here is my command line:

transabyss --threads $NSLOTS --mpi 0 --pe $1 $2 -k 32 --name runk32test

where $NSLOTS is the number of cores I request when submitting the job to the cluster with qsub -pe.

So yes, it runs on a compute cluster, though on a single node. If I request 1 core it runs OK (no MPI starts), but as soon as I ask for more it starts ABYSS-P. For example, with:

transabyss --threads 5 --mpi 0 --pe ERR2040229_1.fastq ERR2040229_2.fastq -k 32 --name runk32test

I see the following in the log:

# CPU(s) available:     24
# thread(s) requested:  5
# thread(s) to use:     5
....
openmpi/1.8.6/bin/mpirun -np 5 ABYSS-P -k32 -q3 -e2 -E0 -c2    ....
kmnip commented 5 years ago

Thanks for the details. Trans-ABySS launches ABySS with abyss-pe. I think abyss-pe was trying to integrate with your cluster environment and set its np parameter automatically from the environment variable that holds the number of allocated cores. When np is set within abyss-pe, it runs ABYSS-P through mpirun, as your log shows.

Can you try unsetting that environment variable for your scheduler before running transabyss?

lsterck commented 5 years ago

SGE (kind of thing) ;-)

OK, trying it.

kmnip commented 5 years ago

Since you mentioned NSLOTS, this is the problematic part within abyss-pe that needs to be bypassed:

# Integrate with Sun Grid Engine (SGE)
ifdef JOB_NAME
name?=$(JOB_NAME)
endif
ifdef SGE_TASK_ID
k?=$(SGE_TASK_ID)
endif
ifdef NSLOTS
ifneq ($(NSLOTS), 1)
np?=$(NSLOTS)
endif
endif

# Integrate with Portable Batch System (PBS)
ifdef PBS_JOBNAME
name?=$(PBS_JOBNAME)
endif
ifdef PBS_ARRAYID
k?=$(PBS_ARRAYID)
endif
ifdef PBS_NODEFILE
NSLOTS=$(shell wc -l <$(PBS_NODEFILE))
ifneq ($(NSLOTS), 1)
np?=$(NSLOTS)
endif
endif
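To make the rules above concrete, here is a simplified shell sketch of the same decision. It is an illustration only, not code from abyss-pe: the function names guess_np and describe_run are hypothetical, the JOB_NAME/SGE_TASK_ID/PBS_JOBNAME/PBS_ARRAYID rules are left out (they set name and k, not np), and it assumes abyss-pe launches ABYSS-P through mpirun whenever np ends up set, as the log line in this thread suggests.

```shell
#!/usr/bin/env bash
# Simplified, hypothetical rendering of the abyss-pe SGE/PBS rules quoted above.
# guess_np prints the number of MPI processes abyss-pe would pick (0 = no MPI).
guess_np() {
    # An explicitly given np wins, mirroring make's "np?=" (assign only if unset).
    local result="${np:-}"

    # SGE: use NSLOTS unless it is 1.
    if [ -z "$result" ] && [ -n "${NSLOTS:-}" ] && [ "$NSLOTS" != 1 ]; then
        result=$NSLOTS
    fi

    # PBS: count the lines of the node file, again ignoring a value of 1.
    if [ -z "$result" ] && [ -n "${PBS_NODEFILE:-}" ]; then
        local lines
        lines=$(( $(wc -l < "$PBS_NODEFILE") ))
        if [ "$lines" != 1 ]; then
            result=$lines
        fi
    fi

    echo "${result:-0}"
}

# Report which assembler would be started, modelled on the log line above.
describe_run() {
    local n
    n=$(guess_np)
    if [ "$n" -gt 0 ]; then
        echo "mpirun -np $n ABYSS-P"
    else
        echo "ABYSS"
    fi
}
```

With NSLOTS=5 in the environment, describe_run prints `mpirun -np 5 ABYSS-P`, matching the log above; with NSLOTS unset or equal to 1 it prints `ABYSS`.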

If you are using SGE, then here's a clunky solution:

threads=$NSLOTS
unset NSLOTS
transabyss --threads $threads --pe $1 $2 -k 32 --name runk32test

You don't need to specify --mpi 0 if you are not using MPI at all.
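A possible variant of this workaround keeps NSLOTS intact in the job script itself: `env -u NAME` (supported by GNU coreutils env) removes a variable only from the environment of the command it launches. Because "$NSLOTS" on the command line is expanded by the job shell before env runs, transabyss still receives the thread count while abyss-pe no longer sees NSLOTS. The helper name run_without_nslots is hypothetical, and this is a sketch rather than a recipe tested on every SGE setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with NSLOTS removed from *its* environment
# only; the calling job script keeps NSLOTS for its own later use.
run_without_nslots() {
    env -u NSLOTS "$@"
}

# Example use inside an SGE job script (commented out here):
# run_without_nslots transabyss --threads "$NSLOTS" --pe "$1" "$2" -k 32 --name runk32test
```

This avoids losing the qsub-provided value for any later steps in the same job script.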

lsterck commented 5 years ago

OK, so yes, that seems to work, but then I can't benefit from the NSLOTS set by qsub.

Ahah, excellent, nice catch!! And thanks for the snippet showing how to work around this. Works like a charm.

thx!!

lsterck commented 5 years ago

Small 'bonus' question: is this the way one would normally check whether MPI is available, i.e. by checking whether $NSLOTS is set? Or is it simply a little hack? ;)

kmnip commented 5 years ago

Glad to hear that it works! :)

No. abyss-pe does not use NSLOTS to detect whether MPI is available; it uses it to set the number of MPI processes (np) according to the resources allocated to your cluster job.

sjackman commented 5 years ago

Can Trans-ABySS use ABySS 2.0 with Bloom filters? I ask because enabling Bloom filter assembly mode (by setting B and kc) would have the side effect of using OpenMP rather than MPI. See http://bcgsc.github.io/abyss/#assembling-using-a-bloom-filter-de-bruijn-graph

kmnip commented 5 years ago

Yes, but Trans-ABySS does some graph-pruning based on the median k-mer coverage of the unitigs. If I recall correctly, the k-mer coverage isn't accurate for the current implementation of Bloom filter DBG in ABySS. Am I correct?

(Reply sent by email to Shaun Jackman's comment of Tue., May 21, 2019, 11:04 a.m.)

sjackman commented 5 years ago

Good point. Yes, that's correct. (Note that ABySS computes the mean k-mer coverage, not the median.)

sjackman commented 5 years ago

Regarding my note that ABySS computes the mean k-mer coverage, not the median: more accurately, it computes the sum of the k-mer depth of coverage, from which you can get the mean by dividing by the number of k-mers, but you cannot get the median.
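A tiny example of why the sum is enough for the mean but not for the median: two hypothetical unitigs, each with five k-mers and the same summed depth of 50, have the same mean (10) but different medians. If only the sum and the k-mer count are stored, as described above, the two look identical and the median cannot be recovered. The depth values and the summarize helper are made up for illustration; they are not ABySS output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the per-k-mer depths of one unitig, print the
# sum, the k-mer count, the mean (sum / count) and the median.
summarize() {
    printf '%s\n' "$@" | sort -n | awk '
        { d[NR] = $1; sum += $1 }
        END {
            n = NR
            mean = sum / n
            if (n % 2) median = d[(n + 1) / 2]
            else median = (d[n / 2] + d[n / 2 + 1]) / 2
            printf "sum=%d n=%d mean=%g median=%g\n", sum, n, mean, median
        }'
}

summarize 10 10 10 10 10   # uniform coverage
summarize 1 1 1 1 46       # one high-depth k-mer, same sum and count
```

The two calls print `sum=50 n=5 mean=10 median=10` and `sum=50 n=5 mean=10 median=1`.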

kmnip commented 5 years ago

Yes, you are right. I got mixed up with RNA-Bloom!

(Reply sent by email to Shaun Jackman's comment of Tue., May 21, 2019, 11:31 a.m.)

lsterck commented 5 years ago

Thanks for the additional insights both.

I was just checking whether we (my lab and I) are abusing this $NSLOTS parameter. In our current cluster setup, MPI runs across nodes are not "allowed", which to me (in my simplistic view) comes down to running only multi-threaded jobs on a single node. So we only ever use $NSLOTS to determine how many threads can be used, never for MPI configuration.

Anyway, it's likely not worth spending much more time on ;-)