Closed lsterck closed 5 years ago
Hi @lsterck
When I run transabyss like so, ABYSS is run instead of ABYSS-P (i.e. MPI is not used), e.g.
transabyss --pe reads_1.fq.gz reads_2.fq.gz --threads 12 -k 25 -c 1 ...
Are you running transabyss on a computing cluster?
Hi @kmnip ,
here is my cmdline:
transabyss --threads $NSLOTS --mpi 0 --pe $1 $2 -k 32 --name runk32test
where $NSLOTS is the #cores I request when submitting it to the cluster with qsub -pe
so, yes, on a compute cluster but on a single node though
if I request 1 core it runs OK (==no mpi starts) but as soon as I ask more it starts ABYSS-P, eg:
transabyss --threads 5 --mpi 0 --pe ERR2040229_1.fastq ERR2040229_2.fastq -k 32 --name runk32test
I see the following in the log:
# CPU(s) available: 24
# thread(s) requested: 5
# thread(s) to use: 5
....
openmpi/1.8.6/bin/mpirun -np 5 ABYSS-P -k32 -q3 -e2 -E0 -c2 ....
Thanks for the details.
Trans-ABySS launches ABySS with abyss-pe. I think abyss-pe was trying to integrate with your cluster environment and set the np parameter automatically from the environment variables for the number of cores. When the np parameter is set within abyss-pe and mpirun is run...
Can you try to unset the following environment variables (whichever applies to your scheduler) before running transabyss?
unset PBS_NODEFILE
unset NSLOTS
unset LSB_DJOB_NUMPROC
unset LOADL_HOSTFILE
SGE (like kinda thing) ;-)
ok, trying it
Since you mentioned NSLOTS, this is the problematic part within abyss-pe that needs to be bypassed:
# Integrate with Sun Grid Engine (SGE)
ifdef JOB_NAME
name?=$(JOB_NAME)
endif
ifdef SGE_TASK_ID
k?=$(SGE_TASK_ID)
endif
ifdef NSLOTS
ifneq ($(NSLOTS), 1)
np?=$(NSLOTS)
endif
endif
# Integrate with Portable Batch System (PBS)
ifdef PBS_JOBNAME
name?=$(PBS_JOBNAME)
endif
ifdef PBS_ARRAYID
k?=$(PBS_ARRAYID)
endif
ifdef PBS_NODEFILE
NSLOTS=$(shell wc -l <$(PBS_NODEFILE))
ifneq ($(NSLOTS), 1)
np?=$(NSLOTS)
endif
endif
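To make the effect of the SGE branch concrete, here is a minimal shell sketch of the same decision. The function name sge_np is hypothetical and is not part of abyss-pe; it only mirrors the ifdef/ifneq logic quoted above, and it ignores the fact that np?= leaves an explicitly passed np untouched.

```shell
# Hypothetical helper (not part of abyss-pe): mirrors the SGE branch above.
# Prints the value np would default to, or nothing when np is left unset.
sge_np() {
    # Make's ifdef is true only for a non-empty value, hence the -n test.
    if [ -n "${NSLOTS:-}" ] && [ "$NSLOTS" != "1" ]; then
        echo "$NSLOTS"
    fi
}

NSLOTS=5; sge_np       # prints 5, so mpirun -np 5 ABYSS-P gets started
NSLOTS=1; sge_np       # prints nothing, plain ABYSS is used
unset NSLOTS; sge_np   # prints nothing, plain ABYSS is used
```

This matches the behaviour reported earlier in the thread: one requested slot runs without MPI, anything more starts ABYSS-P.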
If you are using SGE, then here's a clunky solution:
threads=$NSLOTS
unset NSLOTS
transabyss --threads $threads --pe $1 $2 -k 32 --name runk32test
You don't need to specify --mpi 0 if you are not using MPI at all.
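A possible variation, sketched under the assumption that env on the cluster supports the -u option (GNU coreutils and BSD env both do): remove NSLOTS only from the environment of the transabyss process, so the rest of the job script can still read it. The wrapper name run_without_nslots is hypothetical.

```shell
# Hypothetical wrapper: run a command with NSLOTS removed from its environment.
# The calling shell keeps NSLOTS, so later steps in the job script still see it.
run_without_nslots() {
    env -u NSLOTS "$@"
}

# Usage sketch inside an SGE job script ($1 and $2 are the read files):
# run_without_nslots transabyss --threads "$NSLOTS" --pe "$1" "$2" -k 32 --name runk32test
```

In the usage line, "$NSLOTS" is expanded by the calling shell before env runs, so --threads still receives the slot count while abyss-pe no longer sees the variable.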
OK, so yes that seems to work but then I can't benefit from the NSLOTS set by qsub.
ahah, excellent, nice catch !! and thanks for the snippet how to work around this. works like a charm
thx!!
small 'bonus' question: is that because this is the way how one would normally check if there is mpi available, I mean checking if $NSLOTS is set? or is it simply a little hack ? ;)
Glad to hear that it works! :)
No, NSLOTS is used within abyss-pe to set the number of MPI processes according to the allocated resources for your cluster job.
Can Trans-ABySS use ABySS 2.0 with Bloom filters? I ask because enabling Bloom filter assembly mode (by setting B and kc) would have the side effect of using OpenMP rather than MPI.
See http://bcgsc.github.io/abyss/#assembling-using-a-bloom-filter-de-bruijn-graph
Yes, but Trans-ABySS does some graph-pruning based on the median k-mer coverage of the unitigs. If I recall correctly, the k-mer coverage isn't accurate for the current implementation of Bloom filter DBG in ABySS. Am I correct?
On Tue., May 21, 2019, 11:04 a.m. Shaun Jackman, notifications@github.com wrote:
Can Trans-ABySS use ABySS 2.0 with Bloom filters? I ask because enabling Bloom filter assembly mode (by setting B and kc) would have the side effect of using OpenMP rather than MPI. See http://bcgsc.github.io/abyss/#assembling-using-a-bloom-filter-de-bruijn-graph
Good point. Yes, that's correct. (note that ABySS computes the mean k-mer coverage, not median)
(note that ABySS computes the mean k-mer coverage, not median)
More accurately it computes the sum k-mer depth of coverage, from which you can get the mean by dividing by the number of k-mers, but you cannot get the median.
Yes, you are right. I got mixed up with RNA-Bloom!
On Tue., May 21, 2019, 11:31 a.m. Shaun Jackman, notifications@github.com wrote:
(note that ABySS computes the mean k-mer coverage, not median)
More accurately it computes the sum k-mer depth of coverage, from which you can get the mean by dividing by the number of k-mers, but you cannot get the median.
Thanks for the additional insights both.
I was just checking whether we (me, my lab) are abusing this $NSLOTS parameter. In our current cluster setting no MPI runs are "allowed" across nodes, which to me (in my simplistic view) comes down to only running multi-threaded things on a single node. So we only ever use $NSLOTS to determine how many threads can be used and never for MPI config.
anyway, likely not worth spending much more time on ;-)
Hi,
I want to run transabyss v 2.0.1 without MPI but I do not seem to be able to do so. Even when I add the option --mpi 0 it still starts ABYSS-P with MPI. Is there perhaps some other trick to get the desired behaviour? I do want to run it multi-threaded but not with MPI.
thx