bcgsc / abyss

:microscope: Assemble large genomes using short reads
http://www.bcgsc.ca/platform/bioinfo/software/abyss

Can't seem to build ABYSS-P #307

Closed nschiraldi closed 4 years ago

nschiraldi commented 4 years ago

I'm having trouble getting ABYSS-P to build with MPI support. We use module files on our HPC to load openmpi. Here is what I am doing:

$ export CXX=mpic++
$ export CC=mpicc
$ module load openmpi/3.1.4

$ ../configure --prefix=/network/rit/misc/software/abyss-2.2.3/ CPPFLAGS='-I /network/rit/misc/software/sparsehash/include/' --with-sqlite=/usr/bin/sqlite3 --with-mpi=/usr/lib64/openmpi/
$ make && make install

When I try to run:

abyss-pe -C k$k np=${SLURM_NTASKS} name=bowfin k=$k in='${PWD}/musket_bowfin_F.trimmed.fq ${PWD}/musket_bowfin_R.trimmed.fq'

/opt/openmpi-v3.1/3.1.4/bin/mpirun -np 12 ABYSS-P -k50 -q3    --coverage-hist=coverage.hist -s bowfin-bubbles.fa  -o bowfin-1.fa /network/rit/home/sb939359/turnerlab/spencer/bowfin_project/bowfin_ref_archive/abyss/musket_bowfin_F.trimmed.fq /network/rit/home/sb939359/turnerlab/spencer/bowfin_project/bowfin_ref_archive/abyss/musket_bowfin_R.trimmed.fq

--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore
did not launch the job.  This error was first reported for process
rank 0; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command
      line parameter option (remember that mpirun interprets the first
      unrecognized command line token as the executable).

Node:       uagc12-02
Executable: ABYSS-P
--------------------------------------------------------------------------
2 total processes failed to start
make: *** [bowfin-1.fa] Error 134
make: Leaving directory `/network/rit/lab/turnerlab/spencer/bowfin_project/bowfin_ref_archive/abyss/k50'
$ ldd ./ABYSS-P
        not a dynamic executable

Please report

nschiraldi commented 4 years ago

I'm not sure why, but abyss-pe executes ABYSS-P instead of ABYSS. I symlinked ABYSS-P to ABYSS and the code now runs fine, but this feels like it might be a bug, or a sign that I'm doing something wrong.

jwcodee commented 4 years ago

@nschiraldi I'm a bit confused: the build did not complete, yet you still executed the code? As for the compilation error, I would suggest compiling with gcc and g++. We do not have integration tests with mpic++.

nschiraldi commented 4 years ago

My understanding is that ABYSS-P is a script that just echoes a message and does nothing else. After several rebuild attempts, ABYSS-P eventually was not built (it looks like it shouldn't be built if mpilib is found). Once I successfully built ABYSS with openmpi, the call to abyss-pe was still generating this command:

/opt/openmpi-v3.1/3.1.4/bin/mpirun -np 12 ABYSS-P -k50 -q3    --coverage-hist=coverage.hist -s bowfin-bubbles.fa  -o bowfin-1.fa /network/rit/home/sb939359/turnerlab/spencer/bowfin_project/bowfin_ref_archive/abyss/musket_bowfin_F.trimmed.fq /network/rit/home/sb939359/turnerlab/spencer/bowfin_project/bowfin_ref_archive/abyss/musket_bowfin_R.trimmed.fq

Once I symlinked ABYSS-P to ABYSS, everything ran fine. I'm not sure why abyss-pe determines the executable to be ABYSS-P.

jwcodee commented 4 years ago

%-1.fa:
    $(gtime) $(stack) abyss-bloom-dbg $(abyssopt) $(ABYSS_OPTIONS) $(in) $(se) > $@
else ifdef K

ifdef np
%-1.fa:
    $(gtime) $(mpirun) -np $(np) abyss-paired-dbg-mpi $(abyssopt) $(ABYSS_OPTIONS) -o $*-1.fa $(in) $(se)
else
%-1.fa %-1.$g:
    $(gtime) abyss-paired-dbg $(abyssopt) $(ABYSS_OPTIONS) -o $*-1.fa -g $*-1.$g $(in) $(se)
endif

else ifdef np
%-1.fa:
    $(gtime) $(mpirun) -np $(np) ABYSS-P $(abyssopt) $(ABYSS_OPTIONS) -o $@ $(in) $(se)
else
%-1.fa:
    $(gtime) ABYSS $(abyssopt) $(ABYSS_OPTIONS) -o $@ $(in) $(se)
endif

Based on this, abyss-pe is invoking ABYSS-P correctly. The README says: "To run ABySS with 8 threads, use abyss-pe np=8. The abyss-pe driver script will start the MPI process, like so: mpirun -np 8 ABYSS-P." So the execution of ABYSS-P is correct.
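For readers following along, the branch selection in the excerpt above can be restated as a small shell function. This is only a sketch of the logic as quoted: the condition guarding the first branch is cut off in the excerpt, so it appears here as an unnamed placeholder argument, and the function name `pick_assembler` is made up for illustration.

```shell
#!/bin/bash
# Mirror the ifdef chain from the abyss-pe excerpt quoted above.
# $1: non-empty if the first condition holds. The excerpt does not show
#     which variable that branch tests, so this is a placeholder.
# $2: non-empty if K is defined.
# $3: non-empty if np is defined.
pick_assembler() {
    local first=$1 K=$2 np=$3
    if [ -n "$first" ]; then
        echo "abyss-bloom-dbg"
    elif [ -n "$K" ]; then
        if [ -n "$np" ]; then
            echo "abyss-paired-dbg-mpi"
        else
            echo "abyss-paired-dbg"
        fi
    elif [ -n "$np" ]; then
        echo "ABYSS-P"
    else
        echo "ABYSS"
    fi
}

# The run in this thread sets np=12 and neither of the other variables.
pick_assembler "" "" 12   # prints ABYSS-P
```

With only np set, the chain falls through to the `mpirun ... ABYSS-P` rule, which matches the command shown in the original report.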

nschiraldi commented 4 years ago

ABYSS-P is simply

#!/bin/bash

echo >&2 "error: ABySS was not configured with support for MPI. For details"
echo >&2 "       on compiling ABySS with MPI suppor see:"
echo >&2 "       https://github.com/bcgsc/abyss#compiling-abyss-from-source"
exit 1

Whereas, ABYSS np=2 [args] executes the mpirun fine.
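A quick way to tell which kind of ABYSS-P an install ended up with is to check whether the file is the stub script quoted above or something else. The sketch below is an illustration, not part of ABySS: the function name `classify_abyss_p` is hypothetical, and it assumes the stub can be recognized by its shebang plus the "not configured with support for MPI" text.

```shell
#!/bin/bash
# Classify an ABYSS-P file as the MPI-less stub script or something else.
# Assumption: the stub starts with "#!" and contains the error text
# quoted in this thread.
classify_abyss_p() {
    local path=$1
    if [ ! -e "$path" ]; then
        echo "missing"
    elif head -c 2 "$path" | grep -q '^#!' &&
         grep -q 'not configured with support for MPI' "$path"; then
        echo "stub"
    else
        echo "binary-or-other"
    fi
}

# Inspect whichever ABYSS-P is first on PATH, if there is one.
target=$(command -v ABYSS-P || true)
if [ -n "$target" ]; then
    echo "$target: $(classify_abyss_p "$target")"
fi
```

A result of "stub" would mean configure did not detect MPI for that build, so `mpirun ... ABYSS-P` can only print the error message and exit.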

jwcodee commented 4 years ago

$ /usr/bin/abyss-2.2.3/bin/ABYSS-P --help
Usage: ABYSS -k<kmer> -o<output.fa> [OPTION]... FILE...
Assemble the input files, FILE, which may be in FASTA, FASTQ,
qseq, export, SAM or BAM format and compressed with gz, bz2 or xz.

 Options:

      --chastity        discard unchaste reads [default]
      --no-chastity     do not discard unchaste reads
      --trim-masked     trim masked bases from the ends of reads
                        [default]
      --no-trim-masked  do not trim masked bases from the ends of
                        reads
  -q, --trim-quality=N  trim bases from the ends of reads whose
                        quality is less than the threshold
  -Q, --mask-quality=N  mask all low quality bases as `N'
  --standard-quality    zero quality is `!' (33)
                        default for FASTQ and SAM files
  --illumina-quality    zero quality is `@' (64)
                        default for qseq and export files
      --SS              assemble in strand-specific mode
      --no-SS           do not assemble in strand-specific mode
  -o, --out=FILE        write the contigs to FILE

I'm seeing a binary file

nschiraldi commented 4 years ago

It looks like ABYSS-P is just a cp of ABYSS, so I'm not that concerned about having to symlink. I'm okay closing this issue if you are.

jwcodee commented 4 years ago

They are not the same file.

$ ll -h /usr/bin/ABYSS
-rwxr-xr-x 1 jowong btl 3.4M Sep 26 09:14 /usr/bin/ABYSS
$ ll -h /usr/bin/ABYSS-P
-rwxr-xr-x 1 jowong btl 4.3M Sep 26 09:14 /usr/bin/ABYSS-P

ABYSS-P is specific to mpirun

ewarnke commented 4 years ago

mpicc and mpic++ are just wrappers around gcc which inject the include/library paths.

$ mpicc --version
gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

nschiraldi commented 4 years ago

That explains why ABYSS is being built with MPI support and ABYSS-P is not needed for this install solution... I'm going to close this issue.

jwcodee commented 4 years ago

Hmm. That is quite strange then. We run CI for gcc-4 through gcc-9 and haven't seen any compilation errors.

sjackman commented 4 years ago

ABYSS and ABYSS-P are very different executables. The former is not parallelized, and the latter is parallelized using OpenMPI. ABySS 1 required OpenMPI for parallelization. ABySS 2 uses OpenMP for parallelization. You need to specify the Bloom filter parameters B, H, and kc to use the OpenMP parallelization. See https://github.com/bcgsc/abyss#assembling-using-a-bloom-filter-de-bruijn-graph for details. You want to compile ABySS using g++ and not mpic++.
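As a rough illustration of that advice, the sketch below prints (without running) an abyss-pe command line that sets the Bloom filter parameters instead of np. The helper name `bloom_cmd` and the read file names are hypothetical, and the values B=2G, H=3, kc=3 are placeholders rather than tuned recommendations for any dataset.

```shell
#!/bin/bash
# Print an abyss-pe invocation for Bloom filter mode (dry run only).
# B, H and kc are the parameters named in the comment above; the values
# used here are placeholders and need tuning per dataset.
bloom_cmd() {
    local name=$1 k=$2 reads=$3
    printf "abyss-pe name=%s k=%s B=2G H=3 kc=3 in='%s'\n" "$name" "$k" "$reads"
}

# Hypothetical read file names, for illustration only.
bloom_cmd bowfin 50 'reads_F.fq reads_R.fq'
```

Printing the command first makes it easy to confirm that no np= is passed, so abyss-pe does not go through mpirun in this mode.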