bxlab / metaWRAP

MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis

Concoct fails to run, but metabat and maxbin are fine #373

Open VibrantStarling opened 3 years ago

VibrantStarling commented 3 years ago

Hi, I'm having an issue running CONCOCT. It fails and produces the output below. I also tried running it step by step inside the metaWRAP environment, following CONCOCT's instructions, and it fails in the same way at the "concoct" stage with the symbol lookup error. I initially ran it together with MetaBAT and MaxBin and got the same error.

I'm unsure how to address this. It seems like it might be an issue in the program itself, but I don't really know. Any ideas?

I'm running metaWRAP 1.2.1 on Debian on a remote server.

metawrap binning -o RiTSETSE-metawrap/INITIAL-BINNING/ -t 24 -a RiTSETSEmegahit/final.contigs.fa --concoct Tsetse-reads_1.fastq Tsetse-reads_2.fastq

-----                                           Entered read type: paired                                          -----

-----                                  1 forward and 1 reverse read files detected                                 -----

#####                                     ALIGNING READS TO MAKE COVERAGE FILES                                    #####

Warning: RiTSETSE-metawrap/INITIAL-BINNING/ already exists.
rm: cannot remove `RiTSETSE-metawrap/INITIAL-BINNING//*checkm': No such file or directory

-----                         Looks like the assembly file is already coppied. Skipping...                         -----

-----                       Looks like there is a index of the assembly already. Skipping...                       -----

-----                           skipping aligning Tsetse-reads reads to assembly because                           -----
-----                RiTSETSE-metawrap/INITIAL-BINNING//work_files/Tsetse-reads.bam already exists.                -----

#####                                                RUNNING CONCOCT                                               #####

-----                                       indexing .bam alignment files...                                       -----


-----                             cutting up contigs into 10kb fragments for CONCOCT...                            -----

-----                                    estimating contig fragment coverage...                                    -----

/pub28/helend/miniconda3/envs/metaWRAP-env/bin/concoct_coverage_table.py:48: FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
  df = pd.read_table(fh, header=None)
usage: concoct [-h] [--coverage_file COVERAGE_FILE]
               [--composition_file COMPOSITION_FILE] [-c CLUSTERS]
               [-k KMER_LENGTH] [-t THREADS] [-l LENGTH_THRESHOLD]
               [-r READ_LENGTH] [--total_percentage_pca TOTAL_PERCENTAGE_PCA]
               [-b BASENAME] [-s SEED] [-i ITERATIONS] [-e EPSILON]
               [--no_cov_normalization] [--no_total_coverage]
               [--no_original_data] [-o] [-d] [-v]

optional arguments:
  -h, --help            show this help message and exit
  --coverage_file COVERAGE_FILE
                        specify the coverage file, containing a table where
                        each row correspond to a contig, and each column
                        correspond to a sample. The values are the average
                        coverage for this contig in that sample. All values
                        are separated with tabs.
  --composition_file COMPOSITION_FILE
                        specify the composition file, containing sequences in
                        fasta format. It is named the composition file since
                        it is used to calculate the kmer composition (the
                        genomic signature) of each contig.
  -c CLUSTERS, --clusters CLUSTERS
                        specify maximal number of clusters for VGMM, default
  -k KMER_LENGTH, --kmer_length KMER_LENGTH
                        specify kmer length, default 4.
  -t THREADS, --threads THREADS
                        Number of threads to use
                        specify the sequence length threshold, contigs shorter
                        than this value will not be included. Defaults to
  -r READ_LENGTH, --read_length READ_LENGTH
                        specify read length for coverage, default 100
  --total_percentage_pca TOTAL_PERCENTAGE_PCA
                        The percentage of variance explained by the principal
                        components for the combined data.
  -b BASENAME, --basename BASENAME
                        Specify the basename for files or directory where
                        outputwill be placed. Path to existing directory or
                        basenamewith a trailing '/' will be interpreted as a
                        directory.If not provided, current directory will be
  -s SEED, --seed SEED  Specify an integer to use as seed for clustering. 0
                        gives a random seed, 1 is the default seed and any
                        other positive integer can be used. Other values give
  -i ITERATIONS, --iterations ITERATIONS
                        Specify maximum number of iterations for the VBGMM.
                        Default value is 500
  -e EPSILON, --epsilon EPSILON
                        Specify the epsilon for VBGMM. Default value is 1.0e-6
                        By default the coverage is normalized with regards to
                        samples, then normalized with regards of contigs and
                        finally log transformed. By setting this flag you skip
                        the normalization and only do log transorm of the
  --no_total_coverage   By default, the total coverage is added as a new
                        column in the coverage data matrix, independently of
                        coverage normalization but previous to log
                        transformation. Use this tag to escape this behaviour.
  --no_original_data    By default the original data is saved to disk. For big
                        datasets, especially when a large k is used for
                        compositional data, this file can become very large.
                        Use this tag if you don't want to save the original
  -o, --converge_out    Write convergence info to files.
  -d, --debug           Debug parameters.
  -v, --version         show program's version number and exit

-----                                       Starting binning with CONCOCT...                                       -----

Up and running. Check /pub28/helend/SRA-RiTSETSE/RiTSETSE-metawrap/INITIAL-BINNING/work_files/concoct_out/log.txt for progress
/pub28/helend/miniconda3/envs/metaWRAP-env/lib/python2.7/site-packages/concoct/input.py:82: FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
  cov = p.read_table(cov_file, header=0, index_col=0)
/pub28/helend/miniconda3/envs/metaWRAP-env/bin/python: symbol lookup error: /pub28/helend/miniconda3/envs/metaWRAP-env/lib/python2.7/site-packages/../../libmkl_intel_thread.so: undefined symbol: omp_get_num_procs

*****                          Something went wrong with binning with CONCOCT. Exiting...                          *****

real    5m34.720s
user    6m18.510s
sys 0m11.796s

lauren-mak commented 3 years ago

I had the same issue and I think I've figured it out. Assuming you've installed the appropriate conda packages for CONCOCT, blas, mkl, and llvm-openmp, the problem is that libmkl_intel_thread.so can't find libiomp5.so in the environment's lib directory. Installing CONCOCT 1.1.0 downgraded llvm-openmp (from 12.0.1 to 8.?), which for some reason removed libiomp5.so. Until there is a more permanent fix, the hack is to symlink libiomp5.so into lib, like so:

ln -s ../anaconda3/pkgs/intel-openmp-2019.4-243/lib/libiomp5.so ../anaconda3/envs/binning/lib/libiomp5.so
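
The same workaround can be sketched in Python. This is only an illustration under assumptions, not part of metaWRAP or CONCOCT: the function name `link_openmp_runtime` is hypothetical, and it is meant to be run with any Python 3 interpreter outside the python2.7 environment. One detail worth noting is that a relative symlink target is resolved relative to the directory containing the link, not the current working directory, so the sketch converts the source to an absolute path before linking.

```python
import os
from pathlib import Path


def link_openmp_runtime(source_lib, env_lib_dir):
    """Symlink source_lib into env_lib_dir using an absolute target path.

    Hypothetical helper: returns the path of the link. Anything already
    present at the link location is left untouched.
    """
    source = Path(source_lib)
    target = source.resolve()  # absolute path, so the link works from anywhere
    if not target.is_file():
        raise FileNotFoundError(f"OpenMP runtime not found: {target}")

    link = Path(env_lib_dir) / source.name
    if link.exists() or link.is_symlink():
        return link  # do not overwrite an existing file or (dangling) link
    os.symlink(target, link)
    return link
```

Called with the two paths from the command above (the intel-openmp package's libiomp5.so as `source_lib` and the environment's lib directory as `env_lib_dir`), it would create the same link, assuming those paths exist on your system.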

lauren-mak commented 3 years ago

Update: After I added the symlink, CONCOCT works fine on the command line, but it still fails inside metaWRAP. I've tried updating llvm-openmp and intel-openmp, but the issue persists and I have no idea how to fix it.
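
When comparing environments after such updates, it may help to list which OpenMP runtime files an environment's lib directory actually contains. The following is a minimal diagnostic sketch, not a fix: the function name `find_openmp_runtimes` and the prefix list are assumptions chosen for illustration, and it needs a Python 3 interpreter.

```python
from pathlib import Path

# Assumed file-name prefixes of common OpenMP runtimes:
# Intel (libiomp5), LLVM (libomp), and GNU (libgomp).
OPENMP_PREFIXES = ("libiomp5", "libomp", "libgomp")


def find_openmp_runtimes(env_prefix):
    """Return the sorted names of OpenMP runtime files in <env_prefix>/lib."""
    lib_dir = Path(env_prefix) / "lib"
    if not lib_dir.is_dir():
        return []
    return sorted(
        entry.name
        for entry in lib_dir.iterdir()
        if entry.name.startswith(OPENMP_PREFIXES)
    )
```

Running it on the environment prefix used on the command line and on the one metaWRAP activates would show whether libiomp5.so is present in both.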

shaodongyan commented 3 years ago

I have this problem too.