bxlab / metaWRAP

MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis
MIT License

Errors in Kraken step, binning and bin_refinement #204

Open YiweiNiu opened 5 years ago

YiweiNiu commented 5 years ago

Hi,

Thanks for the cool tool!

I ran metawrap on 5 metagenome samples with the following commands. The version of metawrap is 1.2.1.

# kraken
metawrap kraken -o $WORKDIR/metawrap/$dataset/kraken -t $PPN -s 1000000 $WORKDIR/TrimGalore/$dataset/*.fastq $WORKDIR/megahit/$dataset/final.contigs.fa

# binning
metawrap binning -o $WORKDIR/metawrap/$dataset/INITIAL_BINNING -t $PPN -a $WORKDIR/megahit/$dataset/final.contigs.fa --metabat2 --maxbin2 --concoct $WORKDIR/TrimGalore/$dataset/*.fastq

# bin_refinement
metawrap bin_refinement -o $WORKDIR/metawrap/$dataset/BIN_REFINEMENT -t $PPN -A $WORKDIR/metawrap/$dataset/INITIAL_BINNING/metabat2_bins/ -B $WORKDIR/metawrap/$dataset/INITIAL_BINNING/maxbin2_bins/ -C $WORKDIR/metawrap/$dataset/INITIAL_BINNING/concoct_bins/ -c 50 -x 10
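For context, the shell variables in the commands above were set roughly like this (values reconstructed from the log output below; `PPN` is the thread count from the PBS allocation — treat these as an illustration, not metaWRAP defaults):

```shell
# Assumed setup for the commands above (reconstructed from the logs;
# adjust paths and thread count to your own environment and scheduler)
WORKDIR=/home/niuyw/Project/Data_processing
dataset=META19SWWL15
PPN=24
```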

But I got several errors/warnings.

  1. Kraken. Comment lines in kraken.sh seemed to be the cause of this error. After removing the comments on lines 123 and 124, kraken ran successfully. I am using CentOS 6.6 and running metawrap through a PBS cluster.
------------------------------------------------------------------------------------------------------------------------
-----                                                Now processing                                                -----
-----               /home/niuyw/Project/Data_processing/TrimGalore/META19SWWL15/S6-10-1_1.fastq and                -----
-----             /home/niuyw/Project/Data_processing/TrimGalore/META19SWWL15/S6-10-1_2.fastq with 24              -----
-----                                                   threads                                                    -----
------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------
-----                                     subsampling down to 1000000 reads...                                     -----
------------------------------------------------------------------------------------------------------------------------

/home/niuyw/software/anaconda2/envs/metawrap/bin/metawrap-modules/kraken.sh: line 123:  #combine: command not found
awk: (FILENAME=- FNR=26) fatal: printf to "standard output" failed (Broken pipe)
paste: write error: Broken pipe
paste: write error
/home/niuyw/software/anaconda2/envs/metawrap/bin/metawrap-modules/kraken.sh: line 124:  #shuffle: command not found
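The workaround applied here (deleting the two comment-only lines the shell was trying to execute) can be sketched on a toy script; the `sed` range mirrors kraken.sh lines 123–124, but the line numbers and the exact parsing issue in kraken.sh are assumptions based on the error messages above:

```shell
# Toy reproduction of the workaround: given a script whose lines 2-3 are
# comment-only lines that the shell mis-parses as commands, delete them
# in place (for kraken.sh, the reported range was 123,124 instead of 2,3).
printf 'echo start\n#combine\n#shuffle\necho done\n' > toy.sh
sed -i.bak '2,3{/^[[:space:]]*#/d;}' toy.sh
cat toy.sh
```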
  2. MaxBin. I am not sure whether this message matters, since the help message usually only appears when something has gone wrong. Judging from the logs, however, MaxBin seemed to run successfully.
------------------------------------------------------------------------------------------------------------------------
-----                    split master contig depth file into individual files for maxbin2 input                    -----
------------------------------------------------------------------------------------------------------------------------

processing S6-10-1.bam depth file...
processing S6-1-1.bam depth file...
processing S6-1-2.bam depth file...
processing S6-1-3.bam depth file...
processing S6-5-2.bam depth file...
MaxBin 2.2.6
No Contig file. Please specify contig file by -contig
MaxBin - a metagenomics binning software.
Usage:
  run_MaxBin.pl
    -contig (contig file)
    -out (output file)

   (Input reads and abundance information)
    [-reads (reads file) -reads2 (readsfile) -reads3 (readsfile) -reads4 ... ]
    [-abund (abundance file) -abund2 (abundfile) -abund3 (abundfile) -abund4 ... ]

   (You can also input lists consisting of reads and abundance files)
    [-reads_list (list of reads files)]
    [-abund_list (list of abundance files)]

   (Other parameters)
    [-min_contig_length (minimum contig length. Default 1000)]
    [-max_iteration (maximum Expectation-Maximization algorithm iteration number. Default 50)]
    [-thread (thread num; default 1)]
    [-prob_threshold (probability threshold for EM final classification. Default 0.9)]
    [-plotmarker]
    [-markerset (marker gene sets, 107 (default) or 40.  See README for more information.)]

  (for debug purpose)
    [-version] [-v] (print version number)
    [-verbose]
    [-preserve_intermediate]

  Please specify either -reads or -abund information.
  You can input multiple reads and/or abundance files at the same time.
  Please read README file for more details.

------------------------------------------------------------------------------------------------------------------------
-----                                       Starting binning with MaxBin2...
  3. CONCOCT: same as above. But it only produced 1 bin.
------------------------------------------------------------------------------------------------------------------------
-----                                    estimating contig fragment coverage...                                    -----
------------------------------------------------------------------------------------------------------------------------

/home/niuyw/software/anaconda2/envs/metawrap/bin/concoct_coverage_table.py:48: FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
  df = pd.read_table(fh, header=None)
usage: concoct [-h] [--coverage_file COVERAGE_FILE]
               [--composition_file COMPOSITION_FILE] [-c CLUSTERS]
               [-k KMER_LENGTH] [-t THREADS] [-l LENGTH_THRESHOLD]
               [-r READ_LENGTH] [--total_percentage_pca TOTAL_PERCENTAGE_PCA]
               [-b BASENAME] [-s SEED] [-i ITERATIONS] [-e EPSILON]
               [--no_cov_normalization] [--no_total_coverage]
               [--no_original_data] [-o] [-d] [-v]

optional arguments:
  -h, --help            show this help message and exit
  --coverage_file COVERAGE_FILE

...
  4. A plotting error like this one. There are 26 "good bins" in the metawrap_50_10_bins.stats file, so I do not know why this happened.
Loading completion info....
Plotting completion data...
Traceback (most recent call last):
  File "/home/niuyw/software/anaconda2/envs/metawrap/bin/metawrap-scripts/plot_binning_results.py", line 109, in <module>
    y_pos = data[bin_set][len(data[bin_set])*3/4]
IndexError: list index out of range
mv: cannot stat `binning_results.eps': No such file or directory
mv: cannot stat `binning_results.png': No such file or directory

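Since the plot script indexes three quarters of the way into each bin-set list, an empty bin set would push that lookup out of range. A quick way to check each set's size (a toy sketch; the directory and `.fa` naming follow the binning output layout above, which may differ in your install):

```shell
# Toy check: count .fa bins per initial bin set. An empty set is the
# kind of input that makes the three-quarters index lookup in
# plot_binning_results.py raise IndexError.
mkdir -p toy/metabat2_bins toy/maxbin2_bins toy/concoct_bins
touch toy/metabat2_bins/bin.1.fa toy/metabat2_bins/bin.2.fa toy/maxbin2_bins/bin.1.fa
for d in toy/*_bins; do
  n=$(ls "$d"/*.fa 2>/dev/null | wc -l)
  echo "$d: $n"
done
```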
Here is the full log of metawrap: META19SWWL15_metawrap.txt

ursky commented 5 years ago
  1. Not sure why the read sub-sampling gave you an error. You could try classifying all the reads, or sub-sample yourself prior to the run.
  2. Everything is correct.
  3. Everything is correct.
  4. Are you sure all your resulting bin sets have >1 bins?
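Point 1 (sub-sampling the reads yourself before the run, so the `-s` flag can be dropped) could be sketched like this; it is a toy for a single unpaired FASTQ, and paired files would need matched sampling of both mates:

```shell
# Sketch of sub-sampling a FASTQ before the run: flatten each 4-line
# record onto one line, shuffle, keep N records, restore the layout.
# Toy 3-record input for illustration only.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n@r3\nGGGG\n+\nIIII\n' > toy.fastq
N=2
paste - - - - < toy.fastq | shuf | head -n "$N" | tr '\t' '\n' > subsampled.fastq
wc -l < subsampled.fastq   # 8 lines = 2 records
```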
YiweiNiu commented 5 years ago

Thank you for your reply!

  1. I found 26 bins in the file metawrap_50_10_bins.stats under the bin_refinement output directory, and saw 'Re-evaluating bin quality after contig de-replication is complete! There are still 26 high quality bins' in the log. Sorry if it's my mistake.
$ ls
concoct_bins          maxbin2_bins          metabat2_bins.contigs        metawrap_50_10_bins.stats
concoct_bins.contigs  maxbin2_bins.contigs  metabat2_bins.stats          work_files
concoct_bins.stats    maxbin2_bins.stats    metawrap_50_10_bins
figures               metabat2_bins         metawrap_50_10_bins.contigs

$ wc -l metawrap_50_10_bins.stats 
27 metawrap_50_10_bins.stats

$ head -3 metawrap_50_10_bins.stats
bin completeness    contamination   GC  lineage N50 size    binner
bin.15  99.84   0.123   0.663   Comamonadaceae  189849  6692338 binsA
bin.25  99.77   0.0 0.655   Mycobacterium   133233  4268384 binsBC
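The `wc -l` count of 27 above is consistent with 26 bins, since the first line of the .stats file is a header. A toy check of that arithmetic (the toy file's columns are illustrative, not the real .stats schema):

```shell
# Toy reproduction: a stats file with one header line plus two bin rows;
# counting lines after the header gives the number of bins.
printf 'bin\tcompleteness\tcontamination\nbin.1\t99.8\t0.1\nbin.2\t95.0\t2.0\n' > toy.stats
tail -n +2 toy.stats | wc -l   # prints 2: total lines minus the header
```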
  2. When proceeding to the following steps of metawrap, I got other errors all caused by 'No such file or directory: metawrap_bins', like the one below. But I saw the message 'BIN_REFINEMENT PIPELINE FINISHED SUCCESSFULLY!' in the log.
-----                                      adding bin annotations to blobfile                                      -----
-----          /home/niuyw/Project/Data_processing/metawrap/META19SWWL15/BLOBOLOGY/final.contigs.blobplot          -----
------------------------------------------------------------------------------------------------------------------------

Traceback (most recent call last):
  File "/home/niuyw/software/anaconda2/envs/metawrap/bin/metawrap-scripts/add_bins_to_blobplot.py", line 9, in <module>
    for bin_file in os.listdir(sys.argv[2]):
OSError: [Errno 2] No such file or directory: '/home/niuyw/Project/Data_processing/metawrap/META19SWWL15/BIN_REFINEMENT/metawrap_bins'

************************************************************************************************************************
*****                     Something went wrong with annotating the blobplot by bins. Exiting...                    *****
************************************************************************************************************************
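Comparing the traceback with the directory listing above, blobology looks for a `metawrap_bins` directory while bin_refinement wrote `metawrap_50_10_bins`. One possible stopgap (an assumption, not an official fix) is to bridge the two names with a symlink, sketched here on a toy layout:

```shell
# Toy layout mirroring the BIN_REFINEMENT output: symlink the old
# expected name "metawrap_bins" to the actual "metawrap_50_10_bins"
# directory so downstream modules can find it.
mkdir -p BIN_REFINEMENT/metawrap_50_10_bins
ln -sfn metawrap_50_10_bins BIN_REFINEMENT/metawrap_bins
ls BIN_REFINEMENT/
```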

The metawrap version I used was 1.2.1. The script I used is metawrap.txt and the complete log is metawrap.log.txt.

This is the first time I have dealt with metagenome data. I apologize if my questions are too obvious.

Thank you again for your help!

Best, Yiwei Niu

lroppolo commented 4 years ago

Hi there,

I am also having an issue with the kraken.sh module, and I tried removing the comments as shown above, but that did not resolve it. Do you have any suggestions? This is the error I'm getting:

(metawrap-env) [lroppolo@cph-i2 metawrap_UEGP]$ cat kraken.sh.e1559375
/users/lroppolo/.conda/envs/metawrap-env/bin/metawrap-modules/kraken.sh: line 123: #combine: command not found
awk: cmd. line:1: (FILENAME=- FNR=28) fatal: printf to "standard output" failed (Broken pipe)
paste: write error: Broken pipe
paste: write error
/users/lroppolo/.conda/envs/metawrap-env/bin/metawrap-modules/kraken.sh: line 124: #shuffle: command not found

Thank you as always for being so helpful!

ursky commented 4 years ago

Make sure you are running the latest version; I believe I patched this. Otherwise, don't use the depth parameter and annotate all the reads to bypass this step.

lroppolo commented 4 years ago

Fantastic, thank you! I am running the latest version, so I'll remove the depth parameter. Do I set it to "none" instead of "all", or remove it from the script entirely? And to annotate all reads, this is something I'd need to do outside of the script? I'm sorry for so many questions, just want to make sure I get it right before proceeding.