bxlab / metaWRAP

MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis
MIT License

Binning with single end reads #32

Open palomo11 opened 6 years ago

palomo11 commented 6 years ago

Dear German,

I tried to run the binning module with several single-end read files, but it gave an error.

Is it possible to run this module with single-end reads?

ursky commented 6 years ago

Single-end reads are becoming somewhat uncommon, but they are not terribly difficult to support in metaWRAP. This is somewhat experimental, but I added single-end read support to the binning module in metaWRAP v=0.9 with the --single-end option. Please report back if it works for you.

palomo11 commented 6 years ago

Here is the command:

metawrap binning -a ./assembly_megahit/final.contigs.fa -o ./assembly_megahit/binning -t 56 --metabat2 --maxbin2 --concoct --single-end Sample1_single_1.fastq Sample2_single_1.fastq Sample3_single_1.fastq Sample4_single_1.fastq Sample5_single_1.fastq Sample5_single_1.fastq Sample6_single_1.fastq Sample7_single_1.fastq Sample8_single_1.fastq Sample9_single_1.fastq

It seems metabat2 more or less works (although it recovered few genomes), but nothing else worked.

This was in the error file:

/home/people/name/.conda/envs/metawrap-env/bin/metawrap-modules/binning.sh: line 400: syntax error: unexpected end of file

real    35m7.801s
user    462m7.747s
sys 38m24.761s
Traceback (most recent call last):
  File "/home/people/name/.conda/envs/metawrap-env/bin/metawrap-scripts/fix_config_naming.py", line 4, in <module>
    for line in open(sys.argv[1]):
IOError: [Errno 2] No such file or directory: './megahit_assembly/binning/bin_refinement/binsB/*'
mv: cannot move './megahit_assembly/binning/bin_refinement/tmp.fa' to './megahit_assembly/binning/bin_refinement/binsB/*': No such file or directory
Traceback (most recent call last):
  File "/home/people/name/.conda/envs/metawrap-env/bin/metawrap-scripts/fix_config_naming.py", line 4, in <module>
    for line in open(sys.argv[1]):
IOError: [Errno 2] No such file or directory: './megahit_assembly/binning/bin_refinement/binsC/*'
mv: cannot move './megahit_assembly/binning/bin_refinement/tmp.fa' to './megahit_assembly/binning/bin_refinement/binsC/*': No such file or directory

ursky commented 6 years ago

Wow, sorry about that, I somehow deleted a block of text in the middle of the module, breaking the whole thing. I fixed it, and updated the metawrap v=0.9 release. Can you uninstall metawrap, reinstall it again (to make sure you get the update), and try again?
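A quick way to tell whether an installed copy of the module still has the broken block, without waiting for a full binning run, is to have the shell parse the script without executing it. The sketch below is only an illustration: the helper name `check_script_syntax` is made up here, and the example path is assumed from the error message above (with `CONDA_PREFIX` pointing at the active conda environment).

```shell
# Sketch: parse a shell script without running it, to detect errors such as
# "syntax error: unexpected end of file". The function name is hypothetical.
check_script_syntax() {
    script="$1"
    if [ ! -f "$script" ]; then
        echo "missing: $script"
        return 2
    fi
    # bash -n reads and parses the file but executes none of its commands.
    if bash -n "$script" 2>/dev/null; then
        echo "syntax ok: $script"
    else
        echo "syntax error: $script"
        return 1
    fi
}

# Example call (path assumed from the log above; adjust to your install):
# check_script_syntax "$CONDA_PREFIX/bin/metawrap-modules/binning.sh"
```

If this still reports a syntax error after reinstalling, the old copy of the script is most likely still the one on disk.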

palomo11 commented 6 years ago

I uninstalled metawrap and reinstalled it, but the same thing happened. It also happens with a paired-end metagenome:

/home/people/name/.conda/envs/metawrap-env/bin/metawrap-modules/binning.sh: line 400: syntax error: unexpected end of file

real    111m37.242s
user    1506m23.166s
sys 104m10.051s
Traceback (most recent call last):
  File "/home/people/name/.conda/envs/metawrap-env/bin/metawrap-scripts/fix_config_naming.py", line 4, in <module>
    for line in open(sys.argv[1]):
IOError: [Errno 2] No such file or directory: './assembly2_megahit/binning/bin_refinement/binsB/*'
mv: cannot move './assembly2_megahit/binning/bin_refinement/tmp.fa' to './assembly2_megahit/binning/bin_refinement/binsB/*': No such file or directory
Traceback (most recent call last):
  File "/home/people/name/.conda/envs/metawrap-env/bin/metawrap-scripts/fix_config_naming.py", line 4, in <module>
    for line in open(sys.argv[1]):
IOError: [Errno 2] No such file or directory: './assembly2_megahit/binning/bin_refinement/binsC/*'
mv: cannot move './assembly2_megahit/binning/bin_refinement/tmp.fa' to './assembly2_megahit/binning/bin_refinement/binsC/*': No such file or directory

ursky commented 6 years ago

I think what might have happened is that conda cached the old version of the metawrap conda package, so when you reinstalled, it didn't actually download the new version since the name did not change. Maybe uninstall and then manually delete the metawrap package downloads from miniconda2/pkgs? Or maybe there is an option in conda to ignore cached files...
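The manual cleanup described here could look roughly like the following. This is a sketch under assumptions: the helper name `purge_cached_pkg` is invented for illustration, the cache location differs between installs (miniconda2/pkgs is the one mentioned above), and "metawrap" is only a guess at the prefix the cached files carry.

```shell
# Sketch: delete cached copies of one package from a conda package cache.
# Both arguments are assumptions to adapt: the cache directory and the
# name prefix of the cached package.
purge_cached_pkg() {
    pkgs_dir="$1"
    prefix="$2"
    if [ ! -d "$pkgs_dir" ]; then
        echo "no such package cache: $pkgs_dir" >&2
        return 1
    fi
    # Cached packages show up as extracted directories and as downloaded
    # archives, both named after the package, so match on the name prefix.
    for entry in "$pkgs_dir/$prefix"*; do
        [ -e "$entry" ] || continue   # the glob matched nothing
        echo "removing $entry"
        rm -rf -- "$entry"
    done
}

# Example call, after uninstalling the package:
# purge_cached_pkg "$HOME/miniconda2/pkgs" metawrap
```

conda also ships a `conda clean` subcommand (for example `conda clean --all`) that removes cached tarballs and unused packages in one go, which may be simpler than deleting files by hand.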

palomo11 commented 6 years ago

OK, I have uninstalled metawrap and deleted everything from miniconda/pkgs. Then I installed it again.

Now metabat2 and maxbin2 have worked fine, but there was a problem with concoct:

readline() on closed filehandle FILE at /home/people/alpal/.conda/envs/metawrap-env/bin/run_MaxBin.pl line 1335.
readline() on closed filehandle FILE at /home/people/alpal/.conda/envs/metawrap-env/bin/run_MaxBin.pl line 1335.
Output depth matrix to ./Soil_Svalbard_merged_megahit/binning/work_files/tmp
Calculating intra contig depth variance
Output matrix to ./Soil_Svalbard_merged_megahit/binning/work_files/tmp
Opening 10 bams
Consolidating headers
Processing bam files
Thread 4 finished: mgm4691058.3_TM_single_1.bam with 18531267 reads and 278300 readsWellMapped
Thread 9 finished: mgm4691071.3_TM_single_1.bam with 18411441 reads and 633629 readsWellMapped
Thread 8 finished: mgm4691070.3_TM_single_1.bam with 18910229 reads and 234662 readsWellMapped
Thread 0 finished: mgm4691039.3_TM_single_1.bam with 21241158 reads and 381511 readsWellMapped
Thread 6 finished: mgm4691063.3_TM_single_1.bam with 21441812 reads and 165391 readsWellMapped
Thread 1 finished: mgm4691045.3_TM_single_1.bam with 21804996 reads and 292938 readsWellMapped
Thread 3 finished: mgm4691057.3_TM_single_1.bam with 21963637 reads and 250442 readsWellMapped
Thread 5 finished: mgm4691059.3_TM_single_1.bam with 22168459 reads and 397004 readsWellMapped
Thread 7 finished: mgm4691064.3_TM_single_1.bam with 22690792 reads and 319522 readsWellMapped
Thread 2 finished: mgm4691054.3_TM_single_1.bam with 22734017 reads and 219238 readsWellMapped
Creating depth matrix file: ./Soil_Svalbard_merged_megahit/binning/work_files/tmp
Closing most bam files
Closing last bam file
Finished
Traceback (most recent call last):
  File "/home/people/alpal/.conda/envs/metawrap-env/bin/concoct", line 4, in <module>
    __import__('pkg_resources').run_script('concoct==0.4.0', 'concoct')
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 654, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1434, in run_script
    exec(code, namespace, namespace)
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/concoct-0.4.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/concoct", line 10, in <module>
    from concoct.cluster import cluster
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/concoct-0.4.0-py2.7-linux-x86_64.egg/concoct/cluster.py", line 4, in <module>
    from sklearn.mixture import GMM
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/sklearn/mixture/__init__.py", line 5, in <module>
    from .gmm import sample_gaussian, log_multivariate_normal_density
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/sklearn/mixture/gmm.py", line 27, in <module>
    from .. import cluster
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/sklearn/cluster/__init__.py", line 6, in <module>
    from .spectral import spectral_clustering, SpectralClustering
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/sklearn/cluster/spectral.py", line 17, in <module>
    from ..manifold import spectral_embedding
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/sklearn/manifold/__init__.py", line 6, in <module>
    from .isomap import Isomap
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/sklearn/manifold/isomap.py", line 11, in <module>
    from ..decomposition import KernelPCA
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/sklearn/decomposition/__init__.py", line 11, in <module>
    from .sparse_pca import SparsePCA, MiniBatchSparsePCA
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/sklearn/decomposition/sparse_pca.py", line 11, in <module>
    from ..linear_model import ridge_regression
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/sklearn/linear_model/__init__.py", line 26, in <module>
    from .logistic import (LogisticRegression, LogisticRegressionCV,
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/sklearn/linear_model/logistic.py", line 23, in <module>
    from ..svm.base import _fit_liblinear
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/sklearn/svm/__init__.py", line 13, in <module>
    from .classes import SVC, NuSVC, SVR, NuSVR, OneClassSVM, LinearSVC, \
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/sklearn/svm/classes.py", line 4, in <module>
    from .base import _fit_liblinear, BaseSVC, BaseLibSVM
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/sklearn/svm/base.py", line 8, in <module>
    from . import libsvm, liblinear
ImportError: /home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/matplotlib/../../../libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/sklearn/svm/libsvm.so)

ursky commented 6 years ago

Ok great, at least the binning part works. However, it looks like I introduced a GCC library issue when I added GCC as a dependency in issue #31... I took it out in metawrap v=0.9.1. Unfortunately, you will have to scrap your conda environment (if you use one) and reinstall everything from scratch in a newly made env to prevent library errors. Sorry about that. Can you reinstall everything and see if it works for you?
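For anyone hitting the same ImportError, one way to see which libstdc++ is at fault is to check whether the library named in the message actually contains the version tag being asked for. The sketch below is an assumption-laden illustration, not part of metaWRAP: the helper name `lib_has_version_tag` is made up, and the example path is inferred from the traceback (the env's lib/libstdc++.so.6).

```shell
# Sketch: test whether a shared library file contains a given version tag
# such as CXXABI_1.3.8. The function name is hypothetical.
lib_has_version_tag() {
    lib="$1"
    tag="$2"
    [ -f "$lib" ] || return 2
    # -a treats the binary as text, -F matches the tag literally,
    # -q prints nothing and only sets the exit status.
    grep -a -q -F -- "$tag" "$lib"
}

# Example call (path inferred from the traceback; adjust to your env):
# lib_has_version_tag "$CONDA_PREFIX/lib/libstdc++.so.6" CXXABI_1.3.8 \
#     && echo "tag present" || echo "tag missing"
```

A "tag missing" result for the library in the environment would be consistent with the error above, where sklearn's libsvm.so needs a newer libstdc++ than the one the environment provides.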

ursky commented 6 years ago

Also, you don't have to re-run the module to see if the error is fixed. First, run concoct to see if you can get a help message. If you can, then proceed.
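That quick check can be wrapped in a small helper, sketched below. It assumes the tool accepts `--help` and exits with status 0 when it starts cleanly; the helper name `can_show_help` is invented for this example.

```shell
# Sketch: confirm that a command-line tool starts and prints its help text.
# Assumes the tool accepts --help and exits 0 on success.
can_show_help() {
    tool="$1"
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not found on PATH"
        return 1
    fi
    # An import or library error at startup gives a non-zero exit status.
    if "$tool" --help >/dev/null 2>&1; then
        echo "$tool: starts and prints help"
    else
        echo "$tool: found but failed to start"
        return 1
    fi
}

# Example call inside the activated environment:
# can_show_help concoct
```

If the tool fails to start, running it once without redirecting output shows the underlying traceback.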

ursky commented 6 years ago

Did the module end up working for you?

palomo11 commented 6 years ago

Hi, sorry for the late reply, I was on holiday.

Now it seems to work fine; however, there is a problem when doing the bin refinement.

This is what I get in the bin_refinement folder:

1644 Jun 27 20:03 binsA
6470 Jun 27 20:03 binsB
3418 Jun 27 20:03 binsC
7124 Jun 27 21:04 binsAB
4340 Jun 27 21:04 binsABC
5588 Jun 27 21:04 binsAC
5396 Jun 27 21:04 binsBC
0 Jun 27 21:04 binsA.tmp

And here the error:

Traceback (most recent call last):
  File "/home/people/alpal/.conda/envs/metawrap-env/bin/checkm", line 36, in <module>
    from checkm import main
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/main.py", line 25, in <module>
    from checkm.defaultValues import DefaultValues
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/defaultValues.py", line 26, in <module>
    class DefaultValues():
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/defaultValues.py", line 29, in DefaultValues
    __DBM = DBManager()
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/checkmData.py", line 114, in __init__
    if not self.setRoot():
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/checkmData.py", line 177, in setRoot
    path = self.confirmPath(path=path)
  File "/home/people/alpal/.conda/envs/metawrap-env/lib/python2.7/site-packages/checkm/checkmData.py", line 199, in confirmPath
    path = raw_input("Where should CheckM store it's data?\n" \
EOFError: EOF when reading a line
########################################################################################################################
#####                                      RUNNING CHECKM ON ALL SETS OF BINS                                      #####
########################################################################################################################

------------------------------------------------------------------------------------------------------------------------
-----                                         Running CheckM on binsA bins                                         -----
------------------------------------------------------------------------------------------------------------------------

It seems that the CheckM data folder has not been set yet or has been removed. Running: 'checkm data setRoot'.
Where should CheckM store it's data?
Please specify a location or type 'abort' to stop trying: 

************************************************************************************************************************
*****                             Something went wrong with running CheckM. Exiting...                             *****
************************************************************************************************************************

ursky commented 6 years ago

It looks like you did not configure the CheckM database after reinstalling metawrap. See Database download instructions.

cahofman commented 5 years ago

Hi, I am also working with some single-read data (merged, quality-filtered reads), but I am trying to run the assembly module. Is this possible?

Thanks!