BigDataBiology / SemiBin

SemiBin: metagenomics binning with self-supervised deep learning
https://semibin.rtfd.io/

Long_read problem #138

Status: Open. ZarulHanifah opened this issue 1 year ago.

ZarulHanifah commented 1 year ago

Hello SemiBin developers,

Thank you for the software. It worked well with my Nanopore data (R9.4.1, Guppy v5, super-accurate basecalling configuration) until I added --sequencing-type=long_read. With that option, SemiBin fails with the error below. Any ideas?

Command:

SemiBin single_easy_bin -i results/proovframe/assem.fasta \
  -b results/minimap2.bam \
  --depth-metabat2 results/depth.tsv \
  -r /home/mzar0002/pg32_scratch/db/SemiBin_db \
  --environment soil \
  --sequencing-type=long_read \
  -o results/binning/semibin \
  -p 8 &> results/log/semibin/log.log

Log:

/fs03/ie79/Zarul/status_nanopore/C002_D1/.snakemake/conda/4555c0c8960801d84920076de87e12a0_/lib/python3.11/site-packages/SemiBin/long_read_cluster.py:77: RuntimeWarning: divide by zero encountered in log
  embedding_new = np.concatenate((embedding, np.log(depth)), axis=1)
Traceback (most recent call last):
  File "/fs03/ie79/Zarul/status_nanopore/C002_D1/.snakemake/conda/4555c0c8960801d84920076de87e12a0_/bin/SemiBin", line 10, in <module>
    sys.exit(main1())
             ^^^^^^^
  File "/fs03/ie79/Zarul/status_nanopore/C002_D1/.snakemake/conda/4555c0c8960801d84920076de87e12a0_/lib/python3.11/site-packages/SemiBin/main.py", line 1482, in main1
    main2(args, is_semibin2=False)
  File "/fs03/ie79/Zarul/status_nanopore/C002_D1/.snakemake/conda/4555c0c8960801d84920076de87e12a0_/lib/python3.11/site-packages/SemiBin/main.py", line 1455, in main2
    single_easy_binning(
  File "/fs03/ie79/Zarul/status_nanopore/C002_D1/.snakemake/conda/4555c0c8960801d84920076de87e12a0_/lib/python3.11/site-packages/SemiBin/main.py", line 1183, in single_easy_binning
    binning_long(**binning_kwargs)
  File "/fs03/ie79/Zarul/status_nanopore/C002_D1/.snakemake/conda/4555c0c8960801d84920076de87e12a0_/lib/python3.11/site-packages/SemiBin/main.py", line 1061, in binning_long
    cluster_long_read(model,
  File "/fs03/ie79/Zarul/status_nanopore/C002_D1/.snakemake/conda/4555c0c8960801d84920076de87e12a0_/lib/python3.11/site-packages/SemiBin/long_read_cluster.py", line 101, in cluster_long_read
    dist_matrix = kneighbors_graph(
                  ^^^^^^^^^^^^^^^^^
  File "/fs03/ie79/Zarul/status_nanopore/C002_D1/.snakemake/conda/4555c0c8960801d84920076de87e12a0_/lib/python3.11/site-packages/sklearn/neighbors/_graph.py", line 122, in kneighbors_graph
    ).fit(X)
      ^^^^^^
  File "/fs03/ie79/Zarul/status_nanopore/C002_D1/.snakemake/conda/4555c0c8960801d84920076de87e12a0_/lib/python3.11/site-packages/sklearn/base.py", line 1151, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fs03/ie79/Zarul/status_nanopore/C002_D1/.snakemake/conda/4555c0c8960801d84920076de87e12a0_/lib/python3.11/site-packages/sklearn/neighbors/_unsupervised.py", line 178, in fit
    return self._fit(X)
           ^^^^^^^^^^^^
  File "/fs03/ie79/Zarul/status_nanopore/C002_D1/.snakemake/conda/4555c0c8960801d84920076de87e12a0_/lib/python3.11/site-packages/sklearn/neighbors/_base.py", line 498, in _fit
    X = self._validate_data(X, accept_sparse="csr", order="C")
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fs03/ie79/Zarul/status_nanopore/C002_D1/.snakemake/conda/4555c0c8960801d84920076de87e12a0_/lib/python3.11/site-packages/sklearn/base.py", line 604, in _validate_data
    out = check_array(X, input_name="X", **check_params)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fs03/ie79/Zarul/status_nanopore/C002_D1/.snakemake/conda/4555c0c8960801d84920076de87e12a0_/lib/python3.11/site-packages/sklearn/utils/validation.py", line 959, in check_array
    _assert_all_finite(
  File "/fs03/ie79/Zarul/status_nanopore/C002_D1/.snakemake/conda/4555c0c8960801d84920076de87e12a0_/lib/python3.11/site-packages/sklearn/utils/validation.py", line 124, in _assert_all_finite
    _assert_all_finite_element_wise(
  File "/fs03/ie79/Zarul/status_nanopore/C002_D1/.snakemake/conda/4555c0c8960801d84920076de87e12a0_/lib/python3.11/site-packages/sklearn/utils/validation.py", line 173, in _assert_all_finite_element_wise
    raise ValueError(msg_err)
ValueError: Input X contains infinity or a value too large for dtype('float32').
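The RuntimeWarning before the traceback suggests one plausible mechanism. Calling np.log(depth) on a contig whose depth is exactly 0 yields -inf. scikit-learn's input check inside kneighbors_graph then rejects any non-finite value with this same ValueError. The message says "infinity or a value too large", so it does not distinguish -inf from an overflow.

The sketch below reproduces that behaviour on hypothetical toy arrays. The helper log_depth_features and all the values are made up for illustration. They only mimic the line quoted in the warning and are not SemiBin's actual code path.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph


def log_depth_features(embedding, depth):
    """Hypothetical helper: append log-depth to an embedding, like the line in the warning."""
    # errstate only silences "divide by zero encountered in log"; log(0) is still -inf.
    with np.errstate(divide="ignore"):
        return np.concatenate((embedding, np.log(depth)), axis=1)


# Toy data (not from this dataset): 4 contigs, 2-D embedding, one contig with zero depth.
embedding = np.array([[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.5, 0.5]], dtype=np.float32)
depth = np.array([[5.0], [0.0], [12.0], [3.0]], dtype=np.float32)

features = log_depth_features(embedding, depth)
print(np.isfinite(features).all())  # the zero-depth row now holds -inf

try:
    # scikit-learn validates X before building the graph and refuses non-finite input.
    kneighbors_graph(features, n_neighbors=2)
except ValueError as err:
    print(err)
```

If this reading is right, the contigs to inspect are those with a depth of 0, not those with an unusually large depth.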

psj1997 commented 1 year ago

Sorry for the late reply.

It seems there is a very large number that cannot be represented as 'float32'. Can you check the largest value in the depth column of the data.csv file? Thanks!

Sincerely,
Shaojun
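To follow up on the request to check the depth column, the sketch below reports three numbers for each depth column: the maximum, the count of zeros, and how many entries would become non-finite after np.log. The helper summarize_depth is a hypothetical name, not part of SemiBin.

It assumes data.csv has contig names in its first column. The column name "depth" in the toy CSV is only a placeholder, because the real header SemiBin writes is not shown in this thread. Check the file's header before choosing column names.

```python
import io

import numpy as np
import pandas as pd


def summarize_depth(df, depth_columns):
    """Hypothetical helper: max, zero count and log-safety for each depth column."""
    rows = []
    for col in depth_columns:
        values = df[col].to_numpy(dtype=np.float64)
        # log(0) -> -inf and log(negative) -> NaN; silence the warnings and count them instead.
        with np.errstate(divide="ignore", invalid="ignore"):
            logged = np.log(values)
        rows.append({
            "column": col,
            "max": values.max(),
            "zeros": int((values == 0).sum()),
            "non_finite_after_log": int((~np.isfinite(logged)).sum()),
        })
    return pd.DataFrame(rows)


# Toy CSV standing in for data.csv; the "depth" header is an assumption.
toy_csv = io.StringIO("contig,kmer_0,depth\nc1,0.1,5.0\nc2,0.2,0.0\nc3,0.3,12.5\n")
data = pd.read_csv(toy_csv, index_col=0)
# For real data, point read_csv at the data.csv SemiBin wrote (location not confirmed here).

report = summarize_depth(data, ["depth"])
print(report)

# Contigs whose depth would become -inf or NaN once log-transformed.
print(data.index[data["depth"] <= 0].tolist())
```

A large "max" would support the very-large-value explanation. A non-zero "zeros" count would instead point to log(0).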