BigDataBiology / SemiBin

SemiBin: metagenomics binning with self-supervised deep learning
https://semibin.rtfd.io/

Multi-binning got KeyError #149

Closed. chenjh356 closed this issue 7 months ago

chenjh356 commented 8 months ago

[2024-01-24 13:05:20,616] INFO: Setting number of CPUs to 48
[2024-01-24 13:05:20,616] INFO: Binning for short_read
[2024-01-24 13:05:20,616] INFO: SemiBin will run in self supervised mode
[2024-01-24 13:23:15,141] INFO: Did not detect GPU, using CPU.
[2024-01-24 13:23:15,239] INFO: Performing multi-sample binning
[2024-01-24 13:23:15,239] INFO: Generating training data...
[2024-01-24 13:28:17,321] INFO: Calculating coverage for every sample.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/envs/semibin/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/envs/semibin/lib/python3.10/site-packages/SemiBin-2.0.2-py3.10.egg/SemiBin/generate_coverage.py", line 100, in generate_cov
    contig_cov, must_link_contig_cov = calculate_coverage(
  File "/envs/semibin/lib/python3.10/site-packages/SemiBin-2.0.2-py3.10.egg/SemiBin/generate_coverage.py", line 45, in calculate_coverage
    cov_threshold = contig_threshold_dict[sample_name]
KeyError: 'k77_23424'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/envs/semibin/bin/SemiBin2", line 33, in <module>
    sys.exit(load_entry_point('SemiBin==2.0.2', 'console_scripts', 'SemiBin2')())
  File "/envs/semibin/lib/python3.10/site-packages/SemiBin-2.0.2-py3.10.egg/SemiBin/main.py", line 1494, in main2
    multi_easy_binning(
  File "/envs/semibin/lib/python3.10/site-packages/SemiBin-2.0.2-py3.10.egg/SemiBin/main.py", line 1204, in multi_easy_binning
    sample_list = generate_sequence_features_multi(
  File "/envs/semibin/lib/python3.10/site-packages/SemiBin-2.0.2-py3.10.egg/SemiBin/main.py", line 876, in generate_sequence_features_multi
    s = r.get()
  File "/envs/semibin/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
KeyError: 'k77_23424'

psj1997 commented 8 months ago

Hi,

Can you show the data you have used and the binning command? Thanks!

chenjh356 commented 8 months ago

Hello, I have received your email. I will reply as soon as possible.

chenjh356 commented 8 months ago

Hi,

Can you show the data you have used and the binning command? Thanks!

Yes! I found that the error can occur when the sorted BAM files were not produced by mapping against the index built from concatenated.fa, but against each sample's own index. I corrected that and it worked. Thanks!
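
For anyone hitting the same KeyError, one way to check for this kind of mismatch is to compare the reference names in each BAM header with the sequence names in concatenated.fa. The following is only a minimal sketch and not part of SemiBin: the function names are made up for illustration, and it assumes the BAM header has already been dumped to text, for example with `samtools view -H sample.sorted.bam`.

```python
import gzip


def fasta_names(path):
    """Return the set of sequence names (first word of each '>' header)."""
    # Hypothetical helper; handles plain and gzip-compressed FASTA files.
    opener = gzip.open if str(path).endswith(".gz") else open
    names = set()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def sam_header_names(header_text):
    """Return the reference names found in the SN: field of @SQ header lines."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def names_missing_from_fasta(fasta_path, header_text):
    """Reference names present in the BAM header but absent from the FASTA."""
    return sorted(sam_header_names(header_text) - fasta_names(fasta_path))
```

If the returned list is not empty, the BAM file was presumably built against a different reference than the concatenated FASTA, which would fit the situation described above.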


Louis-MG commented 2 months ago

Hello! I encountered the same error:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/guelou01/miniconda3/envs/SemiBin/lib/python3.12/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/home/guelou01/miniconda3/envs/SemiBin/lib/python3.12/site-packages/SemiBin/generate_coverage.py", line 100, in generate_cov
    contig_cov, must_link_contig_cov = calculate_coverage(
                                       ^^^^^^^^^^^^^^^^^^^
  File "/home/guelou01/miniconda3/envs/SemiBin/lib/python3.12/site-packages/SemiBin/generate_coverage.py", line 45, in calculate_coverage
    cov_threshold = contig_threshold_dict[sample_name]
                    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'NODE_1_length_314220_cov_67.980583'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/guelou01/miniconda3/envs/SemiBin/bin/SemiBin2", line 10, in <module>
    sys.exit(main2())
             ^^^^^^^
  File "/home/guelou01/miniconda3/envs/SemiBin/lib/python3.12/site-packages/SemiBin/main.py", line 1610, in main2
    multi_easy_binning(
  File "/home/guelou01/miniconda3/envs/SemiBin/lib/python3.12/site-packages/SemiBin/main.py", line 1313, in multi_easy_binning
    sample_list = generate_sequence_features_multi(logger, args)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/guelou01/miniconda3/envs/SemiBin/lib/python3.12/site-packages/SemiBin/main.py", line 964, in generate_sequence_features_multi
    s = r.get()
        ^^^^^^^
  File "/home/guelou01/miniconda3/envs/SemiBin/lib/python3.12/multiprocessing/pool.py", line 774, in get
    raise self._value
KeyError: 'NODE_1_length_314220_cov_67.980583'
(SemiBin) guelou01@vls132:/mnt/scratch/LM$ ^C
(SemiBin) guelou01@vls132:/mnt/scratch/LM$ zgrep -F 'NODE_1_length_314220_cov_67.980583' mags_post_chir/concatenated.fa.gz 
>DRR171461_scaffolds:NODE_1_length_314220_cov_67.980583
^C
(SemiBin) guelou01@vls132:/mnt/scratch/LM$ zgrep -F '^>.*NODE_1_length_314220_cov_67.980583' mags_post_chir/concatenated.fa.gz 
^C
(SemiBin) guelou01@vls132:/mnt/scratch/LM$ zgrep '^>.*NODE_1_length_314220_cov_67.980583' mags_post_chir/concatenated.fa.gz 
>DRR171461_scaffolds:NODE_1_length_314220_cov_67.980583

But I cannot figure out for the life of me how @chenjh356 corrected it. Could we have the full solution here, please? Especially if it means changing the SemiBin documentation to avoid future issues.
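
For reference, the key in the KeyError is the bare contig name, while the header in concatenated.fa.gz carries a `DRR171461_scaffolds:` prefix. A small illustration of why an unprefixed name could produce exactly this error, assuming (a guess from the traceback, not SemiBin's actual code) that the sample name is taken as the part of the contig name before the `:` separator; `sample_of` and `thresholds` are hypothetical names:

```python
def sample_of(contig_name, separator=":"):
    """Hypothetical helper: the text before the first separator."""
    return contig_name.split(separator)[0]


# Illustrative per-sample lookup table, keyed by sample name (value is a placeholder).
thresholds = {"DRR171461_scaffolds": 0.0}

prefixed = "DRR171461_scaffolds:NODE_1_length_314220_cov_67.980583"
bare = "NODE_1_length_314220_cov_67.980583"

print(sample_of(prefixed) in thresholds)  # True: the prefix is a known sample
print(sample_of(bare) in thresholds)      # False: the whole contig name becomes the key
```

Under that assumption, a BAM file whose reference names lack the sample prefix would make the lookup fail with the full contig name as the missing key, as in the traceback above.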
