BigDataBiology / SemiBin

SemiBin: metagenomics binning with self-supervised deep learning
https://semibin.rtfd.io/

KeyError #133

Closed Sumsarium closed 11 months ago

Sumsarium commented 1 year ago

Hi. I keep getting the same KeyError (same error on three different HPCs).

My cmd:

conda activate SemiBin
SemiBin \
single_easy_bin \
-i $metagenome \
--input-bam $temp/$assemblyName/*.bam \
--output $temp/$assemblyName/semibin \
--sequencing-type=long_read

The error:

2023-05-01 23:49:58 XXX SemiBin[2555720] INFO Training model and clustering.
2023-05-01 23:49:58 XXX SemiBin[2555720] INFO Start training from one sample.
2023-05-01 23:50:00 XXX SemiBin[2555720] INFO Training model...
  0%|                                                                                                                         | 0/15 [29:31<?, ?it/s]
Traceback (most recent call last):
  File "/home/XXX/miniconda3/envs/SemiBin/bin/SemiBin", line 10, in <module>
    sys.exit(main1())
  File "/home/XXX/miniconda3/envs/SemiBin/lib/python3.9/site-packages/SemiBin/main.py", line 1482, in main1
    main2(args, is_semibin2=False)
  File "/home/XXX/miniconda3/envs/SemiBin/lib/python3.9/site-packages/SemiBin/main.py", line 1455, in main2
    single_easy_binning(
  File "/home/XXX/miniconda3/envs/SemiBin/lib/python3.9/site-packages/SemiBin/main.py", line 1151, in single_easy_binning
    training(logger, [args.contig_fasta],
  File "/home/XXX/miniconda3/envs/SemiBin/lib/python3.9/site-packages/SemiBin/main.py", line 990, in training
    model = train(
  File "/home/XXX/miniconda3/envs/SemiBin/lib/python3.9/site-packages/SemiBin/semi_supervised_model.py", line 218, in train
    train_input_1.append(train_data_input[contig2ix[str(link[0])]])
KeyError: '1298'

Any thoughts?

psj1997 commented 1 year ago

Hi

Thanks for your interest in SemiBin.

Would you mind sharing the contig IDs in cannot.txt and in the contig file? They seem to be inconsistent.
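
For example, a quick check along these lines would show whether any IDs in cannot.txt are missing from the assembly (just a rough sketch, not SemiBin code; the FASTA path and the cannot.txt location/format are assumptions you would need to adjust):

```python
# Rough sketch: compare contig IDs in the assembly FASTA with those referenced
# in cannot.txt. Paths and the cannot.txt layout (comma-separated contig IDs
# per line) are assumptions -- adjust them to your files.
fasta_ids = set()
with open("assembly.fasta") as fh:
    for line in fh:
        if line.startswith(">"):
            fasta_ids.add(line[1:].split()[0])  # header up to first whitespace

cannot_ids = set()
with open("cannot.txt") as fh:
    for line in fh:
        cannot_ids.update(field for field in line.strip().split(",") if field)

missing = cannot_ids - fasta_ids
print(f"{len(missing)} IDs in cannot.txt are not in the FASTA")
print(sorted(missing)[:10])
```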

Alternatively, you could try:

conda activate SemiBin
SemiBin \
single_easy_bin \
-i $metagenome \
--input-bam $temp/$assemblyName/*.bam \
--output $temp/$assemblyName/semibin \
--sequencing-type=long_read \
--self-supervised

Sincerely Shaojun

luispedro commented 1 year ago

@Sumsarium Did you find a solution? Or could you perhaps share some of your inputs with Shaojun so we could check ourselves?

Sumsarium commented 1 year ago

@luispedro @psj1997 Thanks, and sorry for the late reply. I plan to look into it later this week and will let you know as soon as I have some info / files.

Sumsarium commented 1 year ago

Tried the other approach:

SemiBin \
single_easy_bin \
-i $metagenome \
--input-bam $temp/$assemblyName/*.bam \
--output $temp/$assemblyName/semibin \
--sequencing-type=long_read \
--self-supervised

but got this error:

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/SemiBin/bin/SemiBin", line 10, in <module>
    sys.exit(main1())
  File "/home/user/miniconda3/envs/SemiBin/lib/python3.9/site-packages/SemiBin/main.py", line 1482, in main1
    main2(args, is_semibin2=False)
  File "/home/user/miniconda3/envs/SemiBin/lib/python3.9/site-packages/SemiBin/main.py", line 1455, in main2
    single_easy_binning(
  File "/home/user/miniconda3/envs/SemiBin/lib/python3.9/site-packages/SemiBin/main.py", line 1159, in single_easy_binning
    training(logger, None,
  File "/home/user/miniconda3/envs/SemiBin/lib/python3.9/site-packages/SemiBin/main.py", line 1007, in training
    model = train_self(output,
  File "/home/user/miniconda3/envs/SemiBin/lib/python3.9/site-packages/SemiBin/self_supervised_model.py", line 87, in train_self
    indices1 = np.random.choice(data_length, size=n_samples)
  File "mtrand.pyx", line 928, in numpy.random.mtrand.RandomState.choice
ValueError: a must be greater than 0 unless no samples are taken

A different error, though. The question is whether the problem is the underlying data (a small metagenome with 6 BAM files) or the environment. I will attempt to set up a Singularity-based workflow and test on a different dataset.

psj1997 commented 1 year ago

Hi, can you check the data.csv file in the output_dir? It seems it is an empty file.
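
That ValueError from np.random.choice usually means the training data has zero rows. Something like this would confirm whether data.csv is empty (just a sketch; the path is an assumption, adjust it to whatever you passed to --output):

```python
# Sketch: inspect data.csv from the SemiBin output directory (path is assumed).
import os
import pandas as pd

path = "semibin_output/data.csv"
print("size (bytes):", os.path.getsize(path))

if os.path.getsize(path) > 0:
    df = pd.read_csv(path, index_col=0)
    print("rows x cols:", df.shape)  # zero rows would explain the ValueError above
else:
    print("data.csv is empty")
```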

Sincerely Shaojun

luispedro commented 11 months ago

We just released version 2, which includes more error checking and better diagnostics, so I am closing this here, but please feel free to re-open if it is still relevant.