I'm trying to bin a couple of assemblies with SemiBin2 (v2.1.0), using the single_easy_bin command. For some assemblies, the job finishes early and no bins are generated:
[2024-06-11 10:41:43,492] INFO: Setting number of CPUs to 64
[2024-06-11 10:41:43,492] INFO: Binning for short_read
[2024-06-11 10:41:43,495] INFO: SemiBin will run in self supervised mode
[2024-06-11 10:41:49,295] INFO: Did not detect GPU, using CPU.
[2024-06-11 10:42:01,482] INFO: Generating training data...
[2024-06-11 10:49:31,759] INFO: Calculating coverage for every sample.
[2024-06-11 11:21:08,692] INFO: Processed: mapping_binning/B_1.bam
[2024-06-11 11:21:08,694] INFO: Processed: mapping_binning/B_2.bam
[2024-06-11 11:21:57,939] INFO: Processed: mapping_binning/B_3.bam
[2024-06-11 11:22:34,630] INFO: Start training from a single sample.
[2024-06-11 11:22:42,504] INFO: Training model...
[2024-06-11 12:13:52,844] INFO: Training finished.
[2024-06-11 12:13:52,909] INFO: Start binning.
This seems to affect only large assemblies: the runs for small assemblies finished without an issue, while those for large assemblies failed. The contigs in our assemblies are ≥1 kb, and we mapped the reads with strobealign and sorted the alignments. Everything we are doing is standard, except that we are running SemiBin2 through Apptainer.
I suspect this is a memory issue caused by the amount of data. We will try running it again with --min-len 2500, assuming that SemiBin2 will then ignore shorter contigs and generate network inputs only for the contigs longer than the threshold. I will update the issue if there are any developments.
This might be related to https://github.com/BigDataBiology/SemiBin/issues/150. I decided to open another issue because my jobs finished without generating any bins, instead of hanging indefinitely.
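As a rough check of the memory hypothesis before rerunning, something like the sketch below could show how many contigs and bases a 2500 bp cutoff would remove from an assembly. It is not part of SemiBin2: the function names `contig_lengths` and `summarize_min_len` are hypothetical, it assumes an uncompressed FASTA file, and it assumes the cutoff keeps contigs whose length is at least the threshold, which may not match exactly how SemiBin2 applies --min-len.

```python
from typing import Dict, Iterable, Iterator


def contig_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each record in FASTA-formatted lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def summarize_min_len(lines: Iterable[str], min_len: int = 2500) -> Dict[str, int]:
    """Count contigs and bases kept or dropped by a minimum-length cutoff.

    Assumption: contigs with length >= min_len are kept.
    """
    summary = {"kept_contigs": 0, "kept_bp": 0, "dropped_contigs": 0, "dropped_bp": 0}
    for n in contig_lengths(lines):
        if n >= min_len:
            summary["kept_contigs"] += 1
            summary["kept_bp"] += n
        else:
            summary["dropped_contigs"] += 1
            summary["dropped_bp"] += n
    return summary


# Example usage (the path is a placeholder for one of the large assemblies):
# with open("assembly.fasta") as handle:
#     print(summarize_min_len(handle, min_len=2500))
```

If most contigs fall between 1 kb and 2.5 kb, the cutoff should shrink the input considerably, which would make the rerun a fair test of whether memory is the problem.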