anuradhawick / LRBinner

LRBinner is a long-read binning tool published in the WABI 2021 proceedings and in AMB.
https://doi.org/10.4230/LIPIcs.WABI.2021.11
GNU General Public License v2.0

No bins from test data #3

Closed · cazzlewazzle89 closed this issue 2 years ago

cazzlewazzle89 commented 2 years ago

Hi @anuradhawick

I have been trying the test data provided but I am not getting any bins. I just wanted to check whether I am doing something incorrectly. The terminal output is pasted below, and here is a link to the [log file](https://drive.google.com/file/d/1g5cRypbTszdwdOW3hCfwYzXZi8V33Rh0/view?usp=sharing).

Any feedback would be greatly appreciated, Calum

2021-11-17 09:54:43,493 - INFO - Command /home/cwwalsh/Software/LRBinner/LRBinner reads -r reads.fasta -o lrb_output/ --ae-epochs 200 --resume -mbs 1000 -bit 0 -bs 10 -bc 10
2021-11-17 09:54:43,493 - INFO - Resuming the program from previous checkpoints
2021-11-17 09:54:43,494 - INFO - Counting k-mers
INPUT FILE reads.fasta
OUTPUT FILE lrb_output//profiles/com_profs
K_SIZE 3
THREADS 8
Profile Size 32
Total 3-mers 64
Loaded Reads 57128       
2021-11-17 09:54:46,851 - INFO - Counting k-mers complete
2021-11-17 09:54:46,852 - INFO - Counting 15-mers
INPUT FILE reads.fasta
OUTPUT FILE lrb_output//profiles/15mers-counts
THREADS 8
Loaded Reads 57128       
WRITING TO FILE
COMPLETED : Output at - lrb_output//profiles/15mers-counts
2021-11-17 09:55:05,069 - INFO - Counting 15-mers complete
2021-11-17 09:55:05,069 - INFO - Computing 15-mer profiles
K-Mer file lrb_output//profiles/15mers-counts
LOADING KMERS TO RAM
FINISHED LOADING KMERS TO RAM 
INPUT FILE reads.fasta
OUTPUT FILE lrb_output//profiles/cov_profs
THREADS 8
BIN WIDTH 10
BINS IN HIST 10
Loaded Reads 57128       
COMPLETED : Output at - lrb_output//profiles/cov_profs
2021-11-17 09:55:10,799 - INFO - Computing 15-mer profiles complete
2021-11-17 09:55:10,799 - INFO - Profiles saving as numpy arrays
2021-11-17 09:55:11,471 - INFO - Profiles saving as numpy arrays complete
2021-11-17 09:55:11,472 - INFO - VAE training information
2021-11-17 09:55:11,472 - INFO -        Dimensions 8
2021-11-17 09:55:11,472 - INFO -        Hidden Layers [128, 128]
2021-11-17 09:55:11,472 - INFO -        Epochs 200
Training VAE: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [05:16<00:00,  1.58s/it]
2021-11-17 10:00:28,678 - INFO - VAE training complete
2021-11-17 10:00:28,683 - INFO - Clustering algorithm running
Performing iterations: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 253.28it/s]
2021-11-17 10:00:28,716 - INFO - Detected 0
2021-11-17 10:00:28,716 - INFO - Detected 0 clusters with more than 1000 points
2021-11-17 10:00:28,716 - INFO - Building profiles
2021-11-17 10:00:28,728 - INFO - Binning unclassified reads
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 57128/57128 [00:00<00:00, 1758647.46it/s]
2021-11-17 10:00:28,761 - INFO - Binning complete with 0 bins
Traceback (most recent call last):
  File "/home/cwwalsh/Software/LRBinner/LRBinner", line 197, in <module>
    main()
  File "/home/cwwalsh/Software/LRBinner/LRBinner", line 182, in main
    pipelines.run_reads_binning(args)
  File "/home/cwwalsh/Software/LRBinner/mbcclr_utils/pipelines.py", line 367, in run_reads_binning
    cluster_utils.perform_binning(
  File "/home/cwwalsh/Software/LRBinner/mbcclr_utils/cluster_utils.py", line 347, in perform_binning
    binout.write(f"{read_bin[r]}\n")
KeyError: 0
anuradhawick commented 2 years ago

Hi @cazzlewazzle89 ,

Thanks for raising the issue. I had introduced a bug affecting -bit 0, which caused only 1 iteration to run instead of many. Because no clusters were detected, no reads were assigned to bins, hence the KeyError: 0 when writing the final assignments. I have now fixed the bug. Extremely sorry about this.

I have also replaced the toy dataset with a dataset from the paper; I hope you'll use that one. The command in the documentation has been updated as well.

Another suggestion is to try --ae-dims with values of 4 and 8, which seems to significantly affect the number of bins. If you plan to run the program multiple times, you can make it faster by adding --cuda to use an NVIDIA GPU and --resume to avoid repeating completed steps; see the example below.
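
For example, a re-run along the lines of the command in your log (the path and parameter values below are just placeholders taken from it, so adjust them for your own data) could look like this:

```sh
# Example re-run based on the command in the log above (paths/values are placeholders).
# --ae-dims, --cuda and --resume are the options suggested in this comment.
/home/cwwalsh/Software/LRBinner/LRBinner reads \
    -r reads.fasta \
    -o lrb_output/ \
    --ae-epochs 200 \
    --ae-dims 4 \
    --cuda \
    --resume
```

It is worth trying --ae-dims 8 as well and comparing the resulting bin counts, since the latent dimensionality seems to drive how many clusters are detected.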

You could also have a look at OBLR, which has a Google Colab notebook you can use to try things out and adapt to suit your work.

cazzlewazzle89 commented 2 years ago

Thanks @anuradhawick for the quick fix!

Looks like it is working now on the test data (see the output from eval.py below for reference). I'm initially interested in using the contigs function to bin metagenomic contigs from human faecal samples, but I might also try it on the unassembled reads and let you know how it performs.

All the best, Calum

Precision            96.47
Recall               98.32
F1-Score             97.38
Bins                     9