anuradhawick / LRBinner

LRBinner is a long-read binning tool published in WABI 2021 proceedings and AMB.
https://doi.org/10.4230/LIPIcs.WABI.2021.11
GNU General Public License v2.0

error when counting 15-mers #12

Closed by lewhiteside 1 year ago

lewhiteside commented 1 year ago

Hi there,

I'm running into an error when I run the following command on contigs produced using metaFlye:

python LRBinner contigs --reads-path highQuality-reads_barcode01.fastq --contigs sample1_assembly.fasta --output LRBinner_Output_sample1

The output:

2022-11-28 16:07:58,051 - INFO - Command LRBinner contigs --reads-path highQuality-reads_barcode01.fastq --contigs sample1_assembly.fasta --output LRBinner_Output_sample1
2022-11-28 16:07:58,054 - INFO - Computing contig lengths
2022-11-28 16:07:58,204 - INFO - Searching for marker genes
2022-11-28 16:07:58,204 - DEBUG - Using marker genes from /mnt/scratch2/users/40266190/LRBinner/auxiliary/marker.hmm
2022-11-28 16:07:58,204 - DEBUG - FragGeneScan cmd: run_FragGeneScan.pl -genome="sample1_assembly.fasta" -out="LRBinner_Output_sample1/marker_genes/contigs.frag"             -complete=0 -train=complete -thread=8 1> "LRBinner_Output_sample1/marker_genes/contigs.frag.out" 2>                 "LRBinner_Output_sample1/marker_genes//contigs.frag.err" 
2022-11-28 16:07:59,292 - DEBUG - HMMER cmd: hmmsearch --domtblout "LRBinner_Output_sample1/marker_genes//contigs.hmmout" --cut_tc --cpu 8                 "/mnt/scratch2/users/40266190/LRBinner/auxiliary/marker.hmm" "LRBinner_Output_sample1/marker_genes//contigs.frag.faa" 1> "LRBinner_Output_sample1/marker_genes//contigs.hmmout.out" 2> "LRBinner_Output_sample1/marker_genes//contigs.hmmout.err" 
2022-11-28 16:08:03,355 - INFO - Searching for marker genes complete
2022-11-28 16:08:03,355 - INFO - Splitting contigs
2022-11-28 16:08:03,384 - INFO - Splitting contigs completed
2022-11-28 16:08:03,384 - INFO - Counting 15-mers
2022-11-28 16:08:03,384 - DEBUG - CMD::"/mnt/scratch2/users/40266190/LRBinner/mbcclr_utils/bin/count-15mers" "highQuality-reads_barcode01.fastq" "LRBinner_Output_sample1/profiles/15mers-counts" 8
2022-11-28 16:08:03,396 - ERROR - Error in step: Counting 15-mers
2022-11-28 16:08:03,396 - ERROR - Failed due to an error. Please check the log. Good Bye!

The log file contains the same output.

Any help would be appreciated,

Thank you in advance!

Louise

anuradhawick commented 1 year ago

Hi,

Thanks for the comment.

While I dig deeper, could you confirm whether you built the binaries using the sh build.sh command?

I'll try to reproduce this with some of the data I have got.
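
The log above reports only that the counting step failed, not why. A minimal sketch for narrowing that down, assuming the binary path printed in the DEBUG line: it checks whether the file exists and is executable, then runs a command directly so that its stderr is visible. The helper names diagnose_binary and run_and_capture are hypothetical and are not part of LRBinner.

```python
import os
import subprocess


def diagnose_binary(path):
    """Return a short status string describing whether `path` can be run."""
    if not os.path.isfile(path):
        return "missing"            # e.g. the build step was never run
    if not os.access(path, os.X_OK):
        return "not executable"     # file exists but lacks execute permission
    return "ok"


def run_and_capture(cmd):
    """Run `cmd` (a list of strings) and return (exit_code, stderr_text).

    If the program cannot be started at all, exit_code is None and the
    text is the operating-system error message.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # decode output as text (works on Python 3.7)
        )
    except OSError as exc:
        return None, str(exc)
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    # Path copied from the DEBUG line in the log; adjust for your install.
    binary = "/mnt/scratch2/users/40266190/LRBinner/mbcclr_utils/bin/count-15mers"
    print("count-15mers:", diagnose_binary(binary))
```

A status of "missing" or "not executable" would point at the build step rather than at the input reads.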

lewhiteside commented 1 year ago

Thank you for your response. I built the binaries as you suggested (I had done this the other way!). It is now counting 15-mers, but I get a new error:

2022-12-07 13:13:17,915 - INFO - Command LRBinner contigs --reads-path highQuality-reads_barcode01.fastq --contigs sample1_assembly.fasta --output LRBinner_Output_sample1
2022-12-07 13:13:17,929 - INFO - Computing contig lengths
2022-12-07 13:13:18,433 - INFO - Searching for marker genes
2022-12-07 13:13:18,526 - INFO - Searching for marker genes complete
2022-12-07 13:13:18,526 - INFO - Splitting contigs
2022-12-07 13:13:18,604 - INFO - Splitting contigs completed
2022-12-07 13:13:18,604 - INFO - Counting 15-mers
INPUT FILE highQuality-reads_barcode01.fastq
OUTPUT FILE LRBinner_Output_sample1/profiles/15mers-counts
THREADS 8
Loaded Reads 58266
WRITING TO FILE
COMPLETED : Output at - LRBinner_Output_sample1/profiles/15mers-counts
2022-12-07 13:13:28,659 - INFO - Counting 15-mers complete
2022-12-07 13:13:28,660 - INFO - Computing k-mer vectors
INPUT FILE LRBinner_Output_sample1/fragments/contigs.fasta
OUTPUT FILE LRBinner_Output_sample1/profiles/com_profs
K_SIZE 3
THREADS 8
Profile Size 32
Total 3-mers 64
Loaded Reads 411
2022-12-07 13:13:28,716 - INFO - Computing k-mer vectors complete
2022-12-07 13:13:28,716 - INFO - Generating coverage vectors
K-Mer file LRBinner_Output_sample1/profiles/15mers-counts
LOADING KMERS TO RAM
FINISHED LOADING KMERS TO RAM
INPUT FILE LRBinner_Output_sample1/fragments/contigs.fasta
OUTPUT FILE LRBinner_Output_sample1/profiles/cov_profs
THREADS 8
BIN WIDTH 10
BINS IN HIST 32
Loaded Reads 411
COMPLETED : Output at - LRBinner_Output_sample1/profiles/cov_profs
2022-12-07 13:13:33,028 - INFO - Generating coverage vectors complete
2022-12-07 13:13:33,028 - INFO - Profiles saving as numpy arrays
2022-12-07 13:13:33,063 - INFO - Profiles saving as numpy arrays complete
2022-12-07 13:13:33,065 - INFO - VAE training information
2022-12-07 13:13:33,065 - INFO -        Dimensions 8
2022-12-07 13:13:33,065 - INFO -        Hidden Layers [128, 128]
2022-12-07 13:13:33,065 - INFO -        Epochs 200
2022-12-07 13:13:33,065 - INFO - Contig split must link pairs            0
2022-12-07 13:13:33,066 - INFO - Single copy marker genes pairs          0
Training VAE: 100%|██████████████████████████████████████████████████████████████████████████████████| 200/200 [00:04<00:00, 45.44it/s]
2022-12-07 13:13:37,744 - INFO - VAE training complete
Traceback (most recent call last):
  File "LRBinner", line 197, in <module>
    main()
  File "LRBinner", line 179, in main
    pipelines.run_contig_binning(args)
  File "/mnt/scratch2/users/40266190/LRBinner/mbcclr_utils/pipelines.py", line 243, in run_contig_binning
    output, fragment_parent, separate, contigs, threads)
  File "/mnt/scratch2/users/40266190/LRBinner/mbcclr_utils/cluster_utils.py", line 489, in perform_contig_binning_HDBSCAN
    from hdbscan import HDBSCAN
  File "/mnt/scratch2/users/40266190/conda/envs/lrbinner/lib/python3.7/site-packages/hdbscan/__init__.py", line 1, in <module>
    from .hdbscan_ import HDBSCAN, hdbscan
  File "/mnt/scratch2/users/40266190/conda/envs/lrbinner/lib/python3.7/site-packages/hdbscan/hdbscan_.py", line 509, in <module>
    memory=Memory(cachedir=None, verbose=0),
TypeError: __init__() got an unexpected keyword argument 'cachedir'

Thank you for your help!

lewhiteside commented 1 year ago

Any updates on this error would be greatly appreciated.

Thank you

anuradhawick commented 1 year ago

Hello,

I’ll get back to you soon after the holidays. Thanks.

anuradhawick commented 1 year ago

Hi @lewhiteside,

Thanks for the issue. This was due to an old version of HDBSCAN. Could you please follow my response in issue #14?

Cheers, Anuradha
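
The traceback in this thread fails at Memory(cachedir=None, verbose=0) inside hdbscan. A minimal sketch for checking whether an installed class still accepts a given keyword, assuming the Memory in the traceback is joblib.Memory: the helper accepts_keyword is hypothetical and uses only the standard library; the joblib import is guarded because it may not be installed.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    named_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in named_kinds:
        return True
    # A **kwargs parameter accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


if __name__ == "__main__":
    try:
        from joblib import Memory  # assumption: the class hdbscan instantiates
    except ImportError:
        print("joblib is not installed in this environment")
    else:
        print("Memory accepts 'cachedir':", accepts_keyword(Memory, "cachedir"))
```

If this reports False, the installed hdbscan is passing a keyword that the installed Memory class no longer takes, which matches the TypeError above.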