fbreitwieser / krakenuniq

🐙 KrakenUniq: Metagenomics classifier with unique k-mer counting for more specific results
GNU General Public License v3.0

--work-on-disk skips steps #97

Closed nick-youngblut closed 2 years ago

nick-youngblut commented 2 years ago

krakenuniq-build died due to an out-of-memory error:

Found jellyfish v1.1.12
Kraken build set to minimize disk writes.
Finding all library files
Found 10000 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Creating k-mer set (step 1 of 6)...
Using jellyfish
Hash size not specified, using '31172786716'
K-mer set created. [2h55m52.083s]
Skipping step 2, no database reduction requested.
Sorting k-mer set (step 3 of 6)...
db_sort: Getting database into memory ...Loaded database with 31067325971 keys with k of 31 [val_len 4, key_len 8].
Loaded database with 31067325971 keys with k of 31 [val_len 4, key_len 8].
db_sort: Sorting ...db_sort: Sorting complete - writing database to disk ...
K-mer set sorted. [7h34m38.290s]
Creating seqID to taxID map (step 4 of 6)..
1219382 sequences mapped to taxa. [52.317s]
Creating taxDB (step 5 of 6)...
Building taxonomy index from taxonomy//nodes.dmp and taxonomy//names.dmp. Done, got 401815 taxa
taxDB construction finished. [2.846s]
Building  KrakenUniq LCA database (step 6 of 6)...
Reading taxonomy index from taxDB. Done.
Getting database0.kdb into memory (347.204 GB) ... Done
Loaded database with 31067325971 keys with k of 31 [val_len 4, key_len 8].
Reading sequence ID to taxonomy ID mapping ...  got 1219382 mappings.
Processed 681 s%

I then tried running krakenuniq-build --work-on-disk, and the job took ~5 seconds:

Kraken build set to minimize RAM usage.
Found 10000 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Skipping step 1, k-mer set already exists.
Skipping step 2, no database reduction requested.
Skipping step 3, k-mer set already sorted.
Skipping step 4, seqID to taxID map already complete.
Creating taxDB (step 5 of 6)...
taxDB construction finished. [2.846s]
Building  KrakenUniq LCA database (step 6 of 6)...

...however, the job never generated the database.kdb output file. If I instead don't use --work-on-disk, krakenuniq-build seems to actually work on producing the database.kdb output:

Found jellyfish v1.1.12
Kraken build set to minimize disk writes.
Found 10000 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Skipping step 1, k-mer set already exists.
Skipping step 2, no database reduction requested.
Skipping step 3, k-mer set already sorted.
Skipping step 4, seqID to taxID map already complete.
Skipping step 5, taxDB exists.
Building  KrakenUniq LCA database (step 6 of 6)...
Reading taxonomy index from taxDB. Done.
Getting database0.kdb into memory (347.204 GB) ...

I'm using krakenuniq=0.6 due to https://github.com/fbreitwieser/krakenuniq/issues/95
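
The 347.204 GB figure in the log is consistent with the reported key count times the per-entry size (key_len 8 + val_len 4 bytes), expressed in binary gigabytes. A minimal sketch of that arithmetic follows; the function name is hypothetical, and it assumes the table is a flat array of key/value pairs with any file header ignored:

```python
def kdb_size_gib(n_keys, key_len=8, val_len=4):
    """Approximate k-mer table size in GiB, assuming one
    (key_len + val_len)-byte entry per key and no header."""
    return n_keys * (key_len + val_len) / 2**30

# Key count taken from the "Loaded database with ... keys" log line.
print(f"{kdb_size_gib(31067325971):.3f}")  # 347.204
```

This kind of estimate can help decide up front whether a node has enough RAM for the default build or whether --work-on-disk is needed.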

nick-youngblut commented 2 years ago

As a test of reproducibility, I killed the krakenuniq-build job at the end of the above post (Getting database0.kdb into memory (347.204 GB) ...), and then ran krakenuniq-build --work-on-disk again to check that it would generate the same output as before:

Kraken build set to minimize RAM usage.
Found 10000 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Skipping step 1, k-mer set already exists.
Skipping step 2, no database reduction requested.
Skipping step 3, k-mer set already sorted.
Skipping step 4, seqID to taxID map already complete.
Creating taxDB (step 5 of 6)...
taxDB construction finished. [2.846s]
Building  KrakenUniq LCA database (step 6 of 6)...

...however, krakenuniq-build --work-on-disk instead produced the following output:

Found jellyfish v1.1.12
Kraken build set to minimize RAM usage.
Found 10000 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Skipping step 1, k-mer set already exists.
Skipping step 2, no database reduction requested.
Skipping step 3, k-mer set already sorted.
Skipping step 4, seqID to taxID map already complete.
Skipping step 5, taxDB exists.
Building  KrakenUniq LCA database (step 6 of 6)...
Reading taxonomy index from taxDB. Done.
You need to operate in RAM (flag -M) to use output to a different file (flag -o)
xargs: cat: terminated by signal 13
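
When comparing several build logs like the ones above, it can help to reduce each log to which steps ran and which were skipped. The following is only a sketch that relies on the "Skipping step N," and "(step N of 6)" wording seen in these logs; the function name is hypothetical and other krakenuniq versions may word the messages differently:

```python
import re

SKIPPED = re.compile(r"^Skipping step (\d+),")
STARTED = re.compile(r"\(step (\d+) of \d+\)")

def step_status(log_text):
    """Map step number -> 'skipped' or 'ran' based on the log wording."""
    status = {}
    for line in log_text.splitlines():
        match = SKIPPED.match(line)
        if match:
            status[int(match.group(1))] = "skipped"
            continue
        match = STARTED.search(line)
        if match:
            status[int(match.group(1))] = "ran"
    return status

log = """Skipping step 4, seqID to taxID map already complete.
Skipping step 5, taxDB exists.
Building  KrakenUniq LCA database (step 6 of 6)...
"""
print(step_status(log))
```

Note that 'ran' here only means the step was started; a step that dies partway (as step 6 does above) still appears as 'ran'.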

nick-youngblut commented 2 years ago

I get the same error with krakenuniq=0.6 when starting a krakenuniq-build job on a new library (using --work-on-disk):

Found jellyfish v1.1.12
Kraken build set to minimize RAM usage.
Finding all library files
Found 500 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Creating k-mer set (step 1 of 6)...
Using jellyfish
Hash size not specified, using '1637986465'
K-mer set created. [8m10.272s]
Skipping step 2, no database reduction requested.
Sorting k-mer set (step 3 of 6)...
db_sort: Getting database into memory ...Loaded database with 1623560677 keys with k of 31 [val_len 4, key_len 8].
Loaded database with 1623560677 keys with k of 31 [val_len 4, key_len 8].
db_sort: Sorting ...db_sort: Sorting complete - writing database to disk ...
K-mer set sorted. [22m46.406s]
Creating seqID to taxID map (step 4 of 6)..
61039 sequences mapped to taxa. [3.395s]
Creating taxDB (step 5 of 6)...
Building taxonomy index from taxonomy//nodes.dmp and taxonomy//names.dmp. Done, got 401815 taxa
taxDB construction finished. [3.468s]
Building  KrakenUniq LCA database (step 6 of 6)...
Reading taxonomy index from taxDB. Done.
You need to operate in RAM (flag -M) to use output to a different file (flag -o)
xargs: cat: terminated by signal 13

alekseyzimin commented 2 years ago

Please check --work-on-disk option in the latest release v0.7.3, it should work properly now.

nick-youngblut commented 2 years ago

With v0.7.3, I'm still getting the error described at https://github.com/fbreitwieser/krakenuniq/issues/52. My build directory includes:

database-build.log
database.jdb
database0.kdb
database_0
database_1
library/
library-files.txt
seqid2taxid-plus.map
seqid2taxid.map
taxDB
taxonomy/

alekseyzimin commented 2 years ago

What is your command? I would like to reproduce the error.

On Thu, Jun 23, 2022 at 8:48 AM Nick Youngblut @.***> wrote:

I get the following error when using --work-on-disk with v0.7.3:

Kraken build set to minimize RAM usage.
Found 500 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Skipping step 1, k-mer set already exists.
Skipping step 2, no database reduction requested.
Skipping step 3, k-mer set already sorted.
Skipping step 4, seqID to taxID map already complete.
Skipping step 5, taxDB exists.
Building KrakenUniq LCA database (step 6 of 6)...
Reading taxonomy index from taxDB. Done.
Loaded database with 1623560677 keys with k of 31 [val_len 4, key_len 8].
set_lcas: unable to open database.idx: No such file or directory
xargs: cat: terminated by signal 13

No such error occurs if I don't use --work-on-disk:

Kraken build set to minimize disk writes.
Found 500 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Skipping step 1, k-mer set already exists.
Skipping step 2, no database reduction requested.
Skipping step 3, k-mer set already sorted.
Skipping step 4, seqID to taxID map already complete.
Skipping step 5, taxDB exists.
Building KrakenUniq LCA database (step 6 of 6)...
Reading taxonomy index from taxDB. Done.
Getting database0.kdb into memory (18.145 GB) ...

— Reply to this email directly, view it on GitHub https://github.com/fbreitwieser/krakenuniq/issues/97#issuecomment-1164366615, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGPXGHNHEPINEJMJAC4EJKTVQRMI5ANCNFSM5YFNBP6Q . You are receiving this because you commented.Message ID: @.***>

-- Dr. Alexey V. Zimin Associate Research Scientist Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA (301)-437-6260 website http://ccb.jhu.edu/people/alekseyz/ blog http://masurca.blogspot.com

nick-youngblut commented 2 years ago

A simple ./krakenuniq-build --kmer-len 31 --build --threads 12 --db $DB, with $DB denoting the database base directory path.

Using --rebuild does not help (just checked again)

alekseyzimin commented 2 years ago

The command I have been using to test was:

krakenuniq-build --db . --threads 32 --work-on-disk

I have library and taxonomy folders in the current dir. I will test with library and taxonomy in another folder

nick-youngblut commented 2 years ago

I tried krakenuniq-build --db . --threads 32 --work-on-disk in the appropriate directory, but I still got the same error.

Maybe it's due to how I'm adding genomes to the library? My simple helper script for that:

#!/usr/bin/env python
from __future__ import print_function
import os
import sys
import re
import gzip
import bz2
import argparse
import logging

# logging
logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.DEBUG)

# argparse
class CustomFormatter(argparse.ArgumentDefaultsHelpFormatter,
                      argparse.RawDescriptionHelpFormatter):
    pass

desc = 'Adding genome to krakenuniq database'
epi = """DESCRIPTION:
Write output files to db_dir:
* renamed genome fasta (all special characters removed from names)
* krakenuniq map file
"""
parser = argparse.ArgumentParser(description=desc, epilog=epi,
                                 formatter_class=CustomFormatter)
parser.add_argument('fasta_file', type=str,
                    help='Input genome fasta file')
parser.add_argument('taxid', type=str,
                    help='Taxonomy ID for the genome')
parser.add_argument('sample', type=str,
                    help='Genome name')
parser.add_argument('db_dir', type=str,
                    help='Output database location (e.g., ku_db/library/)')
parser.add_argument('--version', action='version', version='0.0.1')

# functions
def _open(infile, mode='rt'):
    """
    Open input as text, regardless of compression
    """
    if infile.endswith('.bz2'):
        return bz2.open(infile, mode)
    elif infile.endswith('.gz'):
        return gzip.open(infile, mode)
    else:
        return open(infile, mode)

def copy_genome(infile, outdir, sample):
    outfile = os.path.join(outdir, sample + '.fna')
    # replace everything except '>', alphanumerics, '-', and newline
    regex = re.compile(r'[^>A-Za-z0-9-\n]')
    contigs = list()
    # _open() yields text lines for plain, gzip, and bzip2 input alike
    with _open(infile) as inF, open(outfile, 'w') as outF:
        for line in inF:
            # seq header
            if line.startswith('>'):
                line = regex.sub('_', line)
                contigs.append(line.lstrip('>').rstrip())
            # writing to output directory
            outF.write(line)
    logging.info(f'File written: {outfile}')
    # return
    return contigs

def write_map(contigs, outdir, sample, taxid):
    outfile = os.path.join(outdir, sample + '.map')
    with open(outfile, 'w') as outF:
        for contig in contigs:
            outF.write('\t'.join([contig, taxid, sample]) + '\n')
    logging.info(f'File written: {outfile}')

## main interface function
def main(args):
    if not os.path.isdir(args.db_dir):
        os.makedirs(args.db_dir)
    contigs = copy_genome(args.fasta_file, args.db_dir, args.sample)
    write_map(contigs, args.db_dir, args.sample, args.taxid)

## script main
if __name__ == '__main__':
    args = parser.parse_args()
    main(args)
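
To make the effect of the script's header handling concrete, here is a small sketch that applies the same regular expression to a single FASTA header. The header, taxid, and sample name are made up for illustration, and the function name is hypothetical. Note that the whole header line, description included, becomes the sequence ID written to the .map file; whether krakenuniq-build expects IDs in this form is not established in this thread.

```python
import re

# Same pattern as in the helper script above.
HEADER_RE = re.compile(r'[^>A-Za-z0-9-\n]')

def sanitized_seqid(header_line):
    """Return the sequence ID the helper script would record for a header."""
    return HEADER_RE.sub('_', header_line).lstrip('>').rstrip()

header = '>NZ_CP01.1 Escherichia coli\n'  # hypothetical FASTA header
seqid = sanitized_seqid(header)
print(seqid)                                # NZ_CP01_1_Escherichia_coli
print('\t'.join([seqid, '562', 'ecoli']))   # one line of the .map file
```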

alekseyzimin commented 2 years ago

It is possible. The command worked fine for me just now, see below.

@.** test_krakenuniq]$ krakenuniq-build --db DBDIR --threads 32 --work-on-disk
Kraken build set to minimize RAM usage.
Finding all library files
Found 1 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Creating k-mer set (step 1 of 6)...
Using /ccb/sw/bin/jellyfish-install/bin/jellyfish
Hash size not specified, using '2575692630'
K-mer set created. [13m43.538s]
Skipping step 2, no database reduction requested.
Sorting k-mer set (step 3 of 6)...
db_sort: Getting database into memory ...Loaded database with 2505641687 keys with k of 31 [val_len 4, key_len 8].
Loaded database with 2505641687 keys with k of 31 [val_len 4, key_len 8].
db_sort: Sorting ...db_sort: Sorting complete - writing database to disk ...
K-mer set sorted. [48m52.013s]
Creating seqID to taxID map (step 4 of 6)..
705 sequences mapped to taxa. [0.059s]
Creating taxDB (step 5 of 6)...
Building taxonomy index from taxonomy//nodes.dmp and taxonomy//names.dmp. Done, got 2426193 taxa
taxDB construction finished. [1m4.789s]
Building KrakenUniq LCA database (step 6 of 6)...
Reading taxonomy index from taxDB. Done.
Loaded database with 2505641687 keys with k of 31 [val_len 4, key_len 8].
Reading sequence ID to taxonomy ID mapping ... got 705 mappings.
Finished processing 705 sequences (skipping 0 empty sequences, and 0 sequences with no taxonomy mapping)
Writing kmer counts to database.kdb.counts...
LCA database created. [28m27.253s]
Creating database summary report database.report.tsv ...
/ccb/sw/bin/classify -d ././database.kdb -i ././database.idx -t 32 -r database.report.tsv -a ././taxDB -p 12
Database ././database.kdb
Loaded database with 2505641687 keys with k of 31 [val_len 4, key_len 8].
Reading taxonomy index from ././taxDB. Done.
705 sequences (3298.43 Mbp) processed in 153.354s (0.3 Kseq/m, 1290.51 Mbp/m).
705 sequences classified (100.00%)
0 sequences unclassified (0.00%)
Writing report file to database.report.tsv ..
Reading genome sizes from ././database.kdb.counts ... done
Setting values in the taxonomy tree ... done
Printing classification report ... done
Report finished in 0.006 seconds.
Finishing up ...Database construction complete. [Total: 1h36m33.683s]
You can delete all files but database.{kdb,idx} and taxDB now, if you want

Here are the contents of DBDIR:

@.** test_krakenuniq]$ ls DBDIR/ DBDIR/database0.kdb DBDIR/database.idx DBDIR/database.kdb DBDIR/database.kraken.tsv DBDIR/library-files.txt DBDIR/taxDB DBDIR/database-build.log DBDIR/database.jdb DBDIR/database.kdb.counts DBDIR/database.report.tsv DBDIR/seqid2taxid.map

DBDIR/library: vertebrate_mammalian

DBDIR/taxonomy: citations.dmp database-build.log delnodes.dmp division.dmp gc.prt gencode.dmp merged.dmp names.dmp nodes.dmp readme.txt taxdump.tar.gz

nick-youngblut commented 2 years ago

I tried creating a new krakenuniq library, and now I'm getting the following:

krakenuniq-build  --kmer-len 31  --build --threads 12           --db $DB
Kraken build set to minimize disk writes.
Found 10 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Creating k-mer set (step 1 of 6)...
Using /tmp/global2/nyoungblut/code/dev/Struo2/bin/scripts/krakenuniq/jellyfish-install/bin/jellyfish
Hash size not specified, using '32573424'
/tmp/global2/nyoungblut/code/dev/Struo2/bin/scripts/krakenuniq/jellyfish-install/bin/jellyfish: error while loading shared libraries: libjellyfish-1.1.so.1: cannot open shared object file: No such file or directory

I installed krakenuniq v0.7.3 via:

git clone https://github.com/fbreitwieser/krakenuniq
cd krakenuniq
./install_krakenuniq /PATH/TO/INSTALL_DIR

...since that version isn't on bioconda yet

alekseyzimin commented 2 years ago

Did jellyfish compile and install properly? Can you check if /tmp/global2/nyoungblut/code/dev/Struo2/bin/scripts/krakenuniq/jellyfish-install/bin/jellyfish works? If you have jellyfish1 installed elsewhere, you can specify its path with the appropriate option to build.

alekseyzimin commented 2 years ago

There may be a problem with your environment. Simple:

export LD_LIBRARY_PATH=/tmp/global2/nyoungblut/code/dev/Struo2/bin/scripts/krakenuniq/jellyfish-install/lib/

should fix it, but in general it should not be necessary.
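
One way to confirm that the loader path is the problem, before rerunning a long build, is to start the bundled jellyfish binary with the library directory prepended to LD_LIBRARY_PATH and see whether it exits cleanly. This is only a sketch: both function names are hypothetical, and it assumes the binary accepts a --version argument.

```python
import os
import subprocess

def env_with_lib_dir(lib_dir, base_env=None):
    """Copy the environment and prepend lib_dir to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + os.pathsep + current if current else lib_dir
    return env

def jellyfish_loads(jellyfish_bin, lib_dir):
    """Return True if the binary starts and exits with status 0."""
    try:
        result = subprocess.run(
            [jellyfish_bin, "--version"],  # assumption: --version is accepted
            env=env_with_lib_dir(lib_dir),
            capture_output=True,
            text=True,
        )
    except OSError:
        # Binary missing or not executable.
        return False
    return result.returncode == 0
```

A call such as jellyfish_loads(".../jellyfish-install/bin/jellyfish", ".../jellyfish-install/lib") returning False while the file exists would point to a shared-library problem like the one in the error message above.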

nick-youngblut commented 2 years ago

Yeah, the path was just messed up.

The run worked:

Kraken build set to minimize RAM usage.
Found 10 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.
Skipping step 1, k-mer set already exists.
Skipping step 2, no database reduction requested.
Skipping step 3, k-mer set already sorted.
Skipping step 4, seqID to taxID map already complete.
Skipping step 5, taxDB exists.
Skipping step 6, LCAs already set.
Database construction complete. [Total: 0.014s]
You can delete all files but database.{kdb,idx} and taxDB now, if you want

...but the set_lcas: unable to open database.idx: No such file or directory error is generated if you try to re-build the database after building (or attempting to build) the database once
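
Since the failure depends on what an earlier (possibly partial) build left behind, a quick check of the build directory can show which final outputs are missing. The sketch below uses the three names from the build's own closing message (database.kdb, database.idx, taxDB); the function name is hypothetical, and it does not verify that the files are complete, only that they exist.

```python
import os

# Files the build's closing message says to keep.
FINAL_FILES = ("database.kdb", "database.idx", "taxDB")

def missing_final_files(db_dir):
    """List the final output files that are absent from db_dir."""
    return [
        name for name in FINAL_FILES
        if not os.path.exists(os.path.join(db_dir, name))
    ]

if __name__ == "__main__":
    print(missing_final_files("."))
```

For the v0.7.3 build directory listed earlier in this thread, which contains taxDB and database0.kdb but neither database.kdb nor database.idx, this would report the latter two as missing.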

alekseyzimin commented 2 years ago

Thank you for reporting this bug -- it must have been there for a while. I fixed it, please go to your krakenuniq folder and git pull and reinstall.

nick-youngblut commented 2 years ago

Yep, that fixed the issue. Thanks @alekseyzimin for all of your help!