FOI-Bioinformatics / flextaxd

FlexTaxD (Flexible Taxonomy Databases) - Create, add, merge different taxonomy sources (QIIME, GTDB, NCBI and more) and create metagenomic databases (kraken2, ganon and more)
GNU General Public License v3.0

Krakenuniq "Unknown options: skip-maps" #50

Closed MortenEneberg closed 2 years ago

MortenEneberg commented 2 years ago

Dear David,

I get an error when trying to build a krakenuniq database (command and full output below). Have you encountered this error before?

flextaxd-create -db databases/NCBI_GTDB_merge.db -o taxonomy_krakenuniq --genomes_path "/shared-nfs/MEN/silico_reads/gtdb_202/krakenuniq_gtdb_202_no_dust/library/" -p 30 --verbose --logs build_kraken_logs --create_db --db_name krakenuniqdb_test --dbprogram krakenuniq --test
2021-12-03 09:51:24,361 create_databases [INFO ]  FlexTaxD-create logging initiated!
2021-12-03 09:51:24,366 create_databases [INFO ]  Processing files; create kraken seq.map
2021-12-03 09:51:24,367 DatabaseConnection [INFO ]  databases/NCBI_GTDB_merge.db opened successfully.
2021-12-03 09:51:24,693 ProcessDirectory [INFO ]  Number of genomes annotated in database 265625
2021-12-03 09:51:24,693 ProcessDirectory [INFO ]  Process genome path (/shared-nfs/MEN/silico_reads/gtdb_202/krakenuniq_gtdb_202_no_dust/library/)
2021-12-03 09:51:25,310 ProcessDirectory [INFO ]  Processed 57311 genomes
2021-12-03 09:51:25,539 create_databases [INFO ]  Genome annotations with no matching source: 220070
2021-12-03 09:51:25,798 create_databases [INFO ]  Loading module: CreateKrakenDatabase
2021-12-03 09:51:25,812 create_databases [INFO ]  Get genomes from input directory!
2021-12-03 09:51:25,812 DatabaseConnection [INFO ]  databases/NCBI_GTDB_merge.db opened successfully.
2021-12-03 09:51:26,133 CreateKrakenDatabase [INFO ]  krakenuniqdb_test
2021-12-03 09:51:26,192 create_databases [INFO ]  --- process finished in 0 minutes 1.8346500396728516 seconds---

2021-12-03 09:51:26,192 CreateKrakenDatabase [INFO ]  Test use only 10 genomes
2021-12-03 09:51:26,195 CreateKrakenDatabase [INFO ]  Create library directory
2021-12-03 09:51:26,199 CreateKrakenDatabase [INFO ]  Processing files; create kraken seq.map
2021-12-03 09:51:27,360 CreateKrakenDatabase [INFO ]  Number of genomes succesfully added to the krakenuniq database: 10
2021-12-03 09:51:27,360 create_databases [INFO ]  Genome folder preprocessing completed!
2021-12-03 09:51:27,360 create_databases [INFO ]  --- process finished in 0 minutes 3.002981424331665 seconds---

2021-12-03 09:51:27,360 create_databases [INFO ]  Create database
2021-12-03 09:51:27,360 CreateKrakenDatabase [INFO ]  mkdir -p krakenuniqdb_test/taxonomy
2021-12-03 09:51:27,397 CreateKrakenDatabase [INFO ]  cp taxonomy_krakenuniq/*.dmp krakenuniqdb_test/taxonomy
2021-12-03 09:51:27,796 CreateKrakenDatabase [INFO ]  cp taxonomy_krakenuniq/*.map krakenuniqdb_test
2021-12-03 09:51:27,796 CreateKrakenDatabase [INFO ]  krakenuniq-build --build --db krakenuniqdb_test  --threads 30
Unknown option: skip-maps
Usage: krakenuniq-build [task option] [options]

Task options (exactly one can be selected -- default is build):
  --download-taxonomy        Download NCBI taxonomic information
  --download-library TYPE    Download partial library (TYPE = one of "refseq/bacteria", "refseq/archaea", "refseq/viral").
                             Use krakenuniq-download for more options.
  --add-to-library FILE      Add FILE to library
  --build                    Create DB from library (requires taxonomy d/l'ed and at
                             least one file in library)
  --rebuild                  Create DB from library like --build, but remove
                             existing non-library/taxonomy files before build
  --clean                    Remove unneeded files from a built database
  --shrink NEW_CT            Shrink an existing DB to have only NEW_CT k-mers
  --standard                 Download and create default database, which contains complete genomes
                             for archaea, bacteria and viruses from RefSeq, as well as viral strains
                             from NCBI. Specify --taxids-for-genomes and --taxids-for-sequences
                             separately, if desired.

  --help                     Print this message
  --version                  Print version information

Options:
  --db DBDIR                 Kraken DB directory (mandatory except for --help/--version)
  --threads #                Number of threads (def: 1)
  --new-db NAME              New Kraken DB name (shrink task only; mandatory
                             for shrink task)
  --kmer-len NUM             K-mer length in bp (build/shrink tasks only;
                             def: 31)
  --minimizer-len NUM        Minimizer length in bp (build/shrink tasks only;
                             def: 15)
  --jellyfish-hash-size STR  Pass a specific hash size argument to jellyfish
                             when building database (build task only)
  --jellyfish-bin STR        Use STR as Jellyfish 1 binary.
  --max-db-size SIZE         Shrink the DB before full build, making sure
                             database and index together use <= SIZE gigabytes
                             (build task only)
  --shrink-block-offset NUM  When shrinking, select the k-mer that is NUM
                             positions from the end of a block of k-mers
                             (default: 1)
  --work-on-disk             Perform most operations on disk rather than in
                             RAM (will slow down build in most cases)
  --taxids-for-genomes       Add taxonomy IDs (starting with 1 billion) for genomes.
                             Only works with 3-column seqid2taxid map with third
                             column being the name
  --taxids-for-sequences     Add taxonomy IDs for sequences, starting with 1 billion.
                             Can be useful to resolve classifications with multiple genomes
                             for one taxonomy ID.
  --min-contig-size NUM      Minimum contig size for inclusion in database.
                             Use with draft genomes to reduce contamination, e.g. with values between 1000 and 10000.
  --library-dir DIR          Use DIR for reference sequences instead of DBDIR/library.
  --taxonomy-dir DIR         Use DIR for taxonomy instead of DBDIR/taxonomy.

Experimental:
  --uid-database             Build a UID database (default no)
  --lca-database             Build a LCA database (default yes)
  --no-lca-database          Do not build a LCA database
  --lca-order DIR1           Impose a hierarchical order for setting LCAs.
  --lca-order DIR2           The directories must be specified relative to the libary directory
  ...                        (DBDIR/library). When setting the LCAs, k-mers from sequences in
                             DIR1 will be set first, and only unset k-mers will be set from
                             DIR2, etc, and final from the whole library.
                                                         Use this option when including low-confidence draft genomes,
                             e.g use --lca-order Complete_Genome --lca-order Chromosome to
                             prioritize more complete assemblies.
                             Keep in mind that this option takes considerably longer.
Incomplete database, clean aborted.
2021-12-03 09:51:28,228 CreateKrakenDatabase [INFO ]  krakenuniq database created
2021-12-03 09:51:28,228 create_databases [INFO ]  --- Time summary  0 minutes 3.871030569076538 seconds---
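
Note on the log above: the line "krakenuniq database created" is logged right after "Incomplete database, clean aborted.", so the failed build is reported as a success. Below is a minimal Python sketch of how a wrapper could check the exit status before logging success. It is not FlexTaxD's actual code: `run_build` is a hypothetical helper, and it assumes the build tool signals failure through a non-zero exit status, which has not been verified here for krakenuniq-build.

```python
import logging
import subprocess
import sys

logger = logging.getLogger("CreateKrakenDatabase")


def run_build(command):
    """Run an external build command; report success only on exit status 0.

    Hypothetical helper for illustration, not part of FlexTaxD.
    """
    logger.info(" ".join(command))
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the tool's own message (for example "Unknown option: ...")
        # instead of continuing as if the database had been built.
        logger.error(
            "build failed (exit %d): %s",
            result.returncode,
            result.stderr.strip(),
        )
        return False
    logger.info("database created")
    return True


logging.basicConfig(level=logging.INFO)
# Stand-in for a failing build: a child process that exits with status 2.
failed_demo = run_build([sys.executable, "-c", "import sys; sys.exit(2)"])
```

With a check like this, the final "database created" message would only appear when the build command itself finished without error.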

davidsundell commented 2 years ago

Dear Morten,

It has been some time since I last tried building a krakenuniq database. I will go through the pipeline and get back to you.

What version of krakenuniq do you have?

/David
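
The usage text pasted above lists a `--version` option for krakenuniq-build, so the installed version can be queried directly. A small Python sketch of that check follows. `tool_version` is a hypothetical helper name, and the exact format of the version string printed by krakenuniq-build is not assumed.

```python
import shutil
import subprocess
import sys


def tool_version(executable, flag="--version"):
    """Return the text printed by `executable flag`, or None if not found.

    Hypothetical helper for illustration only.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, flag], capture_output=True, text=True)
    # Some tools print version information to stderr, so fall back to it.
    return (result.stdout or result.stderr).strip()


# Prints None when krakenuniq-build is not on PATH.
print(tool_version("krakenuniq-build"))
# The Python interpreter is used here only as a stand-in that is always present.
print(tool_version(sys.executable))
```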

MortenEneberg commented 2 years ago

Dear David,

I have version 0.6 of krakenuniq installed (it should be the newest, as it was installed recently). I also reopened another issue relating to kraken2 databases. I don't know if that is the correct way to do it?

Kind regards,
Morten