bxlab / metaWRAP

MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis
MIT License

About the kraken database and a “classify” error when running kraken #327

Open 25280841 opened 3 years ago

25280841 commented 3 years ago

Hello,

There were no errors while building the Kraken2 standard database, and the final size totaled 47 GB. Is that the right size?

total 47G
-rw-r--r-- 1 root root  47G Nov 16 07:45 hash.k2d
-rw-r--r-- 1 root root   56 Nov 16 07:45 opts.k2d
-rw-r--r-- 1 root root 2.4M Nov 16 04:59 taxo.k2d
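One quick way to confirm that a build finished is to check that all three Kraken 2 index files from the listing above are present and non-empty. The sketch below assumes only that file layout. The function name check_kraken2_db and the database path are hypothetical. It deliberately does not judge whether 47G is the expected size, since that depends on which libraries were downloaded and when.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a directory looks like a complete Kraken 2
# database (hash.k2d, opts.k2d and taxo.k2d) and print the size of each file.
# It does NOT decide whether the total size is "correct".
check_kraken2_db() {
    local db_dir="$1" f missing=0
    for f in hash.k2d opts.k2d taxo.k2d; do
        if [ -s "$db_dir/$f" ]; then
            # du -h reports disk usage in human-readable units (e.g. 47G).
            du -h "$db_dir/$f"
        else
            echo "missing or empty: $db_dir/$f" >&2
            missing=1
        fi
    done
    # Non-zero exit status if any of the three files is missing or empty.
    return "$missing"
}
```

For example, check_kraken2_db /path/to/kraken2_db prints one du line per file and returns a non-zero status if any of the three is missing.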

And there was an error when running kraken. How can I fix it?

metawrap kraken -o /hbcnfs3/kraken/MN2P2-2 -t 170 /hbcnfs3/cleanreads/MN2P2-2/MN2P2-2_1.fastq /hbcnfs3/cleanreads/MN2P2-2/MN2P2-2_2.fastq /hbcmgt/shangzhuang2020/assembly/MN2P2-2/final_assembly.fasta

########################################################################################################################
#####                                          RUNNING KRAKEN ON ALL FILES                                         #####
########################################################################################################################

Warning: /hbcnfs3/kraken/MN2P2-2 already exists.

------------------------------------------------------------------------------------------------------------------------
-----                        Now processing /hbcnfs3/cleanreads/MN2P2-2/MN2P2-2_1.fastq and                        -----
-----                         /hbcnfs3/cleanreads/MN2P2-2/MN2P2-2_2.fastq with 170 threads                         -----
------------------------------------------------------------------------------------------------------------------------

/nfs/sopt/anaconda3/envs/metawrap/libexec/classify: invalid option -- 'd'
Usage: classify [options] <fasta/fastq file(s)>

Options: (*mandatory)
* -H filename      Kraken 2 index filename
* -t filename      Kraken 2 taxonomy filename
* -o filename      Kraken 2 options filename
  -q               Quick mode
  -M               Use memory mapping to access hash & taxonomy
  -T NUM           Confidence score threshold (def. 0)
  -p NUM           Number of threads (def. 1)
  -Q NUM           Minimum quality score (FASTQ only, def. 0)
  -P               Process pairs of reads
  -S               Process pairs with mates in same file
  -R filename      Print report to filename
  -m               In comb. w/ -R, use mpa-style report
  -z               In comb. w/ -R, report taxa w/ 0 count
  -n               Print scientific name instead of taxid in Kraken output
  -C filename      Filename/format to have classified sequences
  -U filename      Filename/format to have unclassified sequences
  -O filename      Output file for normal Kraken output

************************************************************************************************************************
*****            Something went wrong with running kraken on /hbcnfs3/cleanreads/MN2P2-2/MN2P2-2_1.fastq           *****
*****                         and /hbcnfs3/cleanreads/MN2P2-2/MN2P2-2_2.fastq . Exiting...                         *****
************************************************************************************************************************

real    0m0.188s
user    0m0.092s
sys 0m0.051s
Command exited with non-zero status 1
0.11user 0.06system 0:00.22elapsed 78%CPU (0avgtext+0avgdata 7292maxresident)k
0inputs+56outputs (0major+8290minor)pagefaults 0swaps
ursky commented 3 years ago

It looks like you have Kraken2 installed but are running the Kraken 1 module (metawrap kraken). Make sure you have metaWRAP v1.3.2 and use the kraken2 module.
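The usage text in the log above (-H, -t and -o for the Kraken 2 index, taxonomy and options files) comes from the Kraken 2 classify binary. That fits the diagnosis: it was handed a -d option it does not accept. Below is a minimal sketch for picking the matching module from a database directory. guess_kraken_module is a hypothetical helper. It assumes the usual file names: hash.k2d, opts.k2d and taxo.k2d for Kraken 2, and database.kdb and database.idx for Kraken 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess which metaWRAP module matches a database
# directory by looking for characteristic index files.
#   Kraken 2 databases contain hash.k2d, opts.k2d and taxo.k2d.
#   Kraken 1 databases contain database.kdb and database.idx.
guess_kraken_module() {
    local db_dir="$1"
    if [ -e "$db_dir/hash.k2d" ] && [ -e "$db_dir/opts.k2d" ] && [ -e "$db_dir/taxo.k2d" ]; then
        echo kraken2
    elif [ -e "$db_dir/database.kdb" ] && [ -e "$db_dir/database.idx" ]; then
        echo kraken
    else
        # Neither layout recognised.
        echo unknown
        return 1
    fi
}
```

For a directory like the one listed at the top of this issue, guess_kraken_module /path/to/that/db would print kraken2.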

25280841 commented 3 years ago

Thanks, I was using kraken1 instead of kraken2. The kraken2 module works well; I'm using metaWRAP v1.3.2.

One more question: I ran kraken2 with and without the parameter “-s 10000000”, as follows:

time metawrap kraken2 -o /hbcnfs3/kraken2/$i -t 170 -s 10000000 /hbcnfs3/cleanreads/$i/*.fastq /hbcmgt/shangzhuang2020/assembly/$i/final_assembly.fasta >/hbcnfs3/kraken2/$i.log 2>&1 &

time metawrap kraken2 -o /hbcnfs3/kraken2/$i -t 170 /hbcnfs3/cleanreads/$i/*.fastq /hbcmgt/shangzhuang2020/assembly/$i/final_assembly.fasta >/hbcnfs3/kraken2/$i.log 2>&1 &

I got output files of different sizes, 25 GB and 112 GB. Without “-s”, I got this warning: “[ WARNING ] Too many query IDs to store in chart; storing supplemental files in '/hbcnfs3/kraken/MN2P2-1/kronagram.html.files'.”

So, is it better to use the latter one (without -s)?
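How much -s 10000000 discards depends on how many reads the sample has. Here is a rough sketch. It assumes uncompressed FASTQ with 4-line records and that -s caps the input at that many reads; this thread does not say whether metaWRAP counts reads or pairs here. count_fastq_reads and kept_percent are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads in an uncompressed FASTQ file, assuming the
# standard 4-line record layout (header, sequence, '+', qualities) with no
# line-wrapped sequences.
count_fastq_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Hypothetical helper: integer percentage of reads retained if at most $2
# reads are drawn from file $1 (100 when the file has no more than $2 reads).
kept_percent() {
    local reads target="$2"
    reads=$(count_fastq_reads "$1")
    if [ "$reads" -le "$target" ]; then
        echo 100
    else
        echo $(( 100 * target / reads ))
    fi
}
```

For example: kept_percent /hbcnfs3/cleanreads/MN2P2-2/MN2P2-2_1.fastq 10000000. Kraken 2 writes one output line per classified sequence, so output size should grow roughly with the number of reads kept. That is consistent with the much larger output without -s.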

ursky commented 3 years ago

Both are fine, but it is always better not to subsample if you can.
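For illustration only: random subsampling of paired reads has to keep mates together. The sketch below is not how metaWRAP implements -s. It assumes bash, GNU shuf, 4-line FASTQ records in matching order in both files, and no tab characters inside records. subsample_pairs is a hypothetical name. At the scale in this thread, a dedicated tool is commonly used instead, such as seqtk sample run with the same seed on both files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: randomly keep N read pairs from two FASTQ files while
# keeping mates together.
# Usage: subsample_pairs R1.fastq R2.fastq N OUT_1.fastq OUT_2.fastq
subsample_pairs() {
    local r1="$1" r2="$2" n="$3" out1="$4" out2="$5"
    # paste - - - - turns each 4-line record into one tab-separated line;
    # the outer paste puts each read next to its mate (8 fields per line).
    paste <(paste - - - - < "$r1") <(paste - - - - < "$r2") |
        shuf -n "$n" |
        awk -F'\t' -v o1="$out1" -v o2="$out2" '{
            # Fields 1-4 belong to read 1, fields 5-8 to its mate.
            print $1 "\n" $2 "\n" $3 "\n" $4 > o1
            print $5 "\n" $6 "\n" $7 "\n" $8 > o2
        }'
}
```

Because both mates travel on the same line through shuf, the two output files stay in the same pair order.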

25280841 commented 3 years ago

OK, thanks a lot.