mmcoff opened this issue 1 year ago
If you check the NCBI taxonomy entry for Dissulfurimicrobium, its lineage has no Proteobacteria level. Instead, it contains the clade "delta/epsilon subdivisions", which kraken2 does not report.
What you could do (without changing the code) is download the taxonomy dump from NCBI: https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/ It contains a file named fullnamelineage.dmp, which you can then merge with the Kraken output using the taxonomic IDs (e.g. 9606) present in both files. However, this won't give you the rank of each taxonomic level.
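A minimal Python sketch of that merge follows. It assumes a standard Kraken 2 report, i.e. one produced without --use-mpa-style, because the mpa-style report lists names only and has no taxid column to join on. The script name merge_lineage.py and the three-column output (taxid, clade reads, "|"-joined lineage) are illustrative choices, not part of Kraken 2 or the NCBI files.

```python
import sys
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fullnamelineage(lines: Iterable[str]) -> Dict[str, Tuple[str, List[str]]]:
    """Map taxid -> (name, ancestor names) from fullnamelineage.dmp rows."""
    table: Dict[str, Tuple[str, List[str]]] = {}
    for line in lines:
        # Fields are separated by "<tab>|<tab>" and each row ends with "<tab>|".
        row = line.rstrip("\n")
        if row.endswith("\t|"):
            row = row[:-2]
        fields = [f.strip() for f in row.split("\t|\t")]
        if len(fields) < 3:
            continue
        taxid, name, lineage = fields[0], fields[1], fields[2]
        # The lineage lists ancestor names from the top down, separated by ";".
        ancestors = [a.strip() for a in lineage.split(";") if a.strip()]
        table[taxid] = (name, ancestors)
    return table


def parse_kraken_report(lines: Iterable[str]) -> Iterator[Tuple[str, str, int]]:
    """Yield (taxid, name, clade_reads) from a standard (non-mpa) Kraken 2 report.

    The taxid is the second-to-last tab-separated column and the indented name
    is the last one, which still holds when --report-minimizer-data adds columns.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        yield fields[-2].strip(), fields[-1].strip(), int(fields[1])


def merge_report_with_lineage(
    report_lines: Iterable[str], table: Dict[str, Tuple[str, List[str]]]
) -> Iterator[Tuple[str, int, str]]:
    """Attach the full NCBI lineage to every taxon in the report."""
    for taxid, name, reads in parse_kraken_report(report_lines):
        # Taxa missing from the dump (e.g. "unclassified", taxid 0) keep their name only.
        _, ancestors = table.get(taxid, (name, []))
        yield taxid, reads, "|".join(ancestors + [name])


def main(argv: List[str]) -> int:
    if len(argv) != 2:
        print("usage: merge_lineage.py REPORT fullnamelineage.dmp", file=sys.stderr)
        return 2
    report_path, dmp_path = argv
    with open(dmp_path, encoding="utf-8") as dmp:
        table = parse_fullnamelineage(dmp)
    with open(report_path, encoding="utf-8") as report:
        for taxid, reads, lineage in merge_report_with_lineage(report, table):
            print(f"{taxid}\t{reads}\t{lineage}")
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example: python merge_lineage.py sample.report.txt fullnamelineage.dmp > sample.lineage.tsv. Because the lineage is copied straight from fullnamelineage.dmp, unranked clades are kept, but no rank labels are attached.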
rankedlineage.dmp would give you the ranks, but it also lacks certain taxonomic levels, such as the clade "delta/epsilon subdivisions".
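If ranks matter, a similar sketch can rebuild mpa-style strings (d__, k__, p__, c__, o__, f__, g__, s__) from rankedlineage.dmp. It assumes the column order given in NCBI's taxdump_readme.txt (tax_id, tax_name, species, genus, family, order, class, phylum, kingdom, superkingdom); NCBI has revised these columns over time, so check the readme that ships with your download. The rank code is taken from a standard Kraken 2 report; codes other than D, K, P, C, O, F, G, S (such as U, R or intermediate ranks like G1) are skipped. The function names are hypothetical.

```python
from typing import Dict, Iterable, Optional

# Assumed column order of rankedlineage.dmp (verify against taxdump_readme.txt).
RANKED_COLUMNS = ["tax_id", "tax_name", "species", "genus", "family",
                  "order", "class", "phylum", "kingdom", "superkingdom"]

# Kraken 2 rank codes paired with rankedlineage.dmp columns, highest rank first.
MPA_RANKS = [("D", "superkingdom"), ("K", "kingdom"), ("P", "phylum"),
             ("C", "class"), ("O", "order"), ("F", "family"),
             ("G", "genus"), ("S", "species")]


def parse_rankedlineage(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Map taxid -> {column name: value} from rankedlineage.dmp rows."""
    table: Dict[str, Dict[str, str]] = {}
    for line in lines:
        row = line.rstrip("\n")
        if row.endswith("\t|"):
            row = row[:-2]  # drop the trailing field terminator
        fields = [f.strip() for f in row.split("\t|\t")]
        record = dict(zip(RANKED_COLUMNS, fields))
        table[record["tax_id"]] = record
    return table


def mpa_lineage(taxid: str, rank_code: str, name: str,
                table: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Build an mpa-style lineage string, or None for non-standard ranks."""
    codes = [code for code, _ in MPA_RANKS]
    if rank_code not in codes:
        return None
    record = table.get(taxid, {})
    parts = []
    # Only ranks strictly above the taxon's own rank are read from the dump;
    # empty columns (no ancestor at that rank) are left out.
    for code, column in MPA_RANKS[:codes.index(rank_code)]:
        value = record.get(column, "")
        if value:
            parts.append(f"{code.lower()}__{value}")
    parts.append(f"{rank_code.lower()}__{name}")
    return "|".join(parts)
```

Because only the standard ranks are emitted, unranked clades are dropped here, which matches the limitation described above.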
I am using the following command to run kraken2 with the standard 67 GB database:
kraken2 --db /data/coffmanm/tools/krakenBig --threads 10 --confidence 0.05 --output krakenOut/${base}.output.txt --report krakenOut/${base}.report.txt --use-mpa-style --report-zero-counts --gzip-compressed --use-names --paired ${base}_R1_001.trimmed.fastq.gz ${base}_R2_001.trimmed.fastq.gz
In the report output, some of the taxon names are incomplete (e.g., d__Bacteria|c__Deltaproteobacteria|g__Dissulfurimicrobium would ideally contain |p__Proteobacteria). Is there a way to edit the code so that the full taxonomic lineage is displayed in the report?