bioinformatics-centre / kaiju

Fast taxonomic classification of metagenomic sequencing reads using a protein reference database
http://kaiju.binf.ku.dk
GNU General Public License v3.0

Fungal reads - Which is the best database? #278

Open andressamv opened 5 months ago

andressamv commented 5 months ago

Hi! I have been using Kaiju for a while, and now I am interested in filtering fungal reads. For this, I used the Kaiju app in KBase and compared the results from two databases: NCBI BLAST nr+euk (protein sequences from nr: Bacteria, Archaea, Viruses, Fungi, and microbial eukaryotes) and fungi (protein sequences from a representative set of fungal genomes). For the same samples, I expected to get more fungal reads with the more comprehensive nr+euk database, since I thought RefSeq was included in nr, but the fungi database gives far more hits. What could explain that?

pmenzel commented 5 months ago

Hi! Not all genomes from RefSeq are necessarily contained in the BLAST nr database, so it may well be that more reads get classified with the RefSeq fungi database.

You can manually check some of the reads that are classified with the RefSeq database but not with the nr database, and use the NCBI BLAST website to see if they have good matches in nr.
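One way to pick the reads to check is sketched below. It is only an illustration, assuming both runs are available as Kaiju's standard tab-separated output (first column `C` or `U`, second column the read name, third column the NCBI taxon ID). The function names and the small in-memory demo data are made up for this example, and whether KBase lets you download the output in this form is also an assumption.

```python
import csv
import io


def classified_reads(handle):
    """Map read name -> taxon ID for every row marked 'C' (classified)."""
    reads = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Only the first three columns are used; extra verbose columns are ignored.
        if len(row) >= 3 and row[0] == "C":
            reads[row[1]] = row[2]
    return reads


def reads_only_in_first(first, second):
    """Read names classified in `first` but not in `second`, sorted."""
    return sorted(set(first) - set(second))


# Hypothetical demo data standing in for the two Kaiju output files.
fungi_output = "C\tread1\t4932\nC\tread2\t5476\nU\tread3\t0\n"
nr_output = "C\tread1\t4932\nU\tread2\t0\nU\tread3\t0\n"

fungi_hits = classified_reads(io.StringIO(fungi_output))
nr_hits = classified_reads(io.StringIO(nr_output))

# Candidates to look up by hand on the NCBI BLAST website.
candidates = reads_only_in_first(fungi_hits, nr_hits)
for name in candidates:
    print(name, fungi_hits[name], sep="\t")
```

With real files, replace the `io.StringIO(...)` objects with `open(path, newline="")` handles. The sequences of the listed reads can then be pulled from the original FASTQ/FASTA and searched with blastx against nr.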