DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

kraken2 with plusPFP and plant sample #785

Open vebaev opened 6 months ago

vebaev commented 6 months ago

Dear all,

I have an RNA-seq dataset (PE, ss, rRNA removed) from a moss species (no genome available). I ran Kraken2 (default options) with the PlusPFP DB (which contains some moss genomes), and what confuses me is that it finds only bacterial and some fungal reads (30%), with 70% unclassified. I expected some percentage to be classified as plant, but strangely that is not the case. Am I doing something wrong?

jenniferlu717 commented 5 months ago

Did you download this database from the AWS site? The databases provided there are built only from complete genomes, so it could be that no genome in the database is closely enough related to your moss species for it to show up.

vebaev commented 5 months ago

I'm using Kraken2 and PlusPFP via Galaxy EU. I thought the same, but then I tried a dataset from a paper and the result was the same: only bacteria, fungi and viruses, and not a single read assigned to plants... strange!

jenniferlu717 commented 3 months ago

Then it may be a result of how the sample was extracted and processed for sequencing.

jamesboot commented 3 months ago

Hi, to add to this: we recently used the Kraken2 PlusPFP-16 reference downloaded from https://benlangmead.github.io/aws-indexes/k2. We ran Kraken2 with default parameters and found no reads classified to plant genomes. We are very confident that our samples contain plant reads, so we took 10 reads at random from our raw data, ran BLASTn against genomes, and found that all 10 aligned to plant genomes. We're not using Galaxy, but there appears to be an issue there about a mix-up in which the databases do not contain plants. Could there be a problem with the pre-compiled indexes?
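For anyone who wants to repeat the spot check described above, here is a minimal sketch of drawing a few reads at random from a FASTQ file and writing them as FASTA for a BLASTn search. It is not the script used in this thread: the function names (`read_fastq`, `sample_reads`, `to_fasta`) are made up for illustration, and it assumes plain 4-line FASTQ records (no wrapped sequence lines).

```python
import random
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Tuple

FastqRecord = Tuple[str, str, str]  # (read name, sequence, quality string)


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from 4-line FASTQ input (assumption: no wrapped lines)."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\r\n") for line in islice(it, 4)]
        if len(chunk) < 4:
            return  # end of input, or a truncated trailing record
        header, seq, _plus, qual = chunk
        yield header[1:], seq, qual  # drop the leading '@'


def sample_reads(records: Iterable[FastqRecord], k: int,
                 seed: Optional[int] = None) -> List[FastqRecord]:
    """Pick k records uniformly at random in one pass (reservoir sampling)."""
    rng = random.Random(seed)
    reservoir: List[FastqRecord] = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir


def to_fasta(records: Iterable[FastqRecord]) -> str:
    """Format records as FASTA text suitable for pasting into BLASTn."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _qual in records)
```

With a gzipped file, something like `to_fasta(sample_reads(read_fastq(gzip.open("reads_R1.fastq.gz", "rt")), 10))` would give 10 reads to paste into BLASTn (the file name is a placeholder). Reservoir sampling is used so the whole file never has to be held in memory.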

vebaev commented 3 months ago

Yes, on Galaxy (all servers) there is an issue where PlusPFP is the same as PlusPF, so plants are missing. They are currently investigating where that issue comes from: https://github.com/galaxyproject/idc/issues/37
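One way to check whether a given index actually contains plants is to look for Viridiplantae (NCBI taxonomy ID 33090) in the output of `kraken2-inspect --db <db_dir>`, which uses the same tab-separated layout as a Kraken2 report. The sketch below assumes that output has been saved to a text file; the function name `find_taxon` is hypothetical and not part of Kraken2.

```python
from typing import Iterable, Optional, Tuple

VIRIDIPLANTAE_TAXID = 33090  # NCBI taxonomy ID for green plants


def find_taxon(lines: Iterable[str],
               taxid: int) -> Optional[Tuple[float, int, str]]:
    """Return (percent, clade count, name) for taxid, or None if absent.

    Expects Kraken2 report-style lines: percentage first, clade-level count
    second, taxonomy ID second to last, indented scientific name last.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # header lines describing the database options
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a report line
        try:
            percent = float(fields[0])
            clade_count = int(fields[1])
            line_taxid = int(fields[-2])
        except ValueError:
            continue  # skip anything that does not parse as a report line
        if line_taxid == taxid:
            return percent, clade_count, fields[-1].strip()
    return None
```

Calling `find_taxon(open("inspect.txt"), VIRIDIPLANTAE_TAXID)` (placeholder file name) returns `None` when the database has no plant entries at all, which is what a PlusPF index mislabelled as PlusPFP would give. The same function works on a `--report` file from a classification run, where the count is reads rather than minimizers.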

permia commented 1 month ago

I ran Kraken2 on RNA-seq data from some fungi, mainly Colletotrichum spp., against PlusPFP (k2_pluspfp_20240112.tar.gz). Nearly half of the reads are unclassified! That's even worse. I think the RefSeq genome collection is too small for this kind of classification, especially when little genome data exists for the organism you are studying.

vebaev commented 1 month ago

Pity that the PlusPFP issue still persists and has not been addressed in half a year so far...