Amir-61 / appointmentSystem

Illumina sequencing reads with high quality scores have low percentage mapping to RefSeq and classification #209

Closed pomidorku closed 5 years ago

pomidorku commented 5 years ago

Hello all,

I am assembling my first Illumina genome. It is a bacterial genome of about 2.5 Mb, genus Ignatzschineria.

FastQC shows the following stats for the R1 and R2 reads:

File type: Conventional base calls
Encoding: Sanger / Illumina 1.9
Total Sequences: 16770385
Sequences flagged as poor quality: 0
Sequence length: 124
%GC: 41

I used several other tools to evaluate the raw reads before the assembly.

Bowtie2 maps 11.86% (~2 million) of the total reads to the reference genome Ignatzschineria larvae.

Minimap2 maps 19.12% (~3.2 million) of the total reads to the reference genome Ignatzschineria larvae.

Kraken2 (default options) is able to classify only 42.9% (7187522) of those 16.7 million reads, as follows: 42.6% bacterial and 0.0239% viral. Only 5.78 million reads classify to Ignatzschineria larvae.

This is a very low classification/mapping percentage to a sister species (assuming that we indeed sequenced an Ignatzschineria sp.). Kraken2 reports that 57.1% of the 16770385 reads are unclassified. I wonder what organism those 57% unclassified reads belong to. I used Kraken2 databases that include archaea, bacteria, viruses, plasmids, human, and parasites of invertebrates. I did not try plants. The sample was collected from a field experiment on blowflies and the bacteria were cultured in the lab. The target colonies were isolated from those cultures.
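As a quick sanity check on the figures above, here is a minimal Python sketch that recomputes the percentages and approximate read counts from the numbers quoted in this post. The variable and function names are illustrative only; the read counts and percentages are copied from the FastQC, Bowtie2, Minimap2, and Kraken2 results reported above, not from any tool's API.

```python
# Figures quoted in the post (not produced by this script).
TOTAL_READS = 16_770_385          # FastQC "Total Sequences"
KRAKEN2_CLASSIFIED = 7_187_522    # reads classified by Kraken2
BOWTIE2_MAPPED_PCT = 11.86        # percent of reads mapped by Bowtie2
MINIMAP2_MAPPED_PCT = 19.12       # percent of reads mapped by Minimap2


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Kraken2: classified vs. unclassified reads.
classified_pct = percent(KRAKEN2_CLASSIFIED, TOTAL_READS)
unclassified_reads = TOTAL_READS - KRAKEN2_CLASSIFIED
unclassified_pct = percent(unclassified_reads, TOTAL_READS)

# Mappers: convert the reported percentages back into read counts.
bowtie2_mapped = round(TOTAL_READS * BOWTIE2_MAPPED_PCT / 100)
minimap2_mapped = round(TOTAL_READS * MINIMAP2_MAPPED_PCT / 100)

print(f"Kraken2 classified:   {classified_pct:.1f}%")
print(f"Kraken2 unclassified: {unclassified_pct:.1f}% ({unclassified_reads} reads)")
print(f"Bowtie2 mapped:       ~{bowtie2_mapped} reads")
print(f"Minimap2 mapped:      ~{minimap2_mapped} reads")
```

The recomputed values agree with the rounded figures in the post, so the low percentages do not look like a reporting or arithmetic slip.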

I used Metaxa2.22 to check the 16S rRNA contained in the reads, and the results of this survey are as follows:

Out of 54584 hits:

14059: unclassified Gammaproteobacteria
3613: Ignatzschineria
33173: unclassified Xanthomonadaceae

Of course, there are other minor hits.
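To put the Metaxa2 counts in proportion, a small Python sketch along these lines converts them to percentages of all hits and derives the size of the "other minor hits" remainder. The dictionary and variable names are illustrative; the counts are the ones listed above.

```python
# 16S rRNA hit counts as listed in the post.
TOTAL_HITS = 54_584
hits = {
    "unclassified Gammaproteobacteria": 14_059,
    "Ignatzschineria": 3_613,
    "unclassified Xanthomonadaceae": 33_173,
}

# Whatever is not in the three listed groups counts as "other minor hits".
other_hits = TOTAL_HITS - sum(hits.values())

# Share of each group as a percentage of all hits.
shares = {name: 100.0 * count / TOTAL_HITS for name, count in hits.items()}

for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name}: {share:.1f}%")
print(f"other minor hits: {other_hits} ({100.0 * other_hits / TOTAL_HITS:.1f}%)")
```

By this arithmetic, unclassified Xanthomonadaceae account for the majority of the 16S hits, while the hits assigned to Ignatzschineria are a small fraction.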

So, I checked the Kraken2 database and verified that all published genomes of Xanthomonadaceae are present.

Does anyone have a suggestion for a tool or database that would be appropriate for identifying the organisms that are not classified by Kraken2 and are reported as "unclassified Xanthomonadaceae" by Metaxa2?

Any help will be appreciated.

Regards,

Cecilio1

pomidorku commented 5 years ago

Sir, why did you close my issue? How can I post this issue so that it will be accessible to anyone who could help?

Amir-61 commented 5 years ago

Did you mean to create an issue against https://github.com/Amir-61/appointmentSystem? It looks like you opened your issue against the wrong repo. I don't understand your issue.

pomidorku commented 5 years ago

I see. You own this repository. Sorry, I am new to this. I understand now that I have to open my own repository to post my issue. Thank you.