Hello, SILVA ACT and phyloFlash use different aligners: SINA and BBMap respectively. In theory SINA should be more sensitive, but BBMap would be much faster. 40% identity is pretty low, though, which suggests to me that it might be an artefact and not a true rRNA sequence.
A few possibilities:
Hope this helps!
Hi! Thanks for the super quick answer!
So, replying to your points:
I checked and indeed there are (at least) 4 reference seqs that consistently come up as unclassified with SINA, but classified with bbmap:
CFFF01000203.4900.6512 Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus pneumoniae
CVOF01000044.509.1766 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Streptococcus pneumoniae
CKQU01000078.7468.8814 Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus pneumoniae
AWPA01000003.163001.164539 Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus pyogenes
The positional coverage histogram is skewed towards the first half of the sequence, but doesn't show one or two dramatic peaks.
1 SSU sequence with a Streptococcus taxon assignment did assemble, blasting to "Streptococcus pyogenes strain MGAS2221 chromosome, complete genome".
Do you have any suggestions on how to best deal with this?
To check if I understand correctly, was it the unassembled reads from your data that you tried to classify with SILVA ACT, or the reference sequences that they mapped to? Or both?
If there is a full-length sequence assembled with phyloFlash, and this has BLAST hits to more than one reference genome sequence, I'd be inclined to say that it's probably not a mapping artefact. From what you said above, you were not expecting to find Streptococcus in the sample, is that right? Have you tried to assemble the metagenome, too?
SINA uses a last-common-ancestor method to classify sequences, based on the taxonomy of the best hits. If there is a misannotation of a reference sequence in the database, it's possible that the classification gets thrown off as a result. There is some curation and filtering of what goes into the SILVA database, but sometimes things fall through the cracks.
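To illustrate how a single misannotated reference can pull a last-common-ancestor assignment up the tree, here is a minimal sketch in Python. It is a simplified, strict LCA over semicolon-separated taxonomy strings and is not SINA's actual implementation; the function name `lowest_common_ancestor` is made up for this example. The lineages are the four reference taxonomy strings quoted earlier in this thread.

```python
def lowest_common_ancestor(lineages):
    """Return the longest rank prefix shared by every lineage.

    Each lineage is a semicolon-separated taxonomy string. This is a strict
    LCA (all hits must agree at a rank); it is an illustrative assumption,
    not a reproduction of SINA's classifier.
    """
    split = [lineage.strip().split(";") for lineage in lineages]
    shared = []
    # zip(*split) walks the ranks in parallel and stops at the shortest lineage.
    for ranks in zip(*split):
        if len(set(ranks)) != 1:
            break  # first rank where the hits disagree
        shared.append(ranks[0])
    return ";".join(shared)


# Taxonomy strings of the four reference sequences listed in this thread.
hits = [
    "Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus pneumoniae",
    "Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Streptococcus pneumoniae",
    "Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus pneumoniae",
    "Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus pyogenes",
]

# All four hits: the entry filed under Bacillales forces the LCA up to class level.
print(lowest_common_ancestor(hits))

# Without the second entry, the remaining hits agree down to genus.
consistent_hits = [h for i, h in enumerate(hits) if i != 1]
print(lowest_common_ancestor(consistent_hits))
```

With all four hits the shared prefix stops at Bacteria;Firmicutes;Bacilli, whereas dropping the entry filed under Bacillales gives agreement down to Streptococcus. A real classifier may tolerate some disagreement among the hits, so treat this only as an illustration of the principle.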
Hi all,
I am running as such:
phyloFlash.pl -read1 S_1.fastq.gz -read2 /S_2.fastq.gz -readlength 100 -lib S_def_6 -dbhome phyloflash/db/138.1 -taxlevel 6 -html -treemap -zip -log -poscov
I have noticed that a large number of the mapped reads do not match my expectations for this sample, so I submitted several read sequences to the SILVA ACT service to check their taxonomy. Most frequently SILVA ACT returns an unclassified read with %ID at around 40-50%, whilst phyloFlash keeps these reads with bbmap -id 70-90%.
Do you have any idea what is happening?
Thanks!