DeniRibicic / q2ONT

Bash pipeline for analysis of ONT full-length 16S sequences in QIIME2

Frequency per feature after processing nanopore seq data in qiime2 #8

Open mzakram219 opened 1 year ago

mzakram219 commented 1 year ago

Dear all, I am writing to get input from more experienced users. I have nanopore data and am using the q2ONT command line to process my 16S rRNA gene sequencing data. After demultiplexing, removing adapters, and trimming the reads to 1,400 bp, I imported my sequencing data into QIIME 2. I used the following commands to dereplicate the sequences and to obtain the feature-table sequences and the feature-table summary.

Dereplication of sequences:

```bash
qiime vsearch dereplicate-sequences \
  --i-sequences 4_single-end-demux.qza \
  --o-dereplicated-table 5_derep-table.qza \
  --o-dereplicated-sequences 5_derep-seqs.qza
```

Visualization files:

```bash
qiime feature-table tabulate-seqs \
  --i-data 5_derep-seqs.qza \
  --o-visualization 5_derep-seqs.qzv

qiime feature-table summarize \
  --i-table 5_derep-table.qza \
  --o-visualization 5_derep-table.qzv
```
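For readers unfamiliar with dereplication, the Python sketch below illustrates the idea behind this step: reads with exactly the same sequence are collapsed into one feature, counted per sample, and a feature's frequency is the number of reads sharing that exact sequence. It is a simplified illustration, not the q2-vsearch implementation; the function names and toy reads are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def dereplicate(reads_by_sample: dict[str, list[str]]) -> tuple[dict[str, Counter], set[str]]:
    """Collapse identical sequences (100% identity) into features.

    Returns a per-sample table (sample ID -> Counter of sequence -> reads)
    and the set of unique sequences across all samples.
    """
    table: dict[str, Counter] = {}
    features: set[str] = set()
    for sample, reads in reads_by_sample.items():
        # Reads are compared case-insensitively; only exact matches collapse.
        counts = Counter(read.upper() for read in reads)
        table[sample] = counts
        features.update(counts)
    return table, features


def frequency_per_feature(table: dict[str, Counter]) -> Counter:
    """Total number of reads assigned to each feature across all samples."""
    totals: Counter = Counter()
    for counts in table.values():
        totals.update(counts)
    return totals


if __name__ == "__main__":
    # Hypothetical toy reads; ACGAAC and TCGTAC each differ from ACGTAC by one base.
    reads = {
        "S1": ["ACGTAC", "ACGTAC", "ACGAAC"],
        "S2": ["ACGTAC", "TCGTAC"],
    }
    table, features = dereplicate(reads)
    print(frequency_per_feature(table))
```

Here ACGTAC gets a frequency of 3, while each one-base variant becomes its own feature with a frequency of 1. With error-prone long reads, almost every read is such a variant, so a table dominated by features with a frequency of 1 is the expected outcome of exact dereplication.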

After these steps, I got two files, `5_derep-seqs.qzv` and `5_derep-table.qzv`.

Upon checking `5_derep-table.qzv` in QIIME 2 View, I realized that something might have gone wrong, as the frequency per feature is 1. A screenshot is attached (Screenshot 2023-11-04 230458).

Could you please provide some insight into what could have gone wrong to produce such results, or is it normal to get such results when processing nanopore sequencing data? Thank you.

DeniRibicic commented 1 year ago

You have your output, meaning nothing is wrong there.

If you read the pipeline description carefully, you will see that vsearch clusters OTUs at 85% similarity, which is roughly the threshold used in this study. That said, ONT has a fairly high error rate, up to 20% depending on the chemistry. This means an 85% threshold can be too stringent here, yielding a frequency of 1 per feature. One thing you can do is to lower the threshold slightly in the source code and explore to what extent that alters the output.

But don't go too far: a looser threshold might simply cluster sequences from biologically different species into the same OTU, which is something you definitely want to avoid. In my opinion, ONT reads with such high base-calling error rates are still not suitable for OTU clustering, let alone denoising. The best approach would be to dereplicate them at 100% and treat each sequence as a separate OTU. In your case you already have that output, albeit with the default 85% threshold. Just look at the taxonomy and group your reads based on it for any downstream statistics.
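The reasoning above can be made concrete with a small Python sketch. It assumes independent per-base errors at a hypothetical 5% rate (well below the up-to-20% quoted above) to estimate how unlikely an error-free 1,400 bp read is, and it uses a simplified greedy centroid clustering on equal-length toy reads, counting substitutions only, to show how lowering the identity threshold merges more reads. It is only loosely modeled on how vsearch clusters and is not q2ONT's or vsearch's code; all function names and reads are hypothetical.

```python
from __future__ import annotations


def p_error_free(error_rate: float, length: int) -> float:
    """Probability that a read of `length` bases contains no errors,
    assuming independent per-base errors at a constant `error_rate`."""
    return (1.0 - error_rate) ** length


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences.

    Simplification: tools such as vsearch compute identity from an
    alignment that also handles insertions and deletions; this sketch
    only counts substitutions.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(seqs: list[str], threshold: float) -> list[list[str]]:
    """Greedy centroid clustering: each sequence joins the first existing
    cluster whose centroid it matches at >= `threshold` identity; otherwise
    it starts a new cluster and becomes that cluster's centroid.

    `seqs` is assumed to be sorted by decreasing abundance already.
    """
    clusters: list[list[str]] = []
    for seq in seqs:
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


if __name__ == "__main__":
    # Even at a hypothetical 5% error rate, an error-free 1,400 bp read is
    # astronomically unlikely, so exact duplicates are rare.
    print(f"P(error-free 1400 bp read at 5% error) = {p_error_free(0.05, 1400):.1e}")

    # Hypothetical toy reads that drift progressively from the first one.
    reads = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "AAAAAAATTT"]
    for threshold in (1.0, 0.85, 0.7):
        n = len(greedy_cluster(reads, threshold))
        print(f"threshold {threshold:.2f}: {n} clusters")
```

With these toy reads, lowering the threshold from 1.0 to 0.85 to 0.7 shrinks the number of clusters from 4 to 2 to 1. That mirrors the trade-off described above: a looser threshold absorbs more erroneous reads, but it also becomes more likely to merge sequences that are genuinely different.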

I would also advise you to explore other pipelines for analyzing ONT-generated 16S rRNA data. q2ONT is a fairly old pipeline, and it is no longer updated.

mzakram219 commented 1 year ago

Dear DeniRibicic, thank you for your help with the issue mentioned above. I solved the problem and successfully performed the taxonomic assignment. I then exported all the files required to build a phyloseq object in R. I now have a couple of questions regarding downstream analysis. Could you please provide some help?

1. Do you think the data should be rarefied before analyzing alpha and beta diversity?
2. What do you think about beta diversity? I observed that beta diversity based on Bray-Curtis dissimilarity did not give satisfactory results for Nanopore 16S data. Is that normal for Nanopore 16S data?
3. Should I focus more on other metrics, such as weighted and unweighted UniFrac?
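To make the terms in these questions concrete, here is a minimal Python sketch of rarefying (randomly subsampling each sample, without replacement, to a common read depth) and of Bray-Curtis dissimilarity. It is not the QIIME 2 or phyloseq implementation, and the samples, feature IDs, and taxon labels are hypothetical. It also illustrates one possible reason Bray-Curtis can look uninformative on such data: if nearly every feature is a singleton found in only one sample, no two samples share any features and every pairwise dissimilarity is 1.0, whereas grouping features by taxonomy, as suggested earlier in the thread, restores overlap.

```python
from __future__ import annotations

import random
from collections import Counter


def rarefy(counts: dict[str, int], depth: int, seed: int = 0) -> Counter | None:
    """Randomly subsample one sample's reads to `depth` without replacement.

    Returns None when the sample has fewer than `depth` reads, i.e. the
    sample would be dropped at that rarefaction depth.
    """
    if sum(counts.values()) < depth:
        return None
    # Expand counts into one entry per read, then draw `depth` reads.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    return Counter(random.Random(seed).sample(pool, depth))


def collapse_by_taxon(counts: dict[str, int], taxonomy: dict[str, str]) -> Counter:
    """Sum feature counts per taxon label; unmapped features become 'Unassigned'."""
    collapsed: Counter = Counter()
    for feature, n in counts.items():
        collapsed[taxonomy.get(feature, "Unassigned")] += n
    return collapsed


def bray_curtis(a: dict[str, int], b: dict[str, int]) -> float:
    """Bray-Curtis dissimilarity: 1 - 2 * sum_k min(a_k, b_k) / (sum(a) + sum(b)).

    0.0 means identical counts; 1.0 means no shared features.
    """
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b))
    return 1.0 - 2.0 * shared / total


if __name__ == "__main__":
    # Hypothetical samples in which every feature is a sample-specific singleton.
    s1 = {"f1": 1, "f2": 1, "f3": 1, "f4": 1}
    s2 = {"f5": 1, "f6": 1, "f7": 1, "f8": 1}
    # Hypothetical taxonomy assignments for those features.
    taxonomy = {
        "f1": "Bacillus", "f2": "Bacillus", "f3": "Escherichia", "f4": "Pseudomonas",
        "f5": "Bacillus", "f6": "Escherichia", "f7": "Escherichia", "f8": "Pseudomonas",
    }
    print(bray_curtis(s1, s2))
    print(bray_curtis(collapse_by_taxon(s1, taxonomy), collapse_by_taxon(s2, taxonomy)))
    print(rarefy(s1, 2))
```

Because `s1` and `s2` share no features, their Bray-Curtis dissimilarity is 1.0; after collapsing the same reads by taxon, it drops to 0.25.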

Looking forward to your answer based on your expertise. Thank you! Regards, Muhammad