Taxonomic assignment - 50%+ unassigned reads

mzakram219 commented 1 year ago

Hello, I followed the MetONTIIME workflow to process my Nanopore sequencing data. I used Guppy for demultiplexing and for adapter, primer and barcode removal. I then used NanoFilt for filtering and quality control, and imported the reads into QIIME 2 (version 2023.9) in Casava format. I performed dereplication and then de novo clustering, following the criteria given in MetONTIIME. I assigned taxonomy using VSEARCH and then checked the results in taxonomy.qzv and the bar plots. The majority of the reads, around 50-60%, were reported as unassigned.

Later, I made some modifications. I used Trimmomatic to crop the reads to a maximum length of 1400 bases. I imported them into QIIME 2, repeated the dereplication step, and then checked for chimeras. I removed the chimeras before clustering and afterwards performed open-reference OTU clustering at 85% identity. I used Greengenes2 (2022) for taxonomy assignment, and again 50-60% of the reads came out as unclassified.

Could you please suggest what modifications I should make to improve the taxonomic assignment of my reads?

Bonus question: once I have fixed these problems, will I be able to export the artifacts needed to build a phyloseq object in R? I am planning a diversity analysis followed by differential abundance analysis with LEfSe.

I would be very happy to provide files or more information, if needed.

Thank you! Regards Muhammad Zeeshan AKRAM

MaestSi commented 1 year ago

Hi, I am not going to provide direct assistance with your files, as you are not running the pipeline itself (but something resembling it, step by step). I would suggest checking a few things. First, what alignment identity threshold are you using? The classification step should produce a search_results.qza file, which you can unzip and use to plot the distribution of alignment identities for all hits. If you can clearly see that a tail of the distribution is cut off by your threshold, you may need to use a less stringent identity threshold. Second, I would suggest running the MetaBlast pipeline with the NCBI nt db as the reference database, to check whether you have non-specific amplicons, e.g. the target gene may also be amplified from the host if you are studying animal microbiota. Third, your sample may contain bacteria that are not present in the database you are using. Let me know if you are able to find out the answer! Best, SM
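
The first check (the distribution of alignment identities) could be sketched in Python roughly as follows. This is only a minimal sketch and not part of MetONTIIME. It assumes that search_results.qza is a zip archive whose hits table is stored as `<uuid>/data/blast6.tsv` in BLAST6 tabular format, with percent identity in the third column, and that queries without a hit are listed with `*` as the subject. The function names and the bin width are hypothetical choices for illustration.

```python
import csv
import io
import sys
import zipfile
from collections import Counter


def read_identities(qza_path):
    """Return the percent identity (3rd BLAST6 column) of every hit in the archive."""
    identities = []
    with zipfile.ZipFile(qza_path) as archive:
        # Assumption: the hits table lives at <uuid>/data/blast6.tsv inside the .qza.
        members = [n for n in archive.namelist() if n.endswith("data/blast6.tsv")]
        if not members:
            raise FileNotFoundError("no data/blast6.tsv found in archive")
        with archive.open(members[0]) as handle:
            text = io.TextIOWrapper(handle, encoding="utf-8")
            for row in csv.reader(text, delimiter="\t"):
                # Skip short rows and queries without a hit (assumed subject '*').
                if len(row) < 3 or row[1] == "*":
                    continue
                try:
                    identities.append(float(row[2]))
                except ValueError:
                    continue  # non-numeric identity field, e.g. a header line
    return identities


def identity_histogram(identities, bin_width=1.0):
    """Count hits per identity bin; each key is the lower edge of its bin."""
    counts = Counter(int(v // bin_width) * bin_width for v in identities)
    return dict(sorted(counts.items()))


def print_histogram(histogram, max_bar=50):
    """Print one text bar per bin, scaled so the largest bin is max_bar wide."""
    largest = max(histogram.values(), default=0)
    for lower_edge, count in histogram.items():
        bar = "#" * max(1, round(max_bar * count / largest))
        print(f"{lower_edge:6.1f}  {count:8d}  {bar}")


if __name__ == "__main__":
    # Usage: python identity_hist.py search_results.qza
    print_histogram(identity_histogram(read_identities(sys.argv[1])))
```

If the histogram piles up right at the identity threshold used for classification, with no visible left tail, the threshold is probably truncating the distribution, which is the situation described above.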

mzakram219 commented 1 year ago

Thank you, decreasing the identity threshold worked for me. Regards Muhammad

MaestSi commented 1 year ago

Great! SM

MaestSi / MetONTIIME, issue #75