Most of my ITS ASV's map to Other (>60%)

benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution

http://benjjneb.github.io/dada2/

GNU Lesser General Public License v3.0

Most of my ITS ASV's map to Other (>60%) #1768

Closed. hereformore closed this issue 1 month ago

hereformore commented 1 year ago

I ran ITS sequencing on stool samples to profile fungi. However, when I look at relative abundances, even at the phylum level 60-70% of my relative abundance graph is "Other".

When I look at how my ASVs map to taxa in the UNITE database, the majority are not classified past Kingdom (only 27% receive a Phylum assignment; the rest stop at Kingdom: Fungi).

Is this normal? And is there anything I can do to filter the data so it's not just 90% other?

This is what a collaborator has already done (DADA2 1.16):

Amplicon sequence variants (ASVs): paired FASTQ reads were trimmed, and then filtered to remove reads containing Ns, or with maximum expected errors >=2.
Samples with fewer than 1,000 sequences were discarded. ASVs accounting for less than one millionth of all strain-level markers were discarded.

benjjneb commented 1 year ago

Try assignTaxonomy(..., tryRC=TRUE) to see if the cause might be variable orientation of your reads (this option will cause the reverse-complement of each sequence to be compared against the reference database as well).

If that doesn't change things, I would start BLAST-ing some of your more abundant non-classified ASVs against a broad database like nt. What are they? Off-target amplification is another possibility that should show up with this approach.
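As a rough sketch of the BLAST step suggested above, the most abundant ASVs without a phylum assignment can be pulled out and written to a FASTA file for a search against nt. The objects below are toy stand-ins, not data from this thread: `seqtab` is assumed to be a samples-by-ASVs count matrix whose column names are the ASV sequences, and `taxa` is assumed to be the character matrix returned by assignTaxonomy(), with UNITE-style labels. The output file name is hypothetical.

```r
# Toy stand-ins for DADA2 outputs (hypothetical data, for illustration only).
# seqtab: samples x ASVs count matrix; column names are the ASV sequences.
# taxa:   ASVs x ranks character matrix, shaped like assignTaxonomy() output.
asvs <- c("ACGTACGTAA", "TTGCATGCAA", "GGCATTACGA", "CCATGGTACA")
seqtab <- matrix(c(500, 300, 50, 10,
                   400, 100, 80, 20),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("S1", "S2"), asvs))
taxa <- matrix(c("k__Fungi", NA,
                 "k__Fungi", "p__Ascomycota",
                 "k__Fungi", NA,
                 "k__Fungi", "p__Basidiomycota"),
               ncol = 2, byrow = TRUE,
               dimnames = list(asvs, c("Kingdom", "Phylum")))

# ASVs with no phylum assignment, ranked by total read count across samples.
unclassified <- rownames(taxa)[is.na(taxa[, "Phylum"])]
totals <- sort(colSums(seqtab[, unclassified, drop = FALSE]), decreasing = TRUE)
top <- head(totals, 20)  # keep at most the 20 most abundant

# Interleave headers and sequences, then write a FASTA file for BLAST.
fasta <- as.vector(rbind(paste0(">unclassified_", seq_along(top), ";reads=", top),
                         names(top)))
writeLines(fasta, "top_unclassified.fasta")
```

The resulting file can be pasted into the BLAST web interface or passed to a local search; putting the read counts in the headers makes it easier to judge how much of the "Other" fraction each hit explains.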

hereformore commented 1 year ago

Thank you so much.

I set tryRC=TRUE originally and came up with the same result.

I also looked up the top non-classified ASVs and they tracked to "uncultured fungus", with further hits to plant DNA (Solanum = tomato; another BLAST hit was spinach).

fungiexport.xlsx

If it is off-target amplification, or if even BLASTing does not yield a specific genus/phylum, is there any acceptable way to filter these ASVs? Or do I leave them as is and deal with the enormous amount of "Other"?

If I filter out the ASVs that do not match any Phylum, I go from 700+ ASVs to 200 unique ASVs. This is my first time working with fungi, so I am not familiar with what is normal.

Edit: Also thank you so much for your prompt response and help. I am just a graduate student and learning, so to hear from the creator is really something.

benjjneb commented 1 year ago

Interpreting results from BLAST against nt can be a bit complicated, because not everything is correctly labelled. Are you sequencing tomato-related samples?

If so, the fact that tomato is showing up in your non-classifying ASVs (even if it's not the only hit) would be highly suggestive of substantial off-target amplification.

If my presumptions so far are true, then there are at least two possible paths forward. One: Screen all ASVs against the tomato genome and remove anything that hits (this would be done outside DADA2; see previous work, as this is very commonly done in human microbiome studies), and/or Two: Subset your analysis down to only those ASVs that classify as some kind of fungi.

On that second point -- this is a valid approach. Every sequencing-based method is already subsetting down to some fraction of what is really there. As long as you are clear and open about how you do further subsetting, it is valid.
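A minimal sketch of that second path, assuming the same object shapes DADA2 produces (`seqtab` as a samples-by-ASVs count matrix, `taxa` as the assignTaxonomy() matrix). The data are toy values, and requiring a Phylum assignment is just one possible rule for "classifies as some kind of fungi". Reporting how many ASVs and reads survive is one way to stay open about the subsetting.

```r
# Toy stand-ins for DADA2 outputs (hypothetical data, for illustration only).
asvs <- c("ACGTACGTAA", "TTGCATGCAA", "GGCATTACGA", "CCATGGTACA")
seqtab <- matrix(c(500, 300, 50, 10,
                   400, 100, 80, 20),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("S1", "S2"), asvs))
taxa <- matrix(c("k__Fungi", NA,
                 "k__Fungi", "p__Ascomycota",
                 "k__Fungi", NA,
                 "k__Fungi", "p__Basidiomycota"),
               ncol = 2, byrow = TRUE,
               dimnames = list(asvs, c("Kingdom", "Phylum")))

# Keep ASVs assigned to Fungi AND to some phylum (one possible rule).
keep <- !is.na(taxa[, "Kingdom"]) & taxa[, "Kingdom"] == "k__Fungi" &
        !is.na(taxa[, "Phylum"])
kept_asvs <- rownames(taxa)[keep]

# Subset both tables by sequence name so they stay in sync.
seqtab_fungi <- seqtab[, kept_asvs, drop = FALSE]
taxa_fungi   <- taxa[kept_asvs, , drop = FALSE]

# Numbers worth reporting alongside the filtered results.
asv_fraction_kept  <- mean(keep)                              # share of ASVs kept
read_fraction_kept <- rowSums(seqtab_fungi) / rowSums(seqtab) # per-sample share of reads
print(asv_fraction_kept)
print(round(read_fraction_kept, 3))
```

Relative abundances computed from `seqtab_fungi` are then relative to classified fungal reads only, which is worth stating wherever the results are shown.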

hereformore commented 1 year ago

Thank you!

These are stool samples from human patients. The highest-count unclassified ASV was tomato-related, the second highest was spinach, then sunflower, then spinach again, then a bacterium. But to be clear, the first BLAST result was "uncultured fungus" and the next 5+ were the tomato/spinach/sunflower species, as shown below.

I can definitely go through each of the highest-count unclassified ASVs that are assigned to Kingdom Fungi but to no phylum, and that on BLAST match uncultured fungus plus tomato, spinach, etc., and remove them. Is that what you mean by the second point?

gzahn commented 11 months ago

https://www.tandfonline.com/doi/abs/10.1080/00275514.2023.2206931?journalCode=umyc20

We found that the standard fungal ITS primers co-amplify lots of eukaryotes (~40% of the "fungi" in the studies we examined weren't actually fungi!), and you really do just have to live with that and remove them downstream. If you used the UNITE_Fungi database to assign taxonomy, it would lead to a lot of "unknown fungus" assignments that were actually tomato or whatever. Using the UNITE_All database will help place these into the correct kingdom and then you can remove them downstream.
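To illustrate the downstream removal, here is a small sketch that tallies reads per kingdom and then keeps only the fungal ASVs. It assumes taxonomy was assigned against an all-eukaryotes reference so that non-fungal kingdoms are labelled; the data are toy values, and the kingdom labels are illustrative UNITE-style strings that should be checked against the actual assignTaxonomy() output.

```r
# Toy stand-ins for DADA2 outputs (hypothetical data, for illustration only).
asvs <- c("ACGTACGTAA", "TTGCATGCAA", "GGCATTACGA", "CCATGGTACA")
seqtab <- matrix(c(500, 300, 50, 10,
                   400, 100, 80, 20),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("S1", "S2"), asvs))
taxa <- matrix(c("k__Viridiplantae", "p__Anthophyta",
                 "k__Fungi",         "p__Ascomycota",
                 NA,                 NA,
                 "k__Fungi",         "p__Basidiomycota"),
               ncol = 2, byrow = TRUE,
               dimnames = list(asvs, c("Kingdom", "Phylum")))

# Kingdom label per ASV, in the same order as the count table columns.
kingdom <- taxa[colnames(seqtab), "Kingdom"]
kingdom[is.na(kingdom)] <- "Unassigned"

# Total reads per kingdom, and each kingdom's share of all reads (percent).
reads_by_kingdom <- tapply(colSums(seqtab), kingdom, sum)
share <- round(100 * reads_by_kingdom / sum(reads_by_kingdom), 1)
print(share)

# Drop everything that is not fungal before downstream analysis.
fungal_asvs  <- names(kingdom)[kingdom == "k__Fungi"]
seqtab_fungi <- seqtab[, fungal_asvs, drop = FALSE]
```

The per-kingdom shares give a direct estimate of how much of the "Other" fraction is co-amplified plant or other eukaryotic DNA rather than unknown fungi.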

LukeLikesDirt commented 10 months ago

I would suggest extracting the ITS region using ITSxpress prior to quality filtering and denoising in DADA2. I also suggest doing the same to the UNITE database using ITSx prior to taxonomic assignment (example here). In my experience this increases the number of OTUs assigned at the genus level: (1) because extracting the ITS region from your reads removes a short fragment of 18S, 5.8S or 28S, which has no species-level resolution and in which random errors could complicate taxonomic assignment, and (2) because extracting the ITS region from UNITE increases the sequence coverage between representative and reference sequences, which should improve taxonomic assignment.

I recommend using the UNITE all eukaryotes reference dataset, extracting the ITS region with '-t All' in ITSx, and using the option '--region ALL' on your data in ITSxpress. This will allow you to gauge how many of your sequences belong to non-fungal eukaryotes.