benjjneb / decontam

Simple statistical identification and removal of contaminants in marker-gene and metagenomics sequencing data
https://benjjneb.github.io/decontam/

Paired samples running Decontam and how to extract ASV/map ASV to taxtable #61

Open Jasmine84 opened 5 years ago

Jasmine84 commented 5 years ago

Dear Benjamin

Thank you for developing this user-friendly R package for the challenge with contaminant filtration.

I have some questions about decontam below. I hope you have time for answering and helping me out with this.

Question 1: How would you run paired samples in decontam? I have two samples from the same person.

Alternative 1: Merge the samples from the same person, so that two samples become one, and then run decontam. Problem: decontam works by comparing samples, so would it not be wrong to merge two samples?

Alternative 2: Run decontam separately on one sample from each person (two runs), then merge the two resulting phyloseq objects. Problem: if the first run classifies ASV1 as a contaminant but the second does not, I would have to look at ASV1 and decide for myself whether it is plausible that it is a contaminant.

Note: I have three different types of control samples. I run decontam with the first control type, then use the resulting phyloseq object to run it with the next control type, i.e. three successive control checks.

Question 2: I would like to look at my ASVs classified as contaminants and which taxa they represent. How can I do this in R?

Thanks for the help in advance!

Best regards, Jasmine

benjjneb commented 5 years ago

Q1: You don't want to merge your samples to run decontam. Just treat them as independent samples, for the purpose of identifying contaminants, which is at least as valid as the assumptions being made by the method anyway.
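A minimal sketch of what treating paired samples as independent can look like. The toy count matrix, the sample names, and the `subject` column are hypothetical stand-ins; only `isContaminant()` with its `method`, `neg`, and `threshold` arguments comes from decontam, and `is.neg` is simply the column name used elsewhere in this thread.

```r
library(phyloseq)
library(decontam)

# Hypothetical toy data: 3 subjects with 2 samples each, plus 3 negative controls.
set.seed(1)
sample_ids <- c(paste0("S", rep(1:3, each = 2), c("a", "b")), paste0("NC", 1:3))
taxa_ids <- paste0("ASV", 1:6)
counts <- matrix(rpois(length(sample_ids) * length(taxa_ids), lambda = 20) + 1,
                 nrow = length(sample_ids),
                 dimnames = list(sample_ids, taxa_ids))
counts[1:6, "ASV1"] <- 0  # ASV1 appears only in the negative controls
counts[7:9, "ASV2"] <- 0  # ASV2 appears only in the true samples

# Each paired sample keeps its own row; `subject` records the pairing
# but is not passed to decontam.
meta <- data.frame(subject = c(rep(paste0("P", 1:3), each = 2), rep(NA, 3)),
                   is.neg = rep(c(FALSE, TRUE), times = c(6, 3)),
                   row.names = sample_ids)

ps <- phyloseq(otu_table(counts, taxa_are_rows = FALSE), sample_data(meta))

# One run over all samples, with no merging of the pairs beforehand.
contamdf <- isContaminant(ps, method = "prevalence", neg = "is.neg", threshold = 0.1)
contamdf
```

The pairing information stays in the sample data for downstream analysis; in this sketch it plays no role in the contaminant calls.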

Q2: The default (detailed=TRUE) output of isContaminant is a data.frame with a $contaminant column indicating whether each ASV/OTU/taxon was called a contaminant. You can cross-reference that with the taxonomic assignments of the taxa you gave the function to inspect the taxonomies of the identified contaminants.

Jasmine84 commented 5 years ago

Thank you so much for answering so quickly!

Q2: I was hoping for a little help with the R code for matching the contaminants with the tax_table in the phyloseq object. I know how to inspect which ASVs are classified as contaminants. However, contamdf.prev01 does not contain the taxa matching the ASVs. The tax_table is in the phyloseq object, and I have not yet figured out how to match it with the ASVs identified in contamdf.prev01 based on the ASV number.

contamdf.prev01 <- isContaminant(mergedPS, method="prevalence", neg="is.neg", threshold=0.1, detailed = TRUE)

contamdf.prev01


Best regards, Jasmine

benjjneb commented 5 years ago

They share a common index: the rows of the data.frame returned by isContaminant are in the same order as the taxa within the phyloseq object. So to get, for example, the taxonomic assignments (genus included) of just the identified contaminants you can do:

tax_table(ps)[contamdf$contaminant,]
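
If a single table holding both the decontam results and the taxonomy is more convenient, the two can also be joined explicitly by taxon name. This is only a sketch on hypothetical toy data: the counts and the placeholder taxonomy (`Phylum_A`, `Genus_1`, ...) are made up, and it assumes the row names of the `isContaminant()` output are the taxa names of the phyloseq object.

```r
library(phyloseq)
library(decontam)

# Hypothetical toy data: 6 true samples, 3 negative controls, 6 ASVs.
set.seed(1)
sample_ids <- c(paste0("S", 1:6), paste0("NC", 1:3))
taxa_ids <- paste0("ASV", 1:6)
counts <- matrix(rpois(9 * 6, lambda = 20) + 1, nrow = 9,
                 dimnames = list(sample_ids, taxa_ids))
counts[1:6, "ASV1"] <- 0  # ASV1 appears only in the negative controls
meta <- data.frame(is.neg = rep(c(FALSE, TRUE), times = c(6, 3)),
                   row.names = sample_ids)
# Placeholder taxonomy, not real assignments.
tax <- matrix(c(rep("Phylum_A", 3), rep("Phylum_B", 3), paste0("Genus_", 1:6)),
              ncol = 2, dimnames = list(taxa_ids, c("Phylum", "Genus")))

ps <- phyloseq(otu_table(counts, taxa_are_rows = FALSE),
               sample_data(meta), tax_table(tax))
contamdf <- isContaminant(ps, method = "prevalence", neg = "is.neg", threshold = 0.1)

# Look up each row of contamdf in the taxonomy table by name, then bind columns.
tax_df <- as.data.frame(as(tax_table(ps), "matrix")[rownames(contamdf), , drop = FALSE])
contam_tax <- cbind(contamdf[, c("prev", "p", "contaminant")], tax_df)

# Keep only the taxa that were called contaminants.
contam_only <- contam_tax[contam_tax$contaminant, ]
contam_only
```

Indexing by name rather than by position keeps the match correct even if one of the two objects has been reordered in the meantime.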

benjjneb commented 5 years ago

Or you can get a phyloseq object with just the contaminant taxa:

ps.contam <- prune_taxa(contamdf$contaminant, ps)
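
The complementary step, keeping everything that was not called a contaminant, can be sketched the same way by negating the logical vector. The toy data and the name `ps.noncontam` are hypothetical; `prune_taxa()` is the phyloseq function already used above.

```r
library(phyloseq)
library(decontam)

# Hypothetical toy data: 6 true samples, 3 negative controls, 6 ASVs.
set.seed(1)
sample_ids <- c(paste0("S", 1:6), paste0("NC", 1:3))
taxa_ids <- paste0("ASV", 1:6)
counts <- matrix(rpois(9 * 6, lambda = 20) + 1, nrow = 9,
                 dimnames = list(sample_ids, taxa_ids))
counts[1:6, "ASV1"] <- 0  # ASV1 appears only in the negative controls
meta <- data.frame(is.neg = rep(c(FALSE, TRUE), times = c(6, 3)),
                   row.names = sample_ids)

ps <- phyloseq(otu_table(counts, taxa_are_rows = FALSE), sample_data(meta))
contamdf <- isContaminant(ps, method = "prevalence", neg = "is.neg", threshold = 0.1)

# Negate the calls to keep only the taxa NOT flagged as contaminants.
ps.noncontam <- prune_taxa(!contamdf$contaminant, ps)
ps.noncontam
```

Comparing `ntaxa(ps)` with `ntaxa(ps.noncontam)` is a quick check on how many taxa were dropped.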

Jasmine84 commented 5 years ago

It works, thank you so much for the help and advice!