benjjneb / decontam

Simple statistical identification and removal of contaminants in marker-gene and metagenomics sequencing data
https://benjjneb.github.io/decontam/

Removing taxa on a per batch basis #10

Closed: LewisNU closed this issue 6 years ago

LewisNU commented 6 years ago

Hi Ben,

First off I'd just like to thank you and your co-developers for producing the decontam package. The concept is great, simple but effective! And as I'm in the final year of my PhD working in a low biomass world where contamination is a complete headache, your timing could not be better!

The issue I'm having is related to the batch option when using the prevalence method. I'm working with a phyloseq object of 16S data split into 3 batches. Each batch was extracted individually so I have 3 kit negative controls.

However, the batch option appears to flag as contaminants in one batch taxa that are true features of another, most likely due to cross-talk, so removing these taxa entirely from further analyses isn't feasible.

Would there be a way to make contaminant removal work per batch, as opposed to the current blanket removal of taxa? For example, I'd thought of a function which sets the frequency of each taxon to 0 in the batches where it is identified as a contaminant.

If not, any suggestions for a workaround would be greatly appreciated!

Best wishes, Lewis

benjjneb commented 6 years ago

There is a workaround here, and that is to just do the contaminant identification and removal on a per-batch basis by hand.

That is, split your phyloseq object into ps1, ps2 and ps3 (for each batch), run isContaminant on each, and zero out the taxa identified as contaminants in each batch. Then merge them back together.

I think it should be fairly straightforward (see ?subset_samples and ?merge_phyloseq) but am happy to help further.
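
A minimal sketch of this per-batch workaround might look like the following. It is not an official decontam feature. It assumes sample_data(ps) contains a column named "batch" giving the extraction batch and a logical column named "is.neg" that is TRUE for the kit negative controls; both column names, and the helper name zero_contaminants_per_batch, are hypothetical. Rather than literally splitting with subset_samples and re-merging with merge_phyloseq, this variant indexes the count matrix by batch and writes the zeroed counts back, which should give the same result.

```r
library(phyloseq)
library(decontam)

# Hypothetical helper: identify contaminants separately within each batch
# (prevalence method) and zero them only in the batch where they were flagged.
# Assumes sample_data(ps) has columns `batch_var` and `neg_var` (see above).
zero_contaminants_per_batch <- function(ps, batch_var = "batch",
                                        neg_var = "is.neg", threshold = 0.1) {
  # Work on a samples-by-taxa count matrix.
  otu <- as(otu_table(ps), "matrix")
  if (taxa_are_rows(ps)) otu <- t(otu)

  # Align the sample metadata with the rows of the count matrix.
  meta  <- as(sample_data(ps), "data.frame")
  meta  <- meta[rownames(otu), , drop = FALSE]
  batch <- as.character(meta[[batch_var]])
  neg   <- as.logical(meta[[neg_var]])

  for (b in unique(batch)) {
    idx <- which(batch == b)
    # Use only this batch's samples and its own negative controls.
    # Each batch needs at least one negative control for this to be meaningful.
    res <- isContaminant(otu[idx, , drop = FALSE], neg = neg[idx],
                         method = "prevalence", threshold = threshold)
    contam <- which(res$contaminant)
    # Zero the flagged taxa in this batch only; other batches are untouched.
    otu[idx, contam] <- 0
  }

  # Write the modified counts back into the phyloseq object.
  otu_table(ps) <- otu_table(otu, taxa_are_rows = FALSE)
  ps
}

# Usage (ps is your existing phyloseq object):
# ps.clean <- zero_contaminants_per_batch(ps)
```

Taxa flagged in one batch keep their counts in the other batches, so they remain in the merged object. Each per-batch test uses fewer samples than a pooled run, so the default threshold of 0.1 may be worth revisiting.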