benjjneb / decontam

Simple statistical identification and removal of contaminants in marker-gene and metagenomics sequencing data
https://benjjneb.github.io/decontam/

How is presence/absence determined? #29

Closed zefrieira closed 6 years ago

zefrieira commented 6 years ago

In the prevalence method, how does decontam determine whether a given ASV is present or absent in a sample? Is the cut-off relative to the sample size (e.g., present if relative abundance >= 0.01%) or absolute (e.g., present if abundance >= 1)?

benjjneb commented 6 years ago

Absolute.

But you can use any presence/absence approach you want if you just assign it by hand. For example, create a new sequence table with entries of 1 if relative abundance >= 0.01% and 0 otherwise, and then give that table to isContaminant.
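
A minimal sketch of the hand-assigned table described above, written in plain Python purely for illustration. decontam is an R package, so the resulting 0/1 table would still have to be passed to isContaminant in R. The function name `presence_absence`, the example counts, and the layout (samples as rows, ASVs as columns) are assumptions, not part of decontam.

```python
def presence_absence(counts, min_rel_abundance=1e-4):
    """Return a 0/1 table with 1 where an ASV's relative abundance within
    its sample is >= min_rel_abundance (1e-4 is 0.01%), and 0 otherwise.

    `counts` is assumed to be a list of rows, one row per sample,
    one column per ASV (hypothetical layout for this sketch).
    """
    table = []
    for row in counts:
        depth = sum(row)  # total reads in this sample
        if depth == 0:
            # An empty sample has no defined relative abundance: call all absent.
            table.append([0] * len(row))
            continue
        table.append([1 if c / depth >= min_rel_abundance else 0 for c in row])
    return table


# Hypothetical counts, not real data.
counts = [
    [99990, 9, 1],  # deep sample, 100000 reads
    [95, 4, 1],     # shallow sample, 100 reads
]
pa = presence_absence(counts)
print(pa)
```

With a relative cut-off, the low-count ASVs in the deep sample are called absent while the same raw counts in the shallow sample are called present, which is the behaviour the question is about.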

zefrieira commented 6 years ago

I see. Thank you for your prompt response!

Do you have any recommendations? My samples have highly variable sample depth, which is why I'm interested in using relative abundances to determine presence/absence.

benjjneb commented 6 years ago

What we have tested is using the data as is, i.e. without controlling for the variable sampling depth first. However, I can't make a strong recommendation, because we haven't tested alternative strategies on that front.

Perhaps you could try two approaches, one using the raw data and one thresholding presence/absence at some value, and compare? If you do, we'd love to hear whether or not you see a difference.
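
One rough way to set up that comparison is to tabulate, for each ASV, how many true samples and how many negative controls call it present under each rule. The sketch below is plain Python for illustration only. It does not reproduce decontam's statistical test, and the function names, the counts, and the sample/control split are all hypothetical.

```python
def calls_absolute(counts, min_count=1):
    """Presence if the raw count is >= min_count (the 'data as is' rule)."""
    return [[1 if c >= min_count else 0 for c in row] for row in counts]


def calls_relative(counts, min_rel=1e-4):
    """Presence if the within-sample relative abundance is >= min_rel."""
    out = []
    for row in counts:
        depth = sum(row)
        out.append([1 if depth > 0 and c / depth >= min_rel else 0 for c in row])
    return out


def prevalence(calls, sample_idx):
    """For each ASV, count the listed samples in which it is called present."""
    n_asv = len(calls[0])
    return [sum(calls[i][j] for i in sample_idx) for j in range(n_asv)]


# Hypothetical counts: rows are samples, columns are ASVs.
counts = [
    [99990, 9, 1],  # true sample, deep
    [95, 4, 1],     # true sample, shallow
    [0, 50, 0],     # negative control
]
true_samples, neg_controls = [0, 1], [2]

abs_calls = calls_absolute(counts)
rel_calls = calls_relative(counts)
for label, calls in (("absolute", abs_calls), ("relative", rel_calls)):
    print(label, prevalence(calls, true_samples), prevalence(calls, neg_controls))
```

ASVs whose prevalence counts change between the two rules are the ones whose classification is most likely to depend on how presence is defined.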