benjjneb / decontam

Simple statistical identification and removal of contaminants in marker-gene and metagenomics sequencing data
https://benjjneb.github.io/decontam/

Selection of methods for different sample types and non-subjective strategy to select method and threshold #139

Open cBundg opened 11 months ago

cBundg commented 11 months ago

Hi, First, I would like to commend the author and coworkers for this excellent package - it makes the decontamination process much more controlled, and we have happily been using it. However, in a current project I am unsure which decontamination strategy to use.

A little background: We are investigating catheter urine, midstream urine (MSU) and vaginal swab samples from patients with a bladder disorder and from a healthy control group. Overall, we wish to compare the disorder group to the controls within each sample type, but we are also interested in how the three sample types (catheter vs MSU vs vaginal swabs) compare to each other. We have measured DNA concentration for all samples and have multiple negative controls.

For structure, I have tried to divide my inquiry into the following questions:

  1. How would you recommend selecting the method (frequency vs prevalence) and the cutoff to use? I can use a few clear contaminants that are highly abundant in negative controls and in samples with low DNA concentrations (Rhodococcus and chloroplast) as indicators, but that would make my selection highly dependent on ASVs belonging to these genera specifically. I would love to use a less subjective strategy. Do you have any recommendations?

  2. Since catheter, MSU and vaginal samples have different risks of contamination, I was wondering whether it would be feasible to use different decontamination approaches for the different sample types? Or will that introduce bias?

  3. How would you recommend handling samples with DNA concentrations below the detection limit? For catheter samples, around 30 % have DNA concentrations below the detection limit (similar to the negative controls), while a large fraction have DNA concentrations just above it. I am wondering how to handle this with regard to the frequency method: can you set the DNA concentration of samples below the detection limit to LOD/2, or would you recommend removing these samples?

benjjneb commented 11 months ago

  1. In our previous work we found using both together (method="combined") to be the most effective, if both types of auxiliary data are available.
  2. It is reasonable to identify contaminants in different sample types independently. But if future analyses will combine the sample types, then usually contaminants identified in any sample type should be excluded from all of them.
  3. I don't have a data-driven answer. I think I would try dropping the below-LOD samples first.

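
As a rough illustration of points 2 and 3 above, the bookkeeping can be sketched in a few lines of plain Python. This is only a sketch under stated assumptions: the helper names (`drop_below_lod`, `union_of_contaminants`, `remove_asvs`), the detection limit, and all data are hypothetical and are not part of decontam. The per-sample-type contaminant calls are assumed to come from running contaminant identification separately on each sample type; here they are simply typed in by hand.

```python
# Illustrative sketch only: all names and data here are hypothetical,
# and none of these helpers are part of the decontam package.

def drop_below_lod(conc, lod):
    """Keep samples whose DNA concentration is at or above the detection limit.

    Samples recorded as None (nothing detected) are dropped as well.
    """
    return {s: c for s, c in conc.items() if c is not None and c >= lod}


def union_of_contaminants(calls_by_type):
    """Collect every ASV flagged as a contaminant in at least one sample type."""
    flagged = set()
    for calls in calls_by_type.values():
        flagged.update(asv for asv, is_contaminant in calls.items() if is_contaminant)
    return flagged


def remove_asvs(counts, asvs_to_remove):
    """Return a copy of the per-sample count table without the given ASVs."""
    return {
        sample: {asv: n for asv, n in row.items() if asv not in asvs_to_remove}
        for sample, row in counts.items()
    }


# Made-up DNA concentrations (arbitrary units); None means "not detected".
LOD = 0.5
conc = {"cath_1": 0.2, "cath_2": 0.8, "cath_3": None, "msu_1": 3.1}
kept = drop_below_lod(conc, LOD)

# Made-up contaminant calls, one dict per sample type (assumed to be the
# result of identifying contaminants independently within each type).
calls_by_type = {
    "catheter": {"ASV1": True, "ASV2": False, "ASV3": False},
    "msu": {"ASV1": False, "ASV2": False, "ASV3": True},
    "vaginal": {"ASV1": False, "ASV2": False, "ASV3": False},
}
contaminants = union_of_contaminants(calls_by_type)

# Made-up count table for the samples that survived the LOD filter.
counts = {
    "cath_2": {"ASV1": 40, "ASV2": 10, "ASV3": 0},
    "msu_1": {"ASV1": 2, "ASV2": 500, "ASV3": 7},
}
cleaned = remove_asvs(counts, contaminants)

print(sorted(kept))
print(sorted(contaminants))
print(cleaned)
```

Taking the union means an ASV flagged only in catheter samples is also removed from MSU and vaginal samples, which keeps the three sample types comparable when they are analysed together. It also removes more ASVs than any single per-type run would.
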
cBundg commented 11 months ago

Thank you for the valuable suggestions. I'll be honest, I had not considered the combined method. I will try out the suggestions and see how they influence the results.

Best, Caspar