benjjneb / decontam

Simple statistical identification and removal of contaminants in marker-gene and metagenomics sequencing data
https://benjjneb.github.io/decontam/

Missing data and different measures of concentration #12

Closed stangedal closed 6 years ago

stangedal commented 6 years ago

The decontam package looks great, but I am not sure I can use it due to some issues with my data. Both Qubit and PicoGreen were used to measure DNA concentrations in my samples. Would it be possible to get around this by dividing the data in batches with regards to this, and then combine the results when removing probable contaminants? The other problem is missing measurements. Is it possible to use decontam to analyse most, but not all samples in a dataset, and still get meaningful results?

Best wishes, Solveig Tangedal

benjjneb commented 6 years ago

> Would it be possible to get around this by dividing the data in batches with regards to this, and then combine the results when removing probable contaminants?

Yes, I think that should work out fine actually. I would recommend using the `batch` argument of the `isContaminant` function to specify which DNA quantitation method was used for each sample. The method will then identify contaminants in each batch separately and combine the results at the end (by default, it takes the minimum P-value across batches).
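For later readers, here is a minimal sketch of what that call might look like. The count matrix, concentrations, and the `quant_method` variable are simulated, hypothetical stand-ins for real data; only `isContaminant` and its `conc`, `method`, `batch`, and `batch.combine` arguments come from decontam itself.

```r
library(decontam)

set.seed(1)
n <- 20

# Hypothetical toy data: 20 samples (rows) x 5 taxa (columns) of read counts.
seqtab <- matrix(rpois(n * 5, lambda = 50), nrow = n,
                 dimnames = list(paste0("S", 1:n), paste0("ASV", 1:5)))

# Simulated DNA concentrations (must be positive) and the quantitation
# method used for each sample; both are assumptions for illustration.
conc <- runif(n, min = 1, max = 50)
quant_method <- factor(rep(c("qubit", "picogreen"), each = n / 2))

# Contaminants are scored within each batch, then the per-batch
# P-values are combined ("minimum" is the default).
res <- isContaminant(seqtab, conc = conc, method = "frequency",
                     batch = quant_method, batch.combine = "minimum")
head(res)
```

The result has one row per taxon, with a logical `contaminant` column based on the combined score. With real data the toy matrix would be replaced by your own feature table (or a phyloseq object).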

> Is it possible to use decontam to analyse most, but not all samples in a dataset, and still get meaningful results?

Yes, that's fine actually. decontam identifies contaminant taxa (or OTUs/ASVs/etc.), not contaminated samples. Once identified, you can remove those taxa from other samples as well, such as samples for which the DNA quantitation data was missing and which therefore weren't included in the data used to identify contaminants.
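A rough sketch of that workflow, again on simulated data: the matrix, the concentrations, and the choice of which samples lack measurements are all hypothetical. Contaminants are identified using only the samples that have a concentration value, and the flagged taxa are then dropped from every sample.

```r
library(decontam)

set.seed(1)
n <- 20

# Hypothetical toy data: 20 samples (rows) x 5 taxa (columns) of read counts.
seqtab <- matrix(rpois(n * 5, lambda = 50), nrow = n,
                 dimnames = list(paste0("S", 1:n), paste0("ASV", 1:5)))
conc <- runif(n, min = 1, max = 50)
conc[c(3, 12)] <- NA  # pretend two samples have no DNA measurement

# Identify contaminants using only the samples with a measurement.
has_conc <- !is.na(conc)
res <- isContaminant(seqtab[has_conc, ], conc = conc[has_conc],
                     method = "frequency")

# Treat anything not explicitly TRUE as "not a contaminant".
is_contam <- res$contaminant %in% TRUE

# Remove the flagged taxa from ALL samples, including the two that
# were left out of the identification step.
seqtab_clean <- seqtab[, !is_contam, drop = FALSE]
dim(seqtab_clean)
```

All 20 samples are retained in `seqtab_clean`; only taxa (columns) are removed.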

stangedal commented 6 years ago

This made my day! Thank you so much. Running decontam is now the first thing on my to-do list!

Have a great day!