Open ghost opened 5 years ago
Could you also show the histogram of the scores assigned by isContaminant? You can do that with something like hist(foo$score, n=100) if foo was what you assigned the output of isContaminant to. That will probably be more informative about the distribution of potential contaminants.
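For readers following along, here is a self-contained sketch of what that histogram can look like, using simulated scores in place of real isContaminant output (the object foo and all values below are made up; in real output the scores are stored in the $p column rather than $score):

```r
# Simulated stand-in for isContaminant() scores (hypothetical values;
# real output keeps the score in foo$p, not foo$score).
set.seed(1)
foo <- data.frame(p = c(rbeta(1800, 5, 5),    # non-contaminants: scores near 0.5
                        rbeta(105, 1, 20)))   # contaminants: scores near 0
summary(foo$p)

# Histogram with ~100 bins; a bimodal shape with a low-score mode is the
# signature of real contaminants being present.
h <- hist(foo$p, n = 100, main = "isContaminant scores", xlab = "score")
```

A clearly bimodal histogram suggests a sensible threshold between the two modes; a unimodal pile near 0.5 suggests little detectable contamination.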
- Is it a problem that I only have four control samples, but 84 real samples?
You are underpowered to detect contaminants with only 4 control samples. That is probably why you aren't detecting any at the default threshold of 0.1.
- Is it inappropriate to use my normalised sample counts? Do I have to use abundance values?
No. By default isContaminant will normalize to proportions anyway.
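That per-sample normalization can be sketched in base R (toy count matrix with samples as rows, the orientation isContaminant expects; all numbers are made up):

```r
# Toy count matrix: 3 samples (rows) x 4 taxa (columns); counts are made up.
counts <- matrix(c(10, 0, 5, 85,
                    2, 8, 0, 90,
                    0, 1, 3, 96),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("sample", 1:3), paste0("taxon", 1:4)))

# Normalize each sample (row) to proportions, the same idea isContaminant
# applies internally before scoring.
props <- sweep(counts, 1, rowSums(counts), "/")
rowSums(props)  # each row now sums to 1
```

Because of this step, feeding in counts already normalized for depth changes little for the prevalence method, which only uses presence/absence.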
- Does this mean that the same species occur in negative and biological samples and appear to be evenly distributed? May this be an issue of cross-contamination? However, just for fun I manually added a quite exotic organism (not found in any of the biological samples) to my negative controls (10000 normalised read counts) and I added the same exotic organism to three patient samples with 100 normalised reads, but the outcome remained unchanged. Why?
Probably because the contaminant threshold of 0.1 is underpowered given the small number of control samples (see above). What is the $score assigned to that exotic organism? It should be below 0.5, but apparently not quite below 0.1.
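The classification step itself is just a cut on that score: a feature is called a contaminant when its score falls below the chosen threshold. A sketch with a hypothetical score illustrates why the spiked-in organism can be missed at the default threshold but caught at a higher one:

```r
# Hypothetical score for the spiked-in exotic organism (made-up value,
# somewhere between the default 0.1 threshold and the 0.5 midpoint).
exotic_score <- 0.18

# decontam flags a feature when score < threshold:
exotic_score < 0.1   # not flagged at the default threshold
exotic_score < 0.5   # flagged with a more aggressive threshold
```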
Thank you so much for the fast reply!
No. By default isContaminant will normalize to proportions anyway.
I accidentally inserted my presence/absence species table instead of the normalised counts. This explains why the effect of the manually added exotic organism was so low. I repeated the procedure, this time using my abundance table. I still get 1905 FALSE and no TRUE contaminants, but now it looks more like this:
Could you also show the histogram of the scores assigned by isContaminant? You can do that with something like hist(foo$score, n=100) if foo was what you assigned the output of isContaminant to. That will probably be more informative about the distribution of potential contaminants.
The output of isContaminant does not contain a column called "score":
But if I plot the frequency:
What is the $score assigned to that exotic organism? It should be below 0.5, but apparently not quite below 0.1.
So you are probably right and four negative controls are just not enough to identify contamination. Do you have any other suggestions on what I could do to account for the negative controls instead?
Thanks for all your efforts!
Best wishes, Marie
That new plot looks very off; you shouldn't be getting prevalences greater than the number of samples you have.
The "score" is in the $p column. What do summary(foo$p) and hist(foo$p, 100) give as results?
Can you also post the exact command you are executing, i.e. foo <- isContaminant(...)? Because your results look kind of strange.
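For reference, a typical prevalence-method call looks something like the following sketch (all data are toy values; the requireNamespace guard keeps the snippet runnable even where decontam is not installed):

```r
# Toy data: 88 samples x 5 taxa of made-up counts; the last 4 rows are blanks.
set.seed(2)
seqtab <- matrix(rpois(88 * 5, lambda = 20), nrow = 88,
                 dimnames = list(c(paste0("patient", 1:84), paste0("blank", 1:4)),
                                 paste0("taxon", 1:5)))
is.neg <- grepl("^blank", rownames(seqtab))  # TRUE for the 4 negative controls

# The prevalence-method call (sketch of the documented interface):
if (requireNamespace("decontam", quietly = TRUE)) {
  foo <- decontam::isContaminant(seqtab, method = "prevalence",
                                 neg = is.neg, threshold = 0.1)
  print(table(foo$contaminant))
}
```

Note that samples must be rows and the neg argument is a logical vector marking the negative controls; a transposed table or a mis-specified neg vector produces exactly the kind of strange output described above.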
I have a similar problem. I have just one control, and each time I use one sample and one control with the prevalence method to identify contamination. I always get a result that no species is a contaminant, but I know that almost all of them are, so the result is wrong. Maybe one control is not allowed in decontam?
Maybe one control is not allowed in decontam?
For the prevalence method, having only one control sample will result in no valid contaminant assignments.
This is as designed, and as appropriate. There is not enough information in a single control sample to confidently describe contaminant taxa.
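One way to see why: the prevalence method reasons about a 2x2 presence/absence table per taxon. A base-R sketch with hypothetical data shows how degenerate that table becomes with a single control (this illustrates the idea, not the package's exact internals):

```r
# Hypothetical presence/absence of one taxon: 1 control, 84 biological samples.
present_in_controls <- c(TRUE)                       # a single observation
present_in_samples  <- rep(c(TRUE, FALSE), c(3, 81)) # present in 3 of 84

# The 2x2 prevalence table the method reasons about (sketch):
tab <- rbind(
  controls = c(present = sum(present_in_controls),
               absent  = sum(!present_in_controls)),
  samples  = c(present = sum(present_in_samples),
               absent  = sum(!present_in_samples))
)
tab
# With one control the top row can only ever be (1, 0) or (0, 1): a single
# observation, with no way to distinguish contamination from chance.
```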
Hello,
this is a very nice package! Thank you.
I sequenced 84 biological samples (cotton swabs) obtained from a low-biomass environment and 4 negative controls (blank swabs, no DNA) using a shotgun metagenomics approach. The blank swabs accompanied the biological samples during sample storage, DNA extraction, library preparation and the sequencing run. Quality filtering was performed, and eukaryotic reads were identified and discarded. Reference-based assembly was chosen to identify bacterial reads.
1905 bacterial hits were observed. The outcome table has reads normalised to genome size and sequencing depth. I wanted to test the decontam package in R to identify contamination based on my blank-swab controls and remove contamination from patient samples with the prevalence method.
However, when I run the package, the outcome of table(contamdf.prev$contaminant) was the following: 1905 "FALSE" and 0 "TRUE" hits, no matter which threshold I used within the function isContaminant(). The graphical outcome looked like this:
My questions are:
- Does this mean that the same species occur in negative and biological samples and appear to be evenly distributed? May this be an issue of cross-contamination? However, just for fun I manually added a quite exotic organism (not found in any of the biological samples) to my negative controls (10000 normalised read counts) and I added the same exotic organism to three patient samples with 100 normalised reads, but the outcome remained unchanged. Why?
- Is it inappropriate to use my normalised sample counts? Do I have to use abundance values?
- Is it a problem that I only have four control samples, but 84 real samples?
Thank you very much in advance!
Marie