Shenhav-and-Korem-labs / SCRuB

What contaminants? #2

Closed LMalard closed 1 year ago

LMalard commented 1 year ago

Thanks for the tool. I managed to run it, but I am wondering: is it possible to know the number of contaminants identified by SCRuB? Can we get the list of contaminants, so that we can then look up the associated taxonomy?

Thanks

gaustin15 commented 1 year ago

Hi @LMalard, thanks for the post!

To see the contaminants identified by SCRuB, you can look at the fitted parameters in the object output by SCRuB. One example is described in section 3.4 of this page: https://shenhav-and-korem-labs.github.io/SCRuB/R-tutorial.html, although in your case the control blank DNA extraction name would be replaced by the identifier from your metadata.

The gamma component is the relative-abundance profile of the contamination source that SCRuB identified, so any taxa with a large value in this vector are (at least in part) contaminants. To obtain a list of the taxa that make up the contamination source, you can run something like the following (where scr_out is the output of the SCRuB function):

threshold <- 0.01
# gamma: relative abundance of each taxon in the contamination source
gamma <- scr_out$inner_iterations$`control blank DNA extraction`$gamma
colnames(scr_out$decontaminated_samples)[which(gamma > threshold)]

This treats any taxon that makes up more than 1% of the estimated contamination source as present in it. Hope this helps!
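To also answer the "how many" part of the question, the same idea can be extended into a small table of contaminant taxa with their gamma values. The following is only a sketch: the scr_out object here is a mock built by hand so the example runs without SCRuB, the taxon names and gamma values are invented, and it assumes (as the snippet above does) that gamma is ordered like the columns of decontaminated_samples.

```r
# Mock stand-in for the SCRuB output. The field layout mirrors the snippet
# above, but the taxa and values are invented purely for illustration.
taxa <- c("Taxon_A", "Taxon_B", "Taxon_C", "Taxon_D")
scr_out <- list(
  decontaminated_samples = matrix(
    0, nrow = 2, ncol = 4,
    dimnames = list(c("sample_1", "sample_2"), taxa)
  ),
  inner_iterations = list(
    `control blank DNA extraction` = list(gamma = c(0.60, 0.005, 0.30, 0.095))
  )
)

# Replace with the control identifier used in your own metadata.
control_name <- "control blank DNA extraction"
threshold <- 0.01

# Label each gamma entry with its taxon (assumes matching column order).
gamma <- scr_out$inner_iterations[[control_name]]$gamma
names(gamma) <- colnames(scr_out$decontaminated_samples)

# Keep taxa above the threshold, most abundant in the source first.
contaminants <- sort(gamma[gamma > threshold], decreasing = TRUE)
n_contaminants <- length(contaminants)

contaminant_table <- data.frame(
  taxon = names(contaminants),
  gamma = unname(contaminants),
  stringsAsFactors = FALSE
)

print(n_contaminants)
print(contaminant_table)
```

With a real SCRuB result, only the mock scr_out definition would be dropped. The taxon column can then be joined against your taxonomy table, and the count will change with the threshold you choose.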