Shenhav-and-Korem-labs / SCRuB

Questions on counts, contigs, and RPM #13

Closed 1023011930 closed 1 year ago

1023011930 commented 1 year ago

Dear Sir or Madam, I appreciate your significant contribution to decontaminating sequencing data. I haven't used SCRuB yet, but I believe most software of this kind, such as "decontam", was developed with ASV/OTU relative abundance tables as input.

But in your example "plasma data.csv", the data do not appear to be a relative abundance table. Question 1: So, for a 16S experiment, should the input to SCRuB be OTU/ASV counts?

As for metagenomic data, as I understand it, decontamination can be done by constructing MAG (metaOTU) abundance tables. But viral data are difficult to bin into MAGs, and viral sequences are often found only as contigs. Question 2: I would like to ask whether SCRuB can decontaminate based on contig abundance (is this scientifically sound?)

Question 3: If contig abundance can be used as input, does this mean that I can use software such as BWA-MEM or bowtie2 to quantify the contigs, then calculate RPM or TPM, and use those values as the abundance input for SCRuB?
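
For reference, the RPM and TPM values mentioned in Question 3 are conventionally derived from per-contig mapped read counts. The sketch below is only an illustration of those two standard formulas; the function names, contig names, and numbers are hypothetical and are not part of SCRuB or of any aligner's output format.

```python
def rpm(counts, total_reads):
    """Reads per million: mapped reads per contig, scaled by the
    sample's total read count (hypothetical helper)."""
    return {contig: n * 1_000_000 / total_reads for contig, n in counts.items()}


def tpm(counts, lengths):
    """Transcripts-per-million style normalisation: divide each count by
    the contig length, then rescale so the values sum to one million."""
    rates = {contig: counts[contig] / lengths[contig] for contig in counts}
    total_rate = sum(rates.values())
    return {contig: r * 1_000_000 / total_rate for contig, r in rates.items()}


# Made-up example values for a single sample.
mapped = {"contig_1": 150, "contig_2": 50}       # reads mapped per contig
contig_len = {"contig_1": 3000, "contig_2": 1000}  # contig lengths in bp

sample_rpm = rpm(mapped, total_reads=2_000_000)
sample_tpm = tpm(mapped, contig_len)
```

Both normalisations discard the absolute read depth of the sample, which is worth keeping in mind when deciding between normalised values and raw counts as input.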

Congratulations on your article, it is an exciting tool for low biomass research and we would appreciate it if you could respond to our questions!

gaustin15 commented 1 year ago

Thanks for the comments! To your questions –

  1. The inputs we tested are OTU/ASV counts, so this is our general recommendation (see the discussion here regarding some alternative inputs). The package will also run if you provide relative abundance data (multiplied by some scaling factor to represent the data in ‘count/integer’ space). This introduces an assumption that we have the same level of confidence in the relative abundances of all samples/controls, which means that we lose the ability to place greater weight on samples/controls with more reads. While we did evaluate scenarios similar to this in our synthetic simulations (with positive results), we generally recommend ASV counts if possible.
  2. While we did not show any results in our paper decontaminating at the contig level, we have internally run some simulations using contig counts to run SCRuB, and found that this is a reasonable application of the package. One note of caution is that we have often found it difficult to obtain sufficient read depth in both the samples and negative controls to precisely estimate the contamination source, as we’re far more susceptible to sampling noise in the contig / low depth space. However, if this isn’t a concern in your data, then I would consider a contig count matrix to be a valid input for SCRuB.
  3. This seems like a reasonable pipeline to me; as I mentioned in point (2), the biggest concern would be if you can obtain sufficient read depth.
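
The scaling step described in point (1) could look roughly like the following. This is a minimal sketch, not SCRuB code: the function name `to_pseudo_counts` and the scaling factor of 10,000 are assumptions chosen for illustration, and the sample values are made up.

```python
def to_pseudo_counts(rel_abundance, scale=10_000):
    """Convert one sample's relative abundances into integer pseudo-counts.

    `scale` is an arbitrary, assumed scaling factor. Using the same factor
    for every sample gives all samples the same apparent depth, so any
    weighting by true read depth is lost.
    """
    total = sum(rel_abundance)
    return [round(x / total * scale) for x in rel_abundance]


# Made-up relative abundance table: one row per sample or control.
rel_table = {
    "sample_1": [0.5, 0.3, 0.2],
    "neg_control_1": [0.9, 0.1, 0.0],
}

pseudo_counts = {name: to_pseudo_counts(row) for name, row in rel_table.items()}

# Every row now has (approximately) the same total, unlike real count data.
pseudo_depths = {name: sum(row) for name, row in pseudo_counts.items()}
```

With real count data the per-row totals would differ between samples and controls, and that difference is what allows deeper samples to carry more weight.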

Please don’t hesitate to reach out if you have any other questions.

talkorem commented 1 year ago

I'm converting this to a discussion.