Katsevich-Lab / sceptre

An R package for single-cell CRISPR screen data analysis emphasizing statistical rigor, massive scalability, and ease of use.
https://katsevich-lab.github.io/sceptre/
GNU General Public License v3.0

Question about negative/positive controls #37

Closed nchernia closed 1 year ago

nchernia commented 1 year ago

Thanks again for the tool! We're using it in possibly a slightly different way than originally intended; our experiment probes many targets over a small region and is only interested in 2 or 3 genes. We thus have ~60 negative control pairs, 5 positive control pairs, and ~500 candidates.

Should we augment the negative control pairs with genes that are far out of region that we're not actually testing as candidates? I saw you recommend at least as many negative control pairs as candidates.

Do you have any other recommendations or pitfalls given this experimental design? So far we've spent a lot of time testing confounders and we're now pretty happy with our QQ plots but want to be sure things are properly calibrated; do you have any advice on any other QCs?

ekatsevi commented 1 year ago

> Thanks again for the tool! We're using it in possibly a slightly different way than originally intended; our experiment probes many targets over a small region and is only interested in 2 or 3 genes. We thus have ~60 negative control pairs, 5 positive control pairs, and ~500 candidates.
>
> Should we augment the negative control pairs with genes that are far out of region that we're not actually testing as candidates? I saw you recommend at least as many negative control pairs as candidates.

We think your negative control pairs should be as similar to your candidate pairs as possible. For example, if you are grouping your candidate gRNAs prior to testing, then you should group your NT gRNAs prior to calibration checking. Also, as you mention, it's good practice to have at least as many negative control pairs as candidate pairs. In your case, it sounds like using the 2-3 genes of interest does not yield enough negative control pairs. If you do have expression data for genes beyond the 2-3 genes of interest, I would recommend finding genes that have expression levels similar to those of your genes of interest and pairing them with each negative control gRNA (or each group of negative control gRNAs, if you are grouping gRNAs). Take enough of these extra genes to get up to roughly as many negative control pairs as you have candidate pairs.
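The matching step described above could be sketched roughly as follows. This is a minimal Python illustration, not part of sceptre: the function names, the gene and gRNA labels, and the use of mean expression as the matching criterion are all assumptions made for the example.

```python
def match_control_genes(mean_expr, genes_of_interest, n_per_gene):
    """Pick, for each gene of interest, the n_per_gene unused genes
    whose mean expression is closest to that gene's mean expression."""
    # Candidate pool: every gene with expression data except the genes of interest.
    pool = {g: x for g, x in mean_expr.items() if g not in genes_of_interest}
    matched = []
    for target in genes_of_interest:
        ranked = sorted(pool, key=lambda g: abs(pool[g] - mean_expr[target]))
        for gene in ranked[:n_per_gene]:
            matched.append(gene)
            del pool[gene]  # do not reuse a gene for another target
    return matched


def build_negative_control_pairs(nt_groups, genes):
    """Pair every negative control gRNA (or gRNA group) with every gene."""
    return [(grna, gene) for grna in nt_groups for gene in genes]


# Hypothetical mean expression values (arbitrary units).
mean_expr = {
    "GENE_A": 5.0, "GENE_B": 0.5,  # genes of interest
    "G1": 4.8, "G2": 5.5, "G3": 0.4, "G4": 0.7, "G5": 20.0,
}
genes_of_interest = ["GENE_A", "GENE_B"]
nt_groups = ["NT_1", "NT_2"]  # hypothetical NT gRNA groups

matched = match_control_genes(mean_expr, genes_of_interest, n_per_gene=2)
pairs = build_negative_control_pairs(nt_groups, genes_of_interest + matched)
print(matched)
print(len(pairs))
```

In practice one would raise `n_per_gene` until the number of negative control pairs is roughly equal to the number of candidate pairs.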

FYI: In the next software release, we will be adding automatic calibration checking functionality for high-MOI (as we currently have for low-MOI) so you won't need to worry about constructing negative control pairs yourself.

> Do you have any other recommendations or pitfalls given this experimental design? So far we've spent a lot of time testing confounders and we're now pretty happy with our QQ plots but want to be sure things are properly calibrated; do you have any advice on any other QCs?

I do see one subtle issue, which may or may not be a concern. If I recall correctly that you're in high-MOI, then your analysis will be using the "complement set" as the control group: you'll be comparing the expression of a gene in cells with a given gRNA to that of all other cells. In other words, you're effectively testing whether a given gRNA has an effect on a gene that is different from the average effect of all the other gRNAs in the screen. Usually, when you have gRNAs targeting all over the genome (e.g. as in Gasperini et al.), this "average effect of all the other gRNAs" on a given gene is nearly zero, which makes the comparison meaningful. In your case, if all the other gRNAs (aside from NT gRNAs) are also targeting enhancers near your 2-3 genes of interest, the average effect of those gRNAs on your gene may be nontrivial, complicating the interpretation of your results. The good news is that if this problem exists, it should manifest itself as inflation or deflation in your calibration check QQ plot. The bad news is that the strategy I proposed above for increasing the number of negative control pairs will artificially ameliorate (and so may mask) this issue in the calibration check, because it introduces additional genes that are not used in your analysis.

I realize this was somewhat of a complicated explanation, and I'm not sure I communicated it clearly. Perhaps you shouldn't worry about it unless you get weird results from your association analysis.
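A toy calculation may make the complement-set point more concrete. The Python sketch below is only an illustration under strong simplifying assumptions that are not from the thread: each cell carries exactly one gRNA (unlike a real high-MOI screen), each gRNA scales the gene's expression by a fixed factor, and all names and numbers are hypothetical. It is not the SCEPTRE test statistic.

```python
def complement_ratio(effects, n_cells, baseline, grna):
    """Expected expression in cells with `grna`, divided by the expected
    expression in all other cells (the complement set)."""
    treated = baseline * effects[grna]
    others = [g for g in effects if g != grna]
    total_cells = sum(n_cells[g] for g in others)
    complement = sum(baseline * effects[g] * n_cells[g] for g in others) / total_cells
    return treated / complement


baseline = 10.0  # hypothetical unperturbed expression of the gene of interest

# Genome-wide screen: the other gRNAs do not affect this gene (factor 1.0).
genome_wide = {"NT": 1.0, "far_1": 1.0, "far_2": 1.0, "far_3": 1.0}
# Focused screen: every other gRNA reduces this gene's expression by 20%.
focused = {"NT": 1.0, "enh_1": 0.8, "enh_2": 0.8, "enh_3": 0.8}

cells = lambda effects: {g: 100 for g in effects}  # 100 cells per gRNA

ratio_genome_wide = complement_ratio(genome_wide, cells(genome_wide), baseline, "NT")
ratio_focused = complement_ratio(focused, cells(focused), baseline, "NT")
print(ratio_genome_wide, ratio_focused)
```

In this toy model the non-targeting gRNA has no effect in either screen, yet in the focused screen it appears to raise expression relative to the complement set (ratio above 1). A signal of this kind on negative control pairs is what would be expected to show up as miscalibration in the QQ plot.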

redbybeing commented 1 year ago

I just happened to read this thread, and it made me think about my negative and positive control pairs. I use the high-MOI paradigm. I have 80 gRNAs for the 40 target genes (2 gRNAs per gene). But I will combine the 2 gRNAs per gene, so there will be 40 gRNA 'groups' for the 40 target genes. And I only have ONE negative control gRNA ("GFP gRNA").

Currently, I have defined my negative control pairs as the GFP gRNA paired with each of the 40 target genes, so I have 40 negative control pairs. My positive control pairs are gRNA1-gene1, gRNA2-gene2, ..., so I also have 40 positive control pairs. For my candidate pairs, I was thinking of pairing the 40 gRNA groups with all 30,000 genes in the reference transcriptome = 1.2 million pairs.

And now I read from this thread that: negative control pairs should be as similar to your candidate pairs as possible.

Then, should I pair my one and only GFP guide to all 30,000 genes in the reference transcriptome? Even then I only have 30,000 pairs and it's way less than 1.2 million candidate pairs.

My ultimate goal is to see how knocking down a gene affects expression of other genes in the transcriptome. I do not have prior knowledge or any expectation about which genes will be affected by perturbing a CRISPR target gene.

Could you give me some advice on my case? Thank you so much always for your prompt response and help!!

ekatsevi commented 1 year ago

> Then, should I pair my one and only GFP guide to all 30,000 genes in the reference transcriptome? Even then I only have 30,000 pairs and it's way less than 1.2 million candidate pairs.

Yes, pairing your negative control guide to all 30,000 genes is the best you can do in your case. It won't be the ideal calibration check but it's still worth carrying out.

In general, having several negative control gRNAs is better; I would recommend this for your future experiments.
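For concreteness, the pair counts in this design can be sketched in a few lines of Python. The gRNA and gene labels below are placeholders invented for the example; only the counts (1 negative control gRNA, 40 gRNA groups, 30,000 genes) come from the question.

```python
n_genes = 30_000
genes = [f"gene_{i}" for i in range(1, n_genes + 1)]          # placeholder gene IDs
target_groups = [f"target_{i}" for i in range(1, 41)]         # 40 gRNA groups

# The single negative control gRNA paired with every gene.
negative_control_pairs = [("GFP_gRNA", gene) for gene in genes]

# Every gRNA group paired with every gene (counted, not materialized).
n_candidate_pairs = len(target_groups) * len(genes)

print(len(negative_control_pairs), n_candidate_pairs)  # 30000 1200000
```

With a single negative control gRNA, the calibration check covers 40 times fewer pairs than the discovery analysis, which is why several negative control gRNAs are preferable.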

redbybeing commented 1 year ago

Thanks Eugene!

nchernia commented 1 year ago

Thanks for your response! It's very helpful. I tried running with additional negative controls (adding in 25 housekeeping genes vs the negative control targets). Other than a much slower run and more points in the negative control QQ plot, there wasn't any difference in the p-values of the candidate pairs. I'm not sure whether that's expected; I think maybe it is, since you don't require positive or negative controls other than for QC?


ekatsevi commented 1 year ago

I think there is a misunderstanding here. The purpose of running SCEPTRE on negative control pairs is to ascertain that it is well-calibrated; see the section "Negative control pairs" of the high-MOI tutorial (https://katsevich-lab.github.io/sceptre/articles/highmoi_tutorial.html#negative-control-pairs) and the section titled "SCEPTRE demonstrates good calibration and sensitivity on real and simulated data" of Barry et al. (Genome Biology 2021). As our low-MOI tutorial (https://katsevich-lab.github.io/sceptre/articles/lowmoi_tutorial.html) does a better job explaining, you should view the workflow as follows:

  1. Check calibration by applying SCEPTRE to negative control pairs.
  2. If step 1 proceeds successfully, proceed to analyzing your candidate pairs.

So adding more negative control pairs will have no impact on the p-values of the candidate pairs. Our suggestion to have your negative control pairs be as similar as possible to your candidate pairs is to make your calibration check in step 1 faithful to the way you plan to analyze your candidate pairs in step 2.
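As a rough illustration of what the calibration check in step 1 looks at, the Python sketch below computes QQ-plot coordinates for a set of negative control p-values and counts how many would be rejected after a Bonferroni correction. It is a generic sketch, not sceptre's implementation: the function names, the `alpha=0.1` level, the choice of Bonferroni, and the example p-values are all assumptions.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) pairs on the -log10 scale for a QQ plot
    against the uniform distribution. Assumes all p-values are > 0."""
    n = len(p_values)
    observed = sorted(p_values)
    expected = [(i + 1) / (n + 1) for i in range(n)]
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]


def n_bonferroni_rejections(p_values, alpha=0.1):
    """Count p-values below alpha / n. On negative control pairs every
    rejection is a false discovery, so this should be at or near zero."""
    n = len(p_values)
    return sum(p < alpha / n for p in p_values)


# Hypothetical negative control p-values for 60 pairs.
calibrated = [(i + 0.5) / 60 for i in range(60)]  # evenly spread over (0, 1)
inflated = [p ** 3 for p in calibrated]           # skewed toward small values

print(n_bonferroni_rejections(calibrated))
print(n_bonferroni_rejections(inflated))
```

Points from `qq_points` that lie close to the diagonal indicate good calibration; points well above it at the small-p-value end indicate inflation. The candidate pairs never enter this computation, which is consistent with their p-values being unaffected by how many negative control pairs are used.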

We will be adding functionality soon for this process to be smoother and clearer for high-MOI SCEPTRE.

nchernia commented 1 year ago

Great, thanks. We did come to understand that but I wanted to be sure.


-- Neva Cherniavsky Durand, Ph.D. | she, her, hers Senior Scientist | Gene Regulation Observatory Broad Institute of MIT and Harvard

timothy-barry commented 1 year ago

Good discussion, closing.