Katsevich-Lab / sceptre

An R package for single-cell CRISPR screen data analysis emphasizing statistical rigor, massive scalability, and ease of use.
https://katsevich-lab.github.io/sceptre/
GNU General Public License v3.0
26 stars 8 forks source link

Error when running calibration check on low MOI data #26

Closed ale-gomez closed 1 year ago

ale-gomez commented 1 year ago

Hello! I am running into an error when running a calibration check in the low MOI. It looks like this:

Running setup.  ✓
Constructing negative control pairs. ✓
Generating permutation resamples. ✓
Running differential expression analyses.
Error in data.table::setorderv(ret, cols = c("p_value", "response_id"),  : 
  some columns are not in the data.table: p_value,response_id
In addition: Warning message:
In construct_negative_control_pairs(n_calibration_pairs, calibration_group_size,  :
  Unable to generate 31713 negative control pairs. Consider increasing `calibration_group_size`.

Below is what I'm running:

calibration_result <- run_sceptre_lowmoi(
  response_matrix = response_matrix_lowmoi,
  grna_matrix = grna_matrix_lowmoi,
  covariate_data_frame = covariate_data_frame_lowmoi,
  grna_group_data_frame = grna_group_data_frame_lowmoi,
  formula_object = formula_object,
  response_grna_group_pairs = response_grna_group_pairs,
  calibration_check = TRUE # calibration_check TRUE
) 

One important note is that I only have one guide for my non-targeting group. Would that be the source of the issue? Is it not possible to run the calibration check unless there are multiple non-targeting guides in the non-targeting group?

ekatsevi commented 1 year ago

Indeed, unfortunately the calibration check works only if you have more than one non-targeting guide. We recommend having multiple non-targeting guides in your pooled CRISPR screen experiments. You can still apply sceptre to the targeting gRNAs, albeit without the peace-of-mind that the method is well-calibrated on your particular data.

ale-gomez commented 1 year ago

I see. I suspected that was the case. In that case, I continued to run the discovery analysis code:

discovery_result <- [run_sceptre_lowmoi](https://katsevich-lab.github.io/sceptre/reference/run_sceptre_lowmoi.html)(
  response_matrix = response_matrix_lowmoi,
  grna_matrix = grna_matrix_lowmoi,
  covariate_data_frame = covariate_data_frame_lowmoi,
  grna_group_data_frame = grna_group_data_frame_lowmoi,
  formula_object = formula_object,
  response_grna_group_pairs = response_grna_group_pairs,
  calibration_check = FALSE # calibration check FALSE
)

But I still get another error here at this point in the analysis (it starts, but crashes at 1580/36601):

Analyzing pairs containing response AC008937.3 (1580 of 36601)
Error: Error in function boost::math::skew_normal_distribution<double>::skew_normal_distribution: Shape parameter is nan, but must be finite!

Is this related to the above or something different? Also, what is the reasoning behind having multiple non-targeting guides? Is it a way for the algorithm to have more confidence in the controls?

ekatsevi commented 1 year ago

But I still get another error here at this point in the analysis (it starts, but crashes at 1580/36601)...Is this related to the above or something different?

We will get back to you on this shortly.

Also, what is the reasoning behind having multiple non-targeting guides? Is it a way for the algorithm to have more confidence in the controls?

The calibration check proceeds by applying the test to each NT gRNA and each gene, making sure there the resulting p-values are not too small. The test consists of comparing the expression of the gene in cells containing the tested gRNA and cells containing other NT gRNAs (please read our preprint for more details on the calibration check). If there is only one NT gRNA, then there are no cells left containing other NT gRNAs.

timothy-barry commented 1 year ago

Hello,

I'd like to try to help debug the error that you're encountering. As a starting point, could you call the traceback() function in the console and print the output so that we can get a better sense of which function might be causing issues? At some point it may be necessary for you to share a subset of your data.

Thanks.

ale-gomez commented 1 year ago

Hi Tim, Thank you for your prompt response and apologies for the delay! Here's the output of the traceback() function:

traceback()
14: stop(structure(list(message = "Error in function boost::math::skew_normal_distribution<double>::skew_normal_distribution: Shape parameter is nan, but must be finite!", 
        call = NULL, cppstack = NULL), class = c("boost::wrapexcept<std::domain_error>", 
    "C++Error", "error", "condition")))
13: run_low_level_test_distilled(y = curr_expression_vector, mu = precomp_pieces$mu, 
        a = precomp_pieces$a, b = precomp_pieces$b, n_cntrl = n_cntrl, 
        n_trt = n_trt, synthetic_idxs = synthetic_idxs, B1 = B1, 
        B2 = B2, B3 = B3, fit_skew_normal = fit_skew_normal, return_resampling_dist = return_resampling_dist)
12: backup_distilled(curr_expression_vector, curr_covariate_matrix, 
        response_precomp, n_cntrl, n_trt, synthetic_idxs, B1, B2, 
        B3, fit_skew_normal, return_resampling_dist)
11: value[[3L]](cond)
10: tryCatchOne(expr, names, parentenv, handlers[[1L]])
9: tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
8: doTryCatch(return(expr), name, parentenv, handler)
7: tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]), 
       names[nh], parentenv, handlers[[nh]])
6: tryCatchList(expr, classes, parentenv, handlers)
5: tryCatch({
       precomp_pieces <- compute_precomputation_pieces(expression_vector = curr_expression_vector, 
           covariate_matrix = curr_covariate_matrix, fitted_coefs = response_precomp$fitted_coefs, 
           theta = response_precomp$theta, full_test_stat = TRUE)
       D <- compute_D_matrix(precomp_pieces$Zt_wZ, precomp_pieces$wZ)
       run_low_level_test_full_v2(y = curr_expression_vector, mu = precomp_pieces$mu, 
           a = precomp_pieces$a, w = precomp_pieces$w, D = D, n_cntrl = n_cntrl, 
           n_trt = n_trt, synthetic_idxs = synthetic_idxs, B1 = B1, 
           B2 = B2, B3 = B3, fit_skew_normal = fit_skew_normal, 
           return_resampling_dist = return_resampling_dist)
   }, error = function(e) backup_distilled(curr_expression_vector, 
       curr_covariate_matrix, response_precomp, n_cntrl, n_trt, 
       synthetic_idxs, B1, B2, B3, fit_skew_normal, return_resampling_dist), 
       warning = function(w) backup_distilled(curr_expression_vector, 
           curr_covariate_matrix, response_precomp, n_cntrl, n_trt, 
           synthetic_idxs, B1, B2, B3, fit_skew_normal, return_resampling_dist))
4: lowmoi_exact_stat_discovery(synthetic_idxs = <pointer: 0x55cc72591760>, 
       B1 = 499L, B2 = 4999L, B3 = 24999L, fit_skew_normal = TRUE, 
       return_resampling_dist = FALSE, grna_group_idxs = list(SATB2 = c(1L, 
       3L, 5L, 7L, 12L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 
       29L, 32L, 35L, 39L, 40L, 42L, 43L, 44L, 46L, 52L, 53L, 58L, 
       63L, 65L, 68L, 75L, 83L, 86L, 87L, 88L, 93L, 96L, 99L, 103L, 
       104L, 105L, 106L, 107L, 108L, 114L, 115L, 117L, 120L, 121L, 
       123L, 124L, 125L, 127L, 130L, 131L, 139L, 140L, 141L, 143L, 
       145L, 147L, 149L, 153L, 162L, 165L, 167L, 169L, 171L, 172L, 
       177L, 178L, 181L, 183L, 184L, 186L, 189L, 199L, 202L, 204L, 
       210L, 212L, 214L, 218L, 222L, 226L, 227L, 231L, 234L, 235L, 
       236L, 237L, 238L, 243L, 247L, 249L, 250L, 252L, 254L, 256L, 
       257L, 261L, 262L, 263L, 266L, 268L, 271L, 274L, 276L, 277L, 
       278L, 279L, 283L, 284L, 285L, 287L, 290L, 292L, 293L, 296L, 
       297L, 304L, 307L, 310L, 311L, 315L, 320L, 323L, 324L, 327L, 
       329L, 331L, 334L, 337L, 340L, 341L, 342L, 345L, 352L, 353L, 
       354L, 356L, 360L, 363L, 365L, 366L, 367L, 371L, 373L, 376L, 
       383L, 385L, 388L, 390L, 391L, 395L, 397L, 402L, 404L, 406L, 
       408L, 409L, 410L, 412L, 414L, 416L, 417L, 418L, 420L, 423L, 
       424L, 425L, 427L, 428L, 429L, 430L, 431L, 435L, 436L, 440L, 
    ...
3: do.call(what = low_level_association_funct, args = args_to_pass)
2: run_lowmoi_in_memory(response_matrix, grna_assignments, covariate_matrix, 
       response_grna_group_pairs, synthetic_idxs, return_resampling_dist, 
       fit_skew_normal, B1, B2, B3, calibration_check, discovery_test_stat, 
       n_nonzero_trt_thresh, n_nonzero_cntrl_thresh, return_debugging_metrics, 
       regression_method, print_progress)
1: run_sceptre_lowmoi(response_matrix = response_matrix_lowmoi, 
       grna_matrix = grna_matrix_lowmoi, covariate_data_frame = covariate_data_frame_lowmoi, 
       grna_group_data_frame = grna_group_data_frame_lowmoi, formula_object = formula_object, 
       response_grna_group_pairs = response_grna_group_pairs, calibration_check = FALSE)

Also, I'd be happy to share the data/a subset if needed. It's no problem at all! Just let me know what you would need and I'll provide it. Thank you both for explaining everything and all the help!

timothy-barry commented 1 year ago

Hi Ale,

Thank you. This is a helpful starting point.

Would you be able to send (e.g., via email) the objects that you passed to the run_sceptre_lowmoi function? My email is tbarry2@andrew.cmu.edu.

timothy-barry commented 1 year ago

Hi Ale,

Thanks for the email. Could you please instead send your objects in .rds format? For example, you could execute the following code:

my_objects <- list(response_matrix = response_matrix_lowmoi,
  grna_matrix = grna_matrix_lowmoi,
  covariate_data_frame = covariate_data_frame_lowmoi,
  grna_group_data_frame = grna_group_data_frame_lowmoi,
  formula_object = formula_object,
  response_grna_group_pairs = response_grna_group_pairs)
saveRDS(my_objects, "~/my_objects.rds")

Then you could email my_objects.rds. Thanks!

ale-gomez commented 1 year ago

Hi Tim, Sent! I hope that helps.

timothy-barry commented 1 year ago

Great, thanks. I'll take a look and get back to you shortly.

timothy-barry commented 1 year ago

Hi Ale,

Thanks for sending the data. I believe that I located and fixed the bug. I am now able to run the method on your data without errors. Would you mind giving it a shot?

ale-gomez commented 1 year ago

Hi Tim, It's weird but I'm still getting the same error at the same point. I tried reinstalling the package but it didn't change. Maybe I'm missing something?

UPDATE: I tried reinstalling once more and it worked! Thank you so much for your help on this!