caravagnalab / rcongas

rcongas
GNU General Public License v3.0
7 stars 1 forks source link

Using Rcongas+ only in the scATAC modality #35

Open ccruizm opened 7 months ago

ccruizm commented 7 months ago

Good day!

I have read the documentation in https://caravagnalab.github.io/rcongas/index.html, and it provides valuable information on how to run Rcongas on multimodal data, but it is not clear to me whether it can work only using the scATAC modality. If so, what adjustments would I need to make to run it correctly?

Also, after running the pipeline, do the results also include the classification of a cell to be either diploid or aneuploid?

Thanks in advance!

caravagn commented 7 months ago

Hi, I think it should work in that case.

@lucreziaPatruno can you provide some basic steps?

lucreziaPatruno commented 6 months ago

Hi @ccruizm, thanks for your interest in CONGAS+. In order to use only the ATAC modality you can follow the steps presented in the tutorial for building the ATAC count tibble, and then in order to create the CONGAS+ object you can use this code:

x = init(
  rna = NULL,
  atac = atac,
  segmentation = segments, 
  rna_normalisation_factors = NULL,
  atac_normalisation_factors = norm_atac,
  atac_likelihood = 'NB',
  description = '')

Next, you can follow the tutorial for fitting a model, setting the lambda parameter of fit_congas to 0:

fit_filt <- Rcongas:::(filt,
                                 K = k, 
                                 lambdas = 0, 
                                 learning_rate = lr, 
                                 steps = steps,
                                 model_parameters = hyperparams_filt, 
                                 model_selection = model,
                                 latent_variables = "G",
                                 CUDA = FALSE,
                                 temperature = temperature, 
                                 same_mixing = TRUE, 
                                 threshold = 0.001)

Hope this helps, please let me know if you have further questions.

BW Lucrezia

ccruizm commented 6 months ago

Thanks @lucreziaPatruno for the code!

I am running into some issues:

  1. When running segments_selector_congas I get the following error:
    
    ▣ / [~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~] 100% [ETA  0s] ▶ 00:00:19

── (R)CONGAS+ hyperparameters auto-config ──────────────────────────────────────────────────────────────────────────────────────────────────────────────

── ATAC modality ──

→ Negative Binomial likelihood, estimating Gamma shape and rate

── Estimating segment factors

→ 1: chrY:10400000:57227415 theta_shape = 2.11866186334306, theta_rate = 0.0837168665702484

── (R)CONGAS+ Variational Inference ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Fit with k = 1 and lambda = 0.5.

── Fit with k = 2 and lambda = 0.5.

── Fit with k = 3 and lambda = 0.5.

[easypar] 3/3 computations returned errors and will be removed.

── (R)CONGAS+ fits completed in 39ms. ──

Error in f(): ! Argument 1 must be a data frame or a named atomic vector. Traceback:

  1. segments_selector_congas(filt)
  2. Reduce(bind_rows, report)
  3. f(init, x[[i]])
  4. abort(glue("Argument {i} must be a data frame or a named atomic vector."))
  5. signal_abort(cnd, .file)
    Then I skipped this one and ran `fit_congas` using the code you provided (`auto_config_run` did not give any error), but got an error:

    ✖ Warning ATAC 0-counts cells. 5795 cells have no data in any of 45 segments, top 5 with missing data are:

✖ Cell atac_T19-91014_CTCACCACATGGTAAA-1 with 5 0-segments (11%)

✖ Cell atac_T19-91014_GAACCGCCAATGACTC-1 with 5 0-segments (11%)

✖ Cell atac_T19-91014_GACCTTCCATACAACC-1 with 5 0-segments (11%)

✖ Cell atac_T19-91014_ACCATCCAGTACCCAT-1 with 4 0-segments (9%)

✖ Cell atac_T19-91014_ACCGGGTTCATCGCTC-1 with 4 0-segments (9%)

── (R)CONGAS+ Variational Inference ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Fit with k = 1 and lambda = 0.

── Fit with k = 2 and lambda = 0.

── Fit with k = 3 and lambda = 0.

── Fit with k = 4 and lambda = 0.

[easypar] 4/4 computations returned errors and will be removed.

── (R)CONGAS+ fits completed in 48ms. ──

Error in pull(): ! Can't extract columns that don't exist. ✖ Column BIC doesn't exist. Traceback:

  1. Rcongas:::fit_congas(filt, K = k, lambdas = 0, learning_rate = lr, . steps = steps, model_parameters = hyperparams_filt, model_selection = model, . latent_variables = "G", CUDA = TRUE, temperature = temperature, . same_mixing = TRUE, threshold = 0.001)
  2. order(model_selection_df %>% pull(!!model_selection))
  3. model_selection_df %>% pull(!!model_selection)
  4. pull(., !!model_selection)
  5. pull.data.frame(., !!model_selection)
  6. tidyselect::vars_pull(names(.data), !!enquo(var))
  7. pull_as_location2(loc, n, vars, error_arg = error_arg, error_call = error_call)
  8. with_subscript_errors(type = "pull", { . i <- vctrs::vec_as_subscript2(i, logical = "error", arg = error_arg, . call = error_call) . if (length(i) != 1) { . cli::cli_abort("{.arg {error_arg}} must select exactly one column.", . call = error_call) . } . if (is.numeric(i)) { . vctrs::num_as_location2(i, n = n, negative = "ignore", . arg = error_arg, call = error_call) . } . else { . vctrs::vec_as_location2(i, n = n, names = names, arg = error_arg, . call = error_call, ) . } . })
  9. try_fetch(expr, vctrs_error_subscript = function(cnd) { . cnd$subscript_action <- subscript_action(type) . cnd$subscript_elt <- "column" . cnd_signal(cnd) . })
  10. withCallingHandlers(expr, condition = function(cnd) { . { . .handler_frame. <- TRUE . .setup_frame. <- frame . if (inherits(cnd, "message")) { . except <- c("warning", "error") . } . else if (inherits(cnd, "warning")) { . except <- "error" . } . else { . except <- "" . } . } . while (!is_null(cnd)) { . if (inherits(cnd, "vctrs_error_subscript")) { . out <- handlers[1L] . if (!inherits(out, "rlang_zap")) . throw(out) . } . inherit <- .subset2(.subset2(cnd, "rlang"), "inherit") . if (is_false(inherit)) { . return() . } . cnd <- .subset2(cnd, "parent") . } . })
  11. vctrs::vec_as_location2(i, n = n, names = names, arg = error_arg, . call = error_call, )
  12. result_get(vec_as_location2_result(i, n = n, names = names, negative = "error", . missing = missing, arg = arg, call = call))
  13. cnd_signal(x$err)
  14. signal_abort(cnd)
  15. signalCondition(cnd)
  16. (function (cnd) . { . { . .handler_frame. <- TRUE . .setup_frame. <- frame . if (inherits(cnd, "message")) { . except <- c("warning", "error") . } . else if (inherits(cnd, "warning")) { . except <- "error" . } . else { . except <- "" . } . } . while (!is_null(cnd)) { . if (inherits(cnd, "vctrs_error_subscript")) { . out <- handlers[1L] . if (!inherits(out, "rlang_zap")) . throw(out) . } . inherit <- .subset2(.subset2(cnd, "rlang"), "inherit") . if (is_false(inherit)) { . return() . } . cnd <- .subset2(cnd, "parent") . } . })(structure(list(message = "", trace = structure(list(call = list( . IRkernel::main(), kernel$run(), handle_shell(), executor$execute(msg), . tryCatch(evaluate(request$content$code, envir = .GlobalEnv, . output_handler = oh, stop_on_error = 1L), interrupt = function(cond) { . log_debug("Interrupt during execution") . interrupted <<- TRUE . }, error = .self$handle_error), tryCatchList(expr, classes, . parentenv, handlers), tryCatchOne(tryCatchList(expr, . names[-nh], parentenv, handlers[-nh]), names[nh], parentenv, . handlers[[nh]]), doTryCatch(return(expr), name, parentenv, . handler), tryCatchList(expr, names[-nh], parentenv, handlers[-nh]), . tryCatchOne(expr, names, parentenv, handlers[[1L]]), doTryCatch(return(expr), . name, parentenv, handler), evaluate(request$content$code, . envir = .GlobalEnv, output_handler = oh, stop_on_error = 1L), . evaluate_call(expr, parsed$src[[i]], envir = envir, enclos = enclos, . debug = debug, last = i == length(out), use_try = stop_on_error != . 2L, keep_warning = keep_warning, keep_message = keep_message, . log_echo = log_echo, log_warning = log_warning, output_handler = output_handler, . include_timing = include_timing), timing_fn(handle(ev <- withCallingHandlers(withVisible(eval_with_user_handlers(expr, . envir, enclos, user_handlers)), warning = wHandler, error = eHandler, . message = mHandler))), handle(ev <- withCallingHandlers(withVisible(eval_with_user_handlers(expr, . envir, enclos, user_handlers)), warning = wHandler, error = eHandler, . message = mHandler)), try(f, silent = TRUE), tryCatch(expr, . error = function(e) { . call <- conditionCall(e) . if (!is.null(call)) { . if (identical(call[[1L]], quote(doTryCatch))) . call <- sys.call(-4L) . dcall <- deparse(call, nlines = 1L) . prefix <- paste("Error in", dcall, ": ") . LONG <- 75L . sm <- strsplit(conditionMessage(e), "\n")[[1L]] . w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], . type = "w") . if (is.na(w)) . w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], . type = "b") . if (w > LONG) . prefix <- paste0(prefix, "\n ") . } . else prefix <- "Error : " . msg <- paste0(prefix, conditionMessage(e), "\n") . .Internal(seterrmessage(msg[1L])) . if (!silent && isTRUE(getOption("show.error.messages"))) { . cat(msg, file = outFile) . .Internal(printDeferredWarnings()) . } . invisible(structure(msg, class = "try-error", condition = e)) . }), tryCatchList(expr, classes, parentenv, handlers), . tryCatchOne(expr, names, parentenv, handlers[[1L]]), doTryCatch(return(expr), . name, parentenv, handler), withCallingHandlers(withVisible(eval_with_user_handlers(expr, . envir, enclos, user_handlers)), warning = wHandler, error = eHandler, . message = mHandler), withVisible(eval_with_user_handlers(expr, . envir, enclos, user_handlers)), eval_with_user_handlers(expr, . envir, enclos, user_handlers), eval(expr, envir, enclos), . eval(expr, envir, enclos), Rcongas:::fit_congas(filt, K = k, . lambdas = 0, learning_rate = lr, steps = steps, model_parameters = hyperparams_filt, . model_selection = model, latent_variables = "G", CUDA = TRUE, . temperature = temperature, same_mixing = TRUE, threshold = 0.001), . order(model_selection_df %>% pull(!!model_selection)), model_selection_df %>% . pull(!!model_selection), pull(., !!model_selection), . pull.data.frame(., !!model_selection), tidyselect::vars_pull(names(.data), . !!enquo(var)), pull_as_location2(loc, n, vars, error_arg = error_arg, . error_call = error_call), with_subscript_errors(type = "pull", . { . i <- vctrs::vec_as_subscript2(i, logical = "error", . arg = error_arg, call = error_call) . if (length(i) != 1) { . cli::cli_abort("{.arg {error_arg}} must select exactly one column.", . call = error_call) . } . if (is.numeric(i)) { . vctrs::num_as_location2(i, n = n, negative = "ignore", . arg = error_arg, call = error_call) . } . else { . vctrs::vec_as_location2(i, n = n, names = names, . arg = error_arg, call = error_call, ) . } . }), try_fetch(expr, vctrs_error_subscript = function(cnd) { . cnd$subscript_action <- subscript_action(type) . cnd$subscript_elt <- "column" . cnd_signal(cnd) . }), withCallingHandlers(expr, condition = function(cnd) { . { . .handler_frame. <- TRUE . .setup_frame. <- frame . if (inherits(cnd, "message")) { . except <- c("warning", "error") . } . else if (inherits(cnd, "warning")) { . except <- "error" . } . else { . except <- "" . } . } . while (!is_null(cnd)) { . if (inherits(cnd, "vctrs_error_subscript")) { . out <- handlers[1L] . if (!inherits(out, "rlang_zap")) . throw(out) . } . inherit <- .subset2(.subset2(cnd, "rlang"), "inherit") . if (is_false(inherit)) { . return() . } . cnd <- .subset2(cnd, "parent") . } . }), vctrs::vec_as_location2(i, n = n, names = names, arg = error_arg, . call = error_call, ), result_get(vec_as_location2_result(i, . n = n, names = names, negative = "error", missing = missing, . arg = arg, call = call)), vec_as_location2_result(i, . n = n, names = names, negative = "error", missing = missing, . arg = arg, call = call), tryCatch(vec_as_location(i, . n, names = names, arg = arg, call = call), vctrs_error_subscript = function(err) { . err[["subscript_scalar"]] <- TRUE . err <<- err . i . }), tryCatchList(expr, classes, parentenv, handlers), tryCatchOne(expr, . names, parentenv, handlers[[1L]]), doTryCatch(return(expr), . name, parentenv, handler), vec_as_location(i, n, names = names, . arg = arg, call = call), <fn>(), stop_subscript_oob(i = i, . subscript_type = subscript_type, names = names, subscript_action = subscript_action, . subscript_arg = subscript_arg, call = call), stop_subscript(class = "vctrs_error_subscript_oob", . i = i, subscript_type = subscript_type, ..., call = call), . abort(class = c(class, "vctrs_error_subscript"), i = i, ..., . call = call)), parent = c(0L, 1L, 2L, 3L, 4L, 5L, 6L, . 7L, 6L, 9L, 10L, 4L, 12L, 13L, 13L, 15L, 16L, 17L, 18L, 19L, . 13L, 13L, 13L, 23L, 24L, 0L, 26L, 26L, 0L, 0L, 30L, 31L, 32L, . 33L, 34L, 32L, 36L, 36L, 38L, 39L, 40L, 41L, 38L, 0L, 44L, 45L, . 46L), visible = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, . TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, . TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, . FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, . FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), namespace = c("IRkernel", . NA, "IRkernel", NA, "base", "base", "base", "base", "base", "base", . "base", "evaluate", "evaluate", "evaluate", "evaluate", "base", . "base", "base", "base", "base", "base", "base", "evaluate", "base", . "base", "Rcongas", "base", NA, "dplyr", "dplyr", "tidyselect", . "tidyselect", "tidyselect", "rlang", "base", "vctrs", "vctrs", . "vctrs", "base", "base", "base", "base", "vctrs", "vctrs", "vctrs", . "vctrs", "rlang"), scope = c("::", NA, "local", NA, "::", "local", . "local", "local", "local", "local", "local", "::", ":::", "local", . "local", "::", "::", "local", "local", "local", "::", "::", ":::", . "::", "::", ":::", "::", NA, "::", ":::", "::", ":::", ":::", . "::", "::", "::", ":::", ":::", "::", "local", "local", "local", . "::", "local", ":::", ":::", "::"), error_frame = c(FALSE, FALSE, . FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, . FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, . FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, . TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, . FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE . )), row.names = c(NA, -47L), version = 2L, class = c("rlang_trace", . "rlib_trace", "tbl", "data.frame")), parent = NULL, i = "BIC", . subscript_type = "character", names = c("hyperparameter_K", . "K", "K_rna", "K_atac", "lambda"), subscript_action = NULL, . subscript_arg = "!!enquo(var)", rlang = list(inherit = TRUE), . call = pull(., !!model_selection), subscript_scalar = TRUE), class = c("vctrs_error_subscript_oob", . "vctrs_error_subscript", "rlang_error", "error", "condition")))
  17. handlers[1L]
  18. cnd_signal(cnd)
  19. signal_abort(cnd)
    
    Where should I add the missing column?

Thanks for the help!

Militeee commented 6 months ago

Hi @ccruizm,

Unfortunately, it is very hard for us to debug internal errors with the reticulate interface we have right now. Can you by any chance provide a minimal reproducing example for the error? I'll try to work on it ASAP.

Regarding your question: no we don't automatically classify aneuploid vs diploid, but generally unless it is very small the diploid cluster is quite obvious.

Cheers, S

lucreziaPatruno commented 6 months ago

Hi @ccruizm, It's also possible that something went wrong in the reticulate setup and CONGAS+ inference doesn't start, hence why you get that error. Can you please try to load congas through reticulate like this:

library(reticulate)
cg = reticulate::import("congas")

and please let me know whether you have any issue or if you are able to load it without any error.

BW Lucrezia