caravagnalab / rcongas

rcongas
GNU General Public License v3.0
7 stars 1 forks source link

Error running fit_congas on multiome data #37

Closed mmyers1 closed 5 months ago

mmyers1 commented 6 months ago

I was able to run segments_selector_congas() but fit_congas() failed with the following errors: [easypar] 4/4 computations returned errors and will be removed.

Error in `pull()`:
! Can't extract columns that don't exist.
✖ Column `BIC` doesn't exist.

I get a similar error if I try to rerun with a different model selection criterion.

I was able to run successfully on the demo data using the same code and environment. I am using the most recent GitHub version of RCONGAS.

here's the full trace:

<error/vctrs_error_subscript_oob>
Error in `pull()`:
! Can't extract columns that don't exist.
✖ Column `BIC` doesn't exist.
---
Backtrace:
     ▆
  1. ├─Rcongas:::fit_congas(...)
  2. │ ├─base::order(model_selection_df %>% pull(!!model_selection))
  3. │ └─model_selection_df %>% pull(!!model_selection)
  4. ├─dplyr::pull(., !!model_selection)
  5. ├─dplyr:::pull.data.frame(., !!model_selection)
  6. │ └─tidyselect::vars_pull(names(.data), !!enquo(var))
  7. │   └─tidyselect:::pull_as_location2(...)
  8. │     ├─tidyselect:::with_subscript_errors(...)
  9. │     │ └─rlang::try_fetch(...)
 10. │     │   └─base::withCallingHandlers(...)
 11. │     └─vctrs::vec_as_location2(...)
 12. │       ├─vctrs:::result_get(...)
 13. │       └─vctrs:::vec_as_location2_result(...)
 14. │         ├─base::tryCatch(...)
 15. │         │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
 16. │         │   └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
 17. │         │     └─base (local) doTryCatch(return(expr), name, parentenv, handler)
 18. │         └─vctrs::vec_as_location(i, n, names = names, arg = arg, call = call)
 19. └─vctrs (local) `<fn>`()
 20.   └─vctrs:::stop_subscript_oob(...)
 21.     └─vctrs:::stop_subscript(...)
 22.       └─rlang::abort(...)
Militeee commented 6 months ago

Hi @mmyers1,

This kind of error is generally due to a failure in the Python part, unfortunately, reticulate errors are not super informative, could you send us a reproducible example (of course after anonymizing it)? I assume your Python installation works fine as you can reproduce the example. Sorry for the inconvenience.

Cheers, S.

mmyers1 commented 6 months ago

I was able to reproduce the error with the first 500 cells from the public 10x multiome dataset "M_Jejunum_Chromium_Nuc_Isolation_vs_SaltyEZ_vs_ComplexTissueDP" -- I've attached my multiome_congas_object. congas_obj_demo.tar.gz

Let me know if you'd prefer other objects or raw counts files.

This is the code I ran to produce an error using this object:


filt = segments_selector_congas(multiome_congas_object)

# Set values for the model hyperparameters
k = c(1:4)
binom_limits = c(40,1000)
model = "BIC"
lr = 0.01
temperature = 20#10
steps = 5000
lambda = 0.5

# Estimate hyperparameters
hyperparams_filt <- auto_config_run(filt, k, 
                                    prior_cn=c(0.2, 0.6, 0.1, 0.05, 0.05),
                                    init_importance = 0.6, CUDA = FALSE,
                                    multiome=TRUE)
hyperparams_filt$binom_prior_limits = binom_limits
fit_filt <- Rcongas:::fit_congas(filt,
                                 K = k, 
                                 lambdas = lambda, 
                                 learning_rate = lr, 
                                 steps = steps,
                                 model_parameters = hyperparams_filt, 
                                 model_selection = model,
                                 latent_variables = "G",
                                 CUDA = FALSE,
                                 temperature = temperature, 
                                 same_mixing = TRUE, 
                                 threshold = 0.001)
mmyers1 commented 6 months ago

It could be a problem caused by running Rcongas::filter_missing_data() on the object in conjunction with the multiome model? I think this is the only step I've added beyond the demo code. Maybe something to do with removing a different number of cells in the different modalities?

caravagn commented 5 months ago

Cool @Militeee or @lucreziaPatruno can u try to run it yourself as well?

Militeee commented 5 months ago

Hi @mmyers1,

Sorry for the late reply. So I had a look at it and the actual problem is that filter_missing_data does not have a multiome mode, so it filters unequally the 2 modalities leaving the object with a different number of cells for ATAC and RNA. As a quick dirty fix while we solve the bug you can take the subset of cells you want and just run (assuming x is the congas object and cells_to_keep the cells you want ):

# Assuming you want only the intersection
cells_count <- x$input$dataset$cell %>% unique() %>% gsub("(-RNA)|(-ATAC)", "", .) %>% table
cells_root <- names(cells_count[cells_count>1])
cells_to_keep <- c(cells_root %>% paste0(., "-RNA"), cells_root %>% paste0(., "-ATAC"))

# Filter manually the dataset
x$input$dataset = x$input$dataset %>% filter(cell %in% cells_to_keep)
x$input$normalisation = x$input$normalisation %>% filter(cell %in% cells_to_keep)

@lucreziaPatruno I can push a simple fix but that would mean having a filter_missing_data with a specific argument, I think we are really making it hard to run the tool on multiome, maybe we should have just one argument during object construction and enforce the user to give the same names in both matrices. Then we can automatically sort it in all the internal functions. What do you think?

mmyers1 commented 5 months ago

Thanks for the tip, I'll try that workaround.

Not sure if you're asking me, but personally I would prefer it if init() had an option to indicate that the dataset is multiome and then this was handled behind-the-scenes :)

lucreziaPatruno commented 5 months ago

Thank you @mmyers1 for your feedback, and apologies for this issue. I agree @Militeee we need to implement an option in init to handle the multiome datasets, I will work on this fix.

lucreziaPatruno commented 5 months ago

Hi @mmyers1, I pushed the fix. We added the parameter multiome in the init function, and then everything else is handled behind the scenes.

BW Lucrezia