caravagnalab / rcongas

rcongas
GNU General Public License v3.0
7 stars 1 forks source link

Error in function fit_congas #31

Closed Larrycpan closed 1 year ago

Larrycpan commented 1 year ago

Really appreciate your contributions. I met the error when executing the function Rcongas:::fit_congas as follows:

── Fit with k = 4 and lambda = 0.5. 
── Fit with k = 2 and lambda = 0.5. 
── Fit with k = 1 and lambda = 0.5. 
── Fit with k = 3 and lambda = 0.5. 
No protocol specified
No protocol specified
No protocol specified
No protocol specified
sys:1: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  ../torch/csrc/utils/tensor_numpy.cpp:178.)
sys:1: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  ../torch/csrc/utils/tensor_numpy.cpp:178.)
sys:1: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  ../torch/csrc/utils/tensor_numpy.cpp:178.)
sys:1: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  ../torch/csrc/utils/tensor_numpy.cpp:178.)
Warning messages:
1: replacing previous import ‘Matrix::head’ by ‘utils::head’ when loading ‘Rcongas’ 
2: replacing previous import ‘cli::num_ansi_colors’ by ‘crayon::num_ansi_colors’ when loading ‘easypar’ 
Warning messages:
1: replacing previous import ‘Matrix::head’ by ‘utils::head’ when loading ‘Rcongas’ 
2: replacing previous import ‘cli::num_ansi_colors’ by ‘crayon::num_ansi_colors’ when loading ‘easypar’ 
Warning messages:
1: replacing previous import ‘Matrix::head’ by ‘utils::head’ when loading ‘Rcongas’ 
2: replacing previous import ‘cli::num_ansi_colors’ by ‘crayon::num_ansi_colors’ when loading ‘easypar’ 
Warning messages:
1: replacing previous import ‘Matrix::head’ by ‘utils::head’ when loading ‘Rcongas’ 
2: replacing previous import ‘cli::num_ansi_colors’ by ‘crayon::num_ansi_colors’ when loading ‘easypar’ 
[easypar] 4/4 computations returned errors and will be removed.

── (R)CONGAS+ fits completed in 24.4s. ──

Error in `pull()`:
! Can't extract columns that don't exist.
✖ Column `BIC` doesn't exist.
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/vctrs_error_subscript_oob>
Error in `pull()`:
! Can't extract columns that don't exist.
✖ Column `BIC` doesn't exist.
---
Backtrace:
    ▆
 1. ├─Rcongas:::fit_congas(...)
 2. │ ├─base::order(model_selection_df %>% pull(!!model_selection))
 3. │ └─model_selection_df %>% pull(!!model_selection)
 4. ├─dplyr::pull(., !!model_selection)
 5. └─dplyr:::pull.data.frame(., !!model_selection)
Run rlang::last_trace(drop = FALSE) to see 17 hidden frames.

I don't know how to deal with this. Look forward to your reply.

Militeee commented 1 year ago

Hi @Larrycpan,

Basically the fits are failing for some reason, we first have to understand if it's a data or an environment problem. Can you share a minimum example to reproduce the error?

Cheers, S.

Larrycpan commented 1 year ago

Hi @Larrycpan,

Basically the fits are failing for some reason, we first have to understand if it's a data or an environment problem. Can you share a minimum example to reproduce the error?

Cheers, S.

Thanks for your quick reply. I have uploaded my example data and the analysis code for your check and help.

x1 <- readRDS("x1.rds")
atac_features <- readRDS("atac_features.rds")
names(atac_features) <- c("chr", "from", "to")
atac <- Rcongas::create_congas_tibble(counts = x1, 
                                     modality = 'ATAC', 
                                     save_dir = NULL, 
                                     features = atac_features)
atac <- atac %>% mutate(cell = paste0(cell, '-ATAC'))
atac$value <- as.integer(atac$value)
# Compute normalization factors
norm_atac <- Rcongas:::auto_normalisation_factor(atac) %>% mutate(modality = 'ATAC')

data(example_segments)
# Remove these chromosomes
segments <- example_segments %>% dplyr::filter(chr != 'chrY')

##
x2 = init(
  rna = NULL,
  atac = atac,
  segmentation = segments,
  atac_normalisation_factors = norm_atac,
  atac_likelihood = "NB",
  description = "atac_model"
  )

###############################################################################
k = c(1:4)

binom_limits = c(40,1000)
model = "BIC"

lr = 0.01
temperature = 20

steps = 5000
lambda = 0.5

# Estimate hyperparameters
hyperparams_filt <- auto_config_run(x2, k, 
                                    prior_cn=c(0.2, 0.6, 0.1, 0.05, 0.05),
                                    init_importance = 0.6, CUDA = FALSE)
# Run
hyperparams_filt$binom_prior_limits = binom_limits

fit_filt <- Rcongas:::fit_congas(x2,
                                 K = k, 
                                 lambdas = lambda, 
                                 learning_rate = lr, 
                                 steps = steps,
                                 model_parameters = hyperparams_filt,
                                 latent_variables = "B",
                                 model_selection = "BIC",
                                 # CUDA = FALSE,
                                 temperature = temperature,
                                 parallel = T,
                                 same_mixing = TRUE, 
                                 threshold = 0.001)

The data is provided in the folder https://github.com/Larrycpan/test *.rds

Militeee commented 1 year ago

Ok so I think I made it work:

  1. I think your data is simulated or a subset of some bigger dataset (it only has chr1). Anyway, I suggest you to use hg38_arms as segments in case you don't have them a-priori, as example_segments are relative to the example data we have in the package and are not general (for instance they don't have chr1 and thus you end up with an empty object). Also, mind that automatic library size normalization will probably not work if you have just one segment
    data(hg38_arms)
    # Remove these chromosomes
    segments <- hg38_arms %>% dplyr::filter(chr != 'chrY')
  2. Unfortunately some arguments are deprecated and we will remove them before the manuscript is out, for now, I suggest you run with parallel=FALSE and latent_variable="G". Don't worry it will run in parallel anyway, just it will directly auto-adjust using pytorch.
    fit_filt <- Rcongas:::fit_congas(x2,
                                 K = k, 
                                 lambdas = lambda, 
                                 learning_rate = lr, 
                                 steps = steps,
                                 model_parameters = hyperparams_filt,
                                 latent_variables = "G",
                                 model_selection = "BIC",
                                 # CUDA = FALSE,
                                 temperature = temperature,
                                 parallel = F,
                                 same_mixing = TRUE, 
                                 threshold = 0.001)
  3. Last point is a thank you as we spotted a super nasty bug with scikit-learn thanks to this issue, which was the main reason the fit procedure failed. Just re-installing the package should fix it. If you have other problems, feel free to write here.

S.

Larrycpan commented 1 year ago

Ok so I think I made it work:

1. I think your data is simulated or a subset of some bigger dataset (it only has chr1). Anyway, I suggest you to use `hg38_arms` as segments in case you don't have them a-priori, as `example_segments` are relative to the example data we have in the package and are not general (for instance they don't have chr1 and thus you end up with an empty object). Also, mind that automatic library size normalization will probably not work if you have just one segment
data(hg38_arms)
# Remove these chromosomes
segments <- hg38_arms %>% dplyr::filter(chr != 'chrY')
2. Unfortunately some arguments are deprecated and we will remove them before the manuscript is out, for now, I suggest you run with `parallel=FALSE` and `latent_variable="G"`. Don't worry it will run in parallel anyway, just it will directly auto-adjust using pytorch.
fit_filt <- Rcongas:::fit_congas(x2,
                                K = k, 
                                lambdas = lambda, 
                                learning_rate = lr, 
                                steps = steps,
                                model_parameters = hyperparams_filt,
                                latent_variables = "G",
                                model_selection = "BIC",
                                # CUDA = FALSE,
                                temperature = temperature,
                                parallel = F,
                                same_mixing = TRUE, 
                                threshold = 0.001)
3. Last point is a thank you as we spotted a super nasty bug with scikit-learn thanks to this issue, which was the main reason the fit procedure failed. Just re-installing the package should fix it. If you have other problems, feel free to write here.

S.

We have solved the problem and really thank for your help!