Simulating a 2000-cell × 150000-peak scATAC dataset

Chen-Li-17 commented 1 year ago

The simulation takes nearly a whole day, which is inconvenient.
I also encountered the following error when trying to simulate a 2000-cell × 150000-peak scATAC dataset. Could you tell me how to fix it?
```
Input Data Construction Start
Warning message in asMethod(object):
“sparse->dense coercion: allocating vector of size 2.4 GiB”
Input Data Construction End

Start Marginal Fitting
Warning message in mclapply(seq_len(n), do_one, mc.preschedule = mc.preschedule, :
“scheduled cores 1, 2 did not deliver results, all values of the jobs will be affected”

Error in names(answer) <- dots[[1L]]: attempt to set an attribute on NULL
Traceback:

scdesign3(sce = sce_seurat, assay_use = "counts", celltype = "cell_type",
    pseudotime = NULL, spatial = NULL, other_covariates = NULL,
    mu_formula = "cell_type", sigma_formula = "1", family_use = "zip",
    n_cores = 2, usebam = FALSE, corr_formula = "cell_type",
    copula = "gaussian", DT = TRUE, pseudo_obs = FALSE, return_model = FALSE,
    nonzerovar = FALSE)
fit_marginal(mu_formula = mu_formula, sigma_formula = sigma_formula,
    n_cores = n_cores, data = input_data, family_use = family_use,
    usebam = usebam, parallelization = parallelization, BPPARAM = BPPARAM)
suppressMessages(paraFunc(fit_model_func, gene = feature_names,
    family_gene = family_use, mc.cores = n_cores, MoreArgs = list(dat_use = dat_cov,
        mgcv_formula = mgcv_formula, mu_formula = mu_formula,
        sigma_formula = sigma_formula, predictor = predictor,
        count_mat = count_mat), SIMPLIFY = FALSE))
withCallingHandlers(expr, message = function(c) if (inherits(c,
    classes)) tryInvokeRestart("muffleMessage"))
paraFunc(fit_model_func, gene = feature_names, family_gene = family_use,
    mc.cores = n_cores, MoreArgs = list(dat_use = dat_cov, mgcv_formula = mgcv_formula,
        mu_formula = mu_formula, sigma_formula = sigma_formula,
        predictor = predictor, count_mat = count_mat), SIMPLIFY = FALSE)
```

SONGDONGYUAN1994 commented 1 year ago

Hi Lee, thanks for your interest in our work! The running time depends on your data dimensions, model settings, and the number of cores. If your data are very high-dimensional, using more cores (e.g., > 10) can reduce the time dramatically.
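
Memory can matter as much as time at this scale. The log in the first post shows the input being coerced from sparse to dense; assuming 8-byte doubles, a dense 2000 × 150000 matrix takes 2000 × 150000 × 8 = 2.4 × 10^9 bytes, in line with the size in that warning. The helpers below (`dense_size_gb`, `worst_case_gb`) are hypothetical, and the "one full copy per worker" bound is a pessimistic assumption, not a measured property of scDesign3. If the operating system kills a forked worker for running out of memory, `mclapply` reports that scheduled cores "did not deliver results", which may be what happened here, although the thread does not confirm it.

```r
# Rough memory estimate for a dense count matrix, assuming each value is an
# 8-byte double (how R stores double-precision numbers).
dense_size_gb <- function(n_cells, n_features, bytes_per_value = 8) {
  n_cells * n_features * bytes_per_value / 1e9  # decimal gigabytes (10^9 bytes)
}

# Pessimistic bound, ASSUMING each forked worker ends up holding its own full
# copy of the dense matrix in addition to the parent process.
worst_case_gb <- function(n_cells, n_features, n_cores) {
  dense_size_gb(n_cells, n_features) * (1 + n_cores)
}

dense_size_gb(2000, 150000)
#> [1] 2.4
worst_case_gb(2000, 150000, n_cores = 10)  # roughly 26 GB under that assumption
```

Under these assumptions, adding cores shortens the per-feature fitting loop but can raise peak memory, so it is worth checking available RAM before increasing `n_cores`.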

As for the error, it appears to come from the marginal regression model fitting. Please check two things:

1. Do you have any features that are all zeros across cells?
2. If you set `family_use = 'poisson'`, does it work? ZIP is usually less stable.
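
A minimal sketch of the first check, assuming the counts are held in a sparse `dgCMatrix` from the Matrix package and are non-negative. `drop_all_zero_features` is a hypothetical helper for illustration, not part of scDesign3.

```r
library(Matrix)

# Keep only features (rows) with at least one non-zero count across cells.
# Assumes non-negative counts, so a row sum of 0 means the row is all zeros.
drop_all_zero_features <- function(counts) {
  keep <- Matrix::rowSums(counts) > 0
  counts[keep, , drop = FALSE]
}

# Toy example: 4 peaks x 3 cells, where peak2 and peak4 are all zero.
toy <- Matrix::sparseMatrix(
  i = c(1, 1, 3), j = c(1, 3, 2), x = c(2, 1, 5),
  dims = c(4, 3),
  dimnames = list(paste0("peak", 1:4), paste0("cell", 1:3))
)
filtered <- drop_all_zero_features(toy)
rownames(filtered)
#> [1] "peak1" "peak3"
```

For a `SingleCellExperiment` such as `sce_seurat`, the same filter could be applied as `sce_seurat[Matrix::rowSums(SingleCellExperiment::counts(sce_seurat)) > 0, ]` before rerunning `scdesign3()` with `family_use = "poisson"`.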

Without a reproducible example, I'm afraid I can't say much about the cause. You can email me at dongyuansong@ucla.edu, and we can set up a virtual meeting if that helps.

Best, Dongyuan

Chen-Li-17 commented 1 year ago

Dear Dongyuan, thank you for the reply. I'll rerun my code following your advice and give you feedback soon.

SONGDONGYUAN1994 / scDesign3, issue #3