Inconsistent results between seeds #135

Open michaelplynch opened 1 month ago

michaelplynch commented 1 month ago

Hi Stefano, Flagging some interesting behaviour I've come across with the package.

Problem: The 1D credible intervals plot shows different results in terms of rank, credible interval and FDR significance when run with different seeds. I see the "EXPERIMENTAL ALGORITHM" message; is this a byproduct? I'm working off sccomp v1.8 from the most recent Bioconductor release. I also notice on the Bioconductor vignette that for the boxplot all cell types are coloured as significant for c_typecancer, which looks inconsistent with the credible interval plot. Is this also a bug, or have I misinterpreted the plot?

Dataset: Working with a SingleCellExperiment object, happy to share privately if you want to replicate.

Code:

```r
set.seed(3)
sccomp_result =
  sce_pm |>
  sccomp_estimate(
    formula_composition = ~ type,
    .sample = sample_id,
    .cell_group = celltype_num,
    bimodal_mean_variability_association = TRUE,
    cores = 4,
    max_sampling_iterations = 200000
  ) |>
  sccomp_remove_outliers(cores = 1) |> # Optional
  sccomp_test()

plots = sccomp_result |> plot()
plots$boxplot
plots$credible_intervals_1D

set.seed(1)
sccomp_result =
  sce_pm |>
  sccomp_estimate(
    formula_composition = ~ type,
    .sample = sample_id,
    .cell_group = celltype_num,
    bimodal_mean_variability_association = TRUE,
    cores = 4,
    max_sampling_iterations = 200000
  ) |>
  sccomp_remove_outliers(cores = 1) |> # Optional
  sccomp_test()

plots = sccomp_result |> plot()
plots$boxplot
plots$credible_intervals_1D
```

Output: [image] [image]

stemangiola commented 1 month ago

Yes please share sce_pm at


stemangiola commented 1 month ago

> I also notice on the Bioconductor vignette that for the boxplot all cell types are coloured as significant for c_typecancer, which looks inconsistent with the credible interval plot

I'll have a look tomorrow morning.

stemangiola commented 4 weeks ago

Hello, please try the branch

I have allowed sampling iterations > 1000 for vb

I have also implemented the more modern backend, with the fast and very reliable pathfinder. Please try it out as well.

stemangiola commented 4 weeks ago

VB with custom number of draws


full-bayes HMC

```r
# First fit
sccomp_result =
  sccomp_test_sce |>
  sccomp_estimate(
    formula_composition = ~ type,
    .sample = sample_id,
    .cell_group = celltype_num,
    bimodal_mean_variability_association = TRUE,
    variational_inference = FALSE
  ) |>
  sccomp_remove_outliers(variational_inference = FALSE) |> # Optional
  sccomp_test()

plots = sccomp_result |> plot()

# Second fit of the same model, for comparison
sccomp_result =
  sccomp_test_sce |>
  sccomp_estimate(
    formula_composition = ~ type,
    .sample = sample_id,
    .cell_group = celltype_num,
    bimodal_mean_variability_association = TRUE,
    variational_inference = FALSE
  ) |>
  sccomp_remove_outliers(variational_inference = FALSE) |> # Optional
  sccomp_test()

plots2 = sccomp_result |> plot()

# Compare the two runs (combining ggplots with `+` requires patchwork)
plots$credible_intervals_1D + plots2$credible_intervals_1D
```

Now the new cmdstanr backend (updated now, please reinstall), using the fast pathfinder in this case (HMC is always available)

```r
# First fit
sccomp_result =
    sccomp_test_sce |>
    sccomp_estimate(
        formula_composition = ~ type,
        .sample = sample_id,
        .cell_group = celltype_num,
        bimodal_mean_variability_association = TRUE
    ) |>
    sccomp_remove_outliers() |> # Optional
    sccomp_test()

plots = sccomp_result |> plot()

# Second fit of the same model, for comparison
sccomp_result =
    sccomp_test_sce |>
    sccomp_estimate(
        formula_composition = ~ type,
        .sample = sample_id,
        .cell_group = celltype_num,
        bimodal_mean_variability_association = TRUE
    ) |>
    sccomp_remove_outliers() |> # Optional
    sccomp_test()

plots2 = sccomp_result |> plot()

# Compare the two runs (combining ggplots with `+` requires patchwork)
plots$credible_intervals_1D + plots2$credible_intervals_1D
```

stemangiola commented 4 weeks ago

Having tested this a little, I would say, use

variational_bayes = FALSE,

it is the gold standard and gives the most consistent results.

I will investigate further the variational strategy.

michaelplynch commented 4 weeks ago

Hi Stefano, Can you confirm that it's the cmdstanr branch I should be reinstalling on? On Windows I get an install error similar to your GHA. On Linux oddly enough it will install but errors with "Further attempt with Variational Bayes: Error: Model not compiled. Try running the compile() method first."

stemangiola commented 4 weeks ago

You can

michaelplynch commented 3 weeks ago

Thanks Stefano.

I assume you mean variational_inference=F (rather than variational_bayes=F), since variational_bayes=F returns an unused argument error.

On the master branch, variational_inference=F looks more consistent. On the cmdstanr branch, inference_method="pathfinder" also looks more consistent.

Other feedback: I'm not sure the pathfinder implementation was much faster than HMC on the master branch in this case, but I haven't formally tested this.

On the cmdstanr branch with HMC, e.g.:

```r
sccomp_result =
  sccomp_test_sce |>
  sccomp_estimate(
    formula_composition = ~ type,
    .sample = sample_id,
    .cell_group = celltype_num,
    bimodal_mean_variability_association = TRUE,
    inference_method = "HMC"
  ) |>
  sccomp_remove_outliers() |> # Optional
  sccomp_test()
```

and

```r
sccomp_result =
  sccomp_test_sce |>
  sccomp_estimate(
    formula_composition = ~ type,
    .sample = sample_id,
    .cell_group = celltype_num,
    bimodal_mean_variability_association = TRUE,
    inference_method = "HMC",
    variational_inference = FALSE
  ) |>
  sccomp_remove_outliers() |> # Optional
  sccomp_test()
```

return errors

```
Error in vb_iterative(mod, output_samples = output_samples, iter = 10000, : sccomp says: variational Bayes did not converge after 5 attempts. Please use variational_inference = FALSE for a HMC fitting.
```

and

```
Error in fit2$num_chains() : attempt to apply non-function
```

at least with this dataset.