Closed HectorRDB closed 3 years ago
Hey Hector!
What do you expect to happen? That you get identical cell profiles as output? Or that you get the same number of cells from both WT and KO?
I assume you set the num_simulations
to 2 for the purpose of the reproducible example. If not, you should probably increase the number of simulations.
Do the sections below answer your question?
Robrecht
If you're expecting identical cell profiles: The simulations are still random. For instance, if I run the code above and visualise the model with plot_gold_mappings(model_comb)
, I get the following plot.
The first frame is the gold standard for both WT and KO, so there's no difference between them. The second and third frames are the two WT simulations, whereas the fourth and fifth are the KO simulations. With this specific seed, one WT simulation went to the end state on the left and one went to the right, whereas both KO simulations went to the right. If you increase the number of simulations, on average you will find the same kinds of cell profiles, but they will still be different, of course.
If you're simply expecting the same number of cells from KO as from WT, I've just pushed a commit to dyngen@devel that allows you to run the cell sampling experiment before combining the models. However, this will yield twice the number of cells specified by the num_cells
parameter.
Hey Robrecht, Thanks for the quick answer. Sorry, I should have been clearer on my first issue.
The KO experiment always has more cells. Indeed, here, each category of model_comb$simulations$meta$simulation_i
will have exactly 92
cells (if its' in WT) and 93
(if it's in KO). Moreover, this number seem to be a function of the census_interval
interval argument, not num_cells
.
This "extra" cell in KO always has a sim_time
variable of 0, i.e. it's not at all random: the second plot I showed is identical if I set n_simulations
to 100 or even 500. This means that the two datasets are not fully comparable.
I've done checks comparing the two distributions of sim_time (and time since I am unsure which of those two variables should reflect the truth) when multiplier is 1 and the distribution of p-values is very much skewed toward zero.
I might be misunderstanding the various outputs from the simulation but this all seems weird to me. Thanks for the help
Ok, I think I found the issue and it was on my end... I set the seed before all simulations, which explain why the time distribution was unmoving. Just changing the seed fixes the issue. This solves the imbalance issue. I still have the question of the number of cells. It seems to be a function of the census_interval interval argument, not num_cells. Thanks
Good to hear you solved the issue about the time distribution.
With regard to the number of cells: the number of samples that are being generated throughout the simulation is equal to:
total_simtime <- simtime_from_backbone(model$backbone)
sample_interval <- model$simulation_params$census_interval
num_simulations <- nrow(model$simulation_params$experiment_params)
num_samples <- floor(total_simtime / sample_interval) * num_simulations
If num_samples < model$numbers$num_cells
, then there will be only num_samples
cells, otherwise the full amount will be sampled. Maybe I should let dyngen show a warning when this occurs. If it does, simply decrease the census interval a bit.
Can you confirm that this was indeed the case?
Robrecht
Thanks!! So with the parameters from above, I get num_samples
is 138, which is higher than model$numbers$num_cells
(100).
I then generate the wild type (model_wt <- model_common %>% generate_cells()
) but this object has 178 cells, not 100 or even 138
nrow(model_wt$simulations$dimred)
[1] 178
Am I missing something?
Hey @HectorRDB,
Sorry for the delay, I finally got around to taking a look at this problem.
I think dyngen was actually generating fewer cells than expected (in the first code example, we expect model$simulations$meta
to have 200 rows since generate_experiment()
is being called twice).
I solved a bug fix and pushed the fix to devel. I'll publish the fix on CRAN asap.
Does this solve your problem?
Robrecht
Also, nrow(model_wt$simulations$dimred)
should be much greater than num_cells
(ideally, >×10). If it isn't, then the census_interval
needs to be decreased or the number of simulations should be increased (See ?simulation_default
).
Hi @rcannood I won't have time in the next few days to look at it but I'll keep you updated when I get around to it. Thanks for checking it out though. best
Hey @HectorRDB,
Do you still want to get back to this issue or shall I just close it?
Kind regards, Robrecht
You can close it, I took a quick look and it indeed seem to work. Thanks a lot for the help
:+1:
Hi, I have been following the knockout vignette from the devel branch and I noticed a weird behavior If I run, with no effect (multiplier is 1)
I don't get identical datasets:
I am missing something? Best