dynverse / dyngen

Simulating single-cell data using gene regulatory networks 📠
https://dyngen.dynverse.org
Other
73 stars 6 forks source link

`dynwrap` should be an `Imports` dependency, not `Suggests` #28

Closed scottgigante closed 3 years ago

scottgigante commented 3 years ago

Example code:

library(dyngen)
init <- initialise_model(
  backbone=backbone_bifurcating(),
  num_cells=50,
  num_tfs=50,
  num_targets=20,
  num_hks=10,
  simulation_params=simulation_default(
    census_interval=as.double(10),
    kinetics_noise_function = kinetics_noise_simple(mean=1, sd=0.005),
    ssa_algorithm = ssa_etl(tau=300/3600),
    compute_cellwise_grn=FALSE,
    compute_rna_velocity=FALSE),
  num_cores = 4,
  download_cache_dir=NULL,
  verbose=TRUE
)
out <- generate_dataset(init)

Result:

Generating TF network
Sampling feature network from real network
trying URL 'https://github.com/dynverse/dyngen/raw/data_files/regulatorycircuits_01_neurons_fetal_brain.rds'
Content type 'application/octet-stream' length 1340872 bytes (1.3 MB)
downloaded 1.3 MB

Generating kinetics for 80 features
Generating formulae
Generating gold standard mod changes
Precompiling reactions for gold standard
Running gold simulations
  |==================================================| 100% elapsed=02s, remaining~00s
Precompiling reactions for simulations
Running 32 simulations
Mapping simulations to gold standard
Performing dimred
Simulating experiment
trying URL 'https://github.com/dynverse/dyngen/raw/data_files/zenodo_1443566_real_gold_germline-human-both_guo.rds'
Content type 'application/octet-stream' length 2039740 bytes (1.9 MB)
downloaded 1.9 MB

Wrapping dataset
Loading required namespace: dynwrap
Failed with error:  ‘there is no package called ‘dynwrap’’
In addition: Warning messages:
1: UNRELIABLE VALUE: Future (‘<none>’) unexpectedly generated random numbers without specifying argument 'seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'seed=NULL', or set option 'future.rng.onMisuse' to "ignore". 
2: In .generate_cells_predict_state(model) :
  Simulation does not contain all gold standard edges. This simulation likely suffers from bad kinetics; choose a different seed and rerun.
Error in loadNamespace(name) : there is no package called ‘dynwrap’
rcannood commented 3 years ago

Hey Scott!

Thanks for your input! As is, dynwrap should indeed be an import instead of a suggest.

The reason that I'm not immediately agreeing is because I'm considering making dynwrap less mandatory to use dyngen in order to reduce the dependency stack. With the upcoming release, you can convert the output of dyngen to a dyno, anndata, Seurat or SCE object.

However, I didn't think about the fact that wrap_dataset() always create one of these objects. How about I create a parameter output_format = "dyno" (for compatibility purposes) which can be set to "sce", "seurat", "anndata" or "none".

This wouldn't solve your problem as you would still need to install dynwrap, SingleCellExperiment, Seurat or anndata in order to get the data you want, but maybe you already have one of those packages installed?

Robrecht

rcannood commented 3 years ago

↑ see commit above :)

scottgigante commented 3 years ago

Amazing! This is a much better solution, thanks! To be clear, none of these (except "none") would work without installing an additional dependency?

rcannood commented 3 years ago

Exactly. I just pushed a solution that should work for you. Either you install dynwrap/anndata/SingleCellExperiment and get an object that works nicely with other packages by using wrap_dataset(model, format = "dyno") or as_dyno(model) or any of the other format functions. Alternatively, you can use wrap_dataset(model, format = "list") or as_list(model) and get a simple list which works exactly the same as a dyno object, but without having to install dynwrap.

The only object missing from the output created by as_list(model) as opposed to as_dyno(model) is dataset$milestone_network, I'll add that in later. (dataset$progressions is available, though.)

I suppose this solves your problem? :) Once I'm happy with the solution, I'll merge it to devel and publish it on CRAN with the next release.

scottgigante commented 3 years ago

This is excellent, thank you! If you're looking for suggestions on API, I would suggest if the user doesn't specify which to use, it chooses based on what is available (to ensure the default behaviour doesn't throw an error regardless of installation preferences.)

rcannood commented 3 years ago

Thanks for the suggestion! I'm afraid I can't change the behaviour of a function based on the host system, as it could lead to some very unexpected behaviour (e.g. user mainly uses SingleCellExperiment but at some point also installs Seurat, which format should be used).

However, I can set the default to 'list'. It is the least useful of all the formats, but the one that will certainly work on whatever system you're running it.

Is this okay for you?

scottgigante commented 3 years ago

You're right that that seems like the lowest risk of unexpected behaviour, but would be a shame to make the default behaviour less useful than the alternatives -- perhaps a default of output_format=NA, and if NA, issue a warning and set output_format <- 'list' -- this way the user knows that they should specify?

rcannood commented 3 years ago

Sorry Scott, but I'm keeping it as is. If you want an SCE, just use as_sce().

Good news through, thanks to CRAN I can still include these changes in release 1.0.0! :)