using dynverse/dyngen to simulate the synthetic datasets

rcannood commented 5 years ago

Hey all!

I read your bioRxiv preprint "Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data" and was quite intrigued. I've been toying around with various forms of network inference (NI) myself, and I think a benchmark for single-cell NI (scNI) methods is highly warranted.

I read that the synthetic networks are generated by taking the module networks from dynverse/dyngen, converting them to ODEs using BoolODE, and then running GeneNetWeaver (GNW). I was wondering why GNW is used at all, since dyngen can also perform all of these steps. One of the benefits of dyngen is that it uses Gillespie's stochastic simulation algorithm (SSA) instead of ODEs. SSA simulations keep track of the number of molecules (mRNAs, proteins) in each cell and, at each step, simulate which reaction takes place (e.g. transcription of a new mRNA). This way, no random noise needs to be generated at each time step to simulate stochasticity. Instead, the stochasticity comes from which reactions are triggered at each step.
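To make the SSA idea concrete, here is a minimal sketch of Gillespie's direct method for a single hypothetical mRNA species that is transcribed at a constant rate and degraded at a rate proportional to its copy number. The function name and rate values are illustrative assumptions, not the API of dyngen or GillespieSSA2 (both are R packages; Python is used here only for illustration). Note that no noise term appears anywhere: the randomness comes entirely from the exponential waiting times and from which reaction is drawn to fire.

```python
import random

def gillespie_birth_death(k_tx, k_deg, x0, t_max, seed=None):
    """Direct-method Gillespie SSA for one mRNA species (hypothetical toy model).

    Reactions:
      transcription: 0 -> mRNA, propensity k_tx
      degradation:   mRNA -> 0, propensity k_deg * n
    Returns the event times and the molecule count after each event.
    """
    rng = random.Random(seed)
    t, n = 0.0, x0
    times, counts = [t], [n]
    while True:
        a_tx = k_tx
        a_deg = k_deg * n
        a_total = a_tx + a_deg
        if a_total == 0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential in the total propensity.
        t += rng.expovariate(a_total)
        if t > t_max:
            break
        # Pick which reaction fires, with probability proportional to its propensity.
        if rng.random() * a_total < a_tx:
            n += 1  # one new mRNA transcribed
        else:
            n -= 1  # one mRNA degraded
        times.append(t)
        counts.append(n)
    return times, counts

times, counts = gillespie_birth_death(k_tx=10.0, k_deg=1.0, x0=0, t_max=50.0, seed=1)
```

For this toy model the copy number fluctuates around k_tx / k_deg, i.e. about 10 molecules with the rates above, and every event changes the count by exactly one molecule.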

Have you tried dyngen instead of GNW in order to perform the simulations?

What are your thoughts on this?

Robrecht

tmmurali commented 5 years ago

We did not consider dyngen until now. We will try it out and let you know what we find. SSA is a good alternative to stochastic DEs. Thanks for the suggestion!

adyprat commented 5 years ago

Hi Robrecht,

Thank you for your interest in our manuscript. There seems to be some confusion about running GNW: BoolODE does not run GNW internally. The ODEs generated by BoolODE are converted to SDEs, which are then numerically integrated with the Euler-Maruyama scheme, as described in our paper (page 23). This is similar to the formulation described for dynverse/dyngen (Supplementary File 1, page 62, under "Simulation of gene regulatory systems using thermodynamic models"). Is this what you were referring to when you mentioned GNW?

I'm also a little confused about where Gillespie's Stochastic Simulation Algorithm (SSA) is described for dyngen. I see that the News.md page prominently features it, but could you point me to where the SSA formulation is described in more detail so I can look into it further? In the meantime, I will check whether the performance of the GRN inference algorithms differs on data simulated using dyngen vs. BoolODE for a given example network. I'll keep you posted.

Thanks,
Aditya
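For comparison, the Euler-Maruyama scheme mentioned above advances an SDE dX = f(X) dt + g(X) dW by adding a Gaussian increment with variance dt at every step. The sketch below assumes a hypothetical one-gene model with constant production, linear degradation, and additive noise; BoolODE's actual drift and noise terms are the ones given in the paper and are not reproduced here.

```python
import math
import random

def euler_maruyama(drift, diffusion, x0, dt, n_steps, seed=None):
    """Integrate dX = drift(X) dt + diffusion(X) dW with the Euler-Maruyama scheme."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    x = x0
    path = [x]
    for _ in range(n_steps):
        dW = rng.gauss(0.0, 1.0) * sqrt_dt  # Brownian increment ~ N(0, dt)
        x = x + drift(x) * dt + diffusion(x) * dW
        path.append(x)
    return path

# Hypothetical toy model: constant production, linear degradation, additive noise.
k_prod, k_deg, sigma = 10.0, 1.0, 0.5
path = euler_maruyama(
    drift=lambda x: k_prod - k_deg * x,
    diffusion=lambda x: sigma,
    x0=0.0,
    dt=0.01,
    n_steps=5000,
    seed=42,
)
```

Unlike the SSA, the state here is a continuous concentration, and stochasticity is injected explicitly through the dW term at every time step.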

rcannood commented 5 years ago

Hey Aditya,

In the benchmark paper we were indeed still using Euler-Maruyama, because GillespieSSA2 was not optimised enough to run large SSA simulations quickly. The manuscript accompanying GillespieSSA is a good introduction to SSA simulation, as are the many vignettes available for the GillespieSSA2 package (e.g. this one).

Brief explanation of dyngen

dyngen first generates a GRN from a certain backbone (e.g. a bifurcating backbone).

It runs many SSA simulations of the GRN. The SSA keeps track of the exact number of molecules present at a given time point and performs reactions (e.g. RNA transcription) at rates that depend on the current expression levels. A dimensionality reduction (dimred) of the individual simulations at each of the time points looks like this: [figure not preserved]
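The sketch below extends the SSA idea to a hypothetical two-gene cascade in which the protein of a transcription factor A activates transcription of a target B through a Hill function, so the propensity of B's transcription depends on the current number of A protein molecules. All species names, rates, and the Hill form are illustrative assumptions, not dyngen's actual reaction set or kinetics.

```python
import random

# Hypothetical two-gene cascade: TF A activates target B (rates are illustrative).
STOICH = [
    {"mA": +1},  # transcription of A
    {"mA": -1},  # degradation of A mRNA
    {"pA": +1},  # translation of A
    {"pA": -1},  # degradation of A protein
    {"mB": +1},  # transcription of B, activated by A protein
    {"mB": -1},  # degradation of B mRNA
]

def propensities(s, k_tx=5.0, k_tl=2.0, d_m=1.0, d_p=0.5, K=10.0, hill=2):
    """Reaction propensities for the current molecule counts s."""
    act = s["pA"] ** hill / (K ** hill + s["pA"] ** hill)  # Hill activation by A
    return [
        k_tx,
        d_m * s["mA"],
        k_tl * s["mA"],
        d_p * s["pA"],
        k_tx * act,
        d_m * s["mB"],
    ]

def ssa(t_max, seed=None, **rates):
    """Gillespie direct method; returns a list of (time, molecule counts)."""
    rng = random.Random(seed)
    state = {"mA": 0, "pA": 0, "mB": 0}
    t = 0.0
    trajectory = [(t, dict(state))]
    while True:
        a = propensities(state, **rates)
        a_total = sum(a)
        if a_total == 0:
            break
        t += rng.expovariate(a_total)
        if t > t_max:
            break
        # Select reaction j with probability a[j] / a_total.
        j = rng.choices(range(len(a)), weights=a)[0]
        for species, delta in STOICH[j].items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory

traj = ssa(t_max=20.0, seed=3)
```

Because B's transcription propensity is zero whenever no A protein is present, setting the translation rate `k_tl` to zero in this toy model leaves B permanently silent.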

A "gold" simulation using Euler-Maruyama with zero noise is also run to define the expression levels of the main transcription factors during the different "branches" of the trajectory.
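Setting the noise term to zero turns each Euler-Maruyama update into a plain forward-Euler step, so a zero-noise run is deterministic and reproducible. A minimal sketch, assuming a hypothetical production/degradation model (not dyngen's actual gold-standard kinetics), checked against the closed-form solution x(t) = k_prod/k_deg + (x0 - k_prod/k_deg) e^(-k_deg t):

```python
import math

def gold_simulation(k_prod, k_deg, x0, dt, n_steps):
    """Euler-Maruyama with the diffusion term set to zero (i.e. forward Euler)."""
    x = x0
    path = [x]
    for _ in range(n_steps):
        # The sigma * dW term would be added here; with zero noise it vanishes.
        x = x + (k_prod - k_deg * x) * dt
        path.append(x)
    return path

k_prod, k_deg, x0, dt, n_steps = 10.0, 1.0, 0.0, 0.001, 5000
path = gold_simulation(k_prod, k_deg, x0, dt, n_steps)
t_end = dt * n_steps
exact = k_prod / k_deg + (x0 - k_prod / k_deg) * math.exp(-k_deg * t_end)
```

Running it twice with the same parameters gives identical paths, which is what makes a zero-noise run usable as a fixed reference.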

The gold simulations are used to determine the state of each cell: [figure not preserved]
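One simple way to picture this mapping step is nearest-neighbour assignment: each cell gets the state label of the closest time point along the gold simulations. This is an assumed illustration using Euclidean distance in a toy two-gene expression space; dyngen's actual matching procedure may differ, and the function name, coordinates, and labels are hypothetical.

```python
import math

def assign_states(cells, gold_points, gold_labels):
    """Label each cell with the state of its nearest gold-simulation point (Euclidean)."""
    assigned = []
    for cell in cells:
        nearest = min(range(len(gold_points)), key=lambda i: math.dist(cell, gold_points[i]))
        assigned.append(gold_labels[nearest])
    return assigned

# Hypothetical gold-simulation points in a two-gene expression space.
gold_points = [(0.0, 0.0), (5.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
gold_labels = ["start", "branch_point", "branch_A", "branch_B"]
cells = [(0.5, 0.2), (5.2, 4.1), (9.0, 0.5)]

labels = assign_states(cells, gold_points, gold_labels)
```

Here the three cells are assigned to "start", "branch_A", and "branch_B" respectively. `math.dist` requires Python 3.8 or later.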

You end up with the gold standard GRN, the gold standard trajectory, and for each cell you have an expression profile and the position of the cell in the trajectory.

adyprat commented 5 years ago

Hi Robrecht, Thanks for the detailed explanation and pointers for SSA. I'll try this and get back to you.

Murali-group / Beeline

using dynverse/dyngen to simulate the synthetic datasets #29
