dv516 / RNA-transcription-modelling-and-DS

Model Implementation and Design Space Construction to support 'Quality by Design modelling for rapid RNA vaccine production against emerging infectious diseases'


This README explains the function of each item in this repository and how the items relate to one another.

Dependencies

Documentation

Function scripts

Main scripts

It is recommended to run the scripts in the following order to replicate the authors' workflow, although each script can also be run independently. Scripts marked as optional do not produce figures that directly support the manuscript.

  1. curve_fitting_exploration_prediction.py: Uses the scipy.optimize.curve_fit() tool to estimate the model parameters. It gives the optimal model parameters, their standard deviations and their correlations. It then plots the simulated versus experimental results for each data sample, produces a prediction error plot, and shows the explored input space, namely the data samples in the process parameter space.

  2. optimal_parameters_exploration.py (optional): Explores in more depth how the RNA yield depends on the process parameters (initial Mg, T7RNAP and NTP concentrations) at the optimal model parameter values found by curve fitting.

  3. cross_validation.py (optional): Performs 10-fold cross-validation. It randomly splits the experimental samples into 10 folds, then performs parameter estimation (curve_fit()) 10 times, each time using 9 folds as the training set and holding out the remaining fold to test the fit. It then plots the simulated versus experimental results for each data sample as well as the prediction error plot.

  4. Cost_Yield_Plots.py: Fixes the Mg concentration and produces a 3D figure showing the cost per gram of RNA as a function of T7RNAP and Mg concentration to find the cost-optimal operating point and its associated cost. This graph is colour-coded according to the absolute RNA yield.

  5. 3D_deterministic_DS.py: Creates a 3D grid of the input process parameter space consisting of initial Mg, NTP and T7RNAP concentrations and shows the design space, i.e. the grid points that meet a given yield threshold at the previously found optimal model parameters. This figure is colour-coded according to the absolute RNA yield.

  6. 2D_probabilistic_DS.py: Fixes T7RNAP and performs 50 Monte Carlo simulations where model parameters are sampled with 20% standard deviation around their experimental optimum to get the probability of a point on the Mg-NTP grid reaching a required yield threshold.
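
The parameter estimation and cross-validation steps (scripts 1 and 3) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function toy_yield is a hypothetical placeholder rather than the kinetic model used by the authors, the data are synthetic, and all numerical values (ranges, noise level, initial guess) are assumptions chosen only to make the example run.

```python
import numpy as np
from scipy.optimize import curve_fit


def toy_yield(X, k, K_ntp, K_mg):
    """Hypothetical placeholder model, NOT the repository's kinetic model."""
    mg, ntp, t7 = X  # rows of X: initial Mg, NTP and T7RNAP concentrations
    return k * t7 * ntp / (K_ntp + ntp) * mg / (K_mg + mg)


# Synthetic "experimental" samples (assumed ranges and 2% noise).
rng = np.random.default_rng(0)
n = 40
X = np.vstack([
    rng.uniform(5, 80, n),    # Mg
    rng.uniform(5, 40, n),    # NTP
    rng.uniform(0.5, 2, n),   # T7RNAP
])
true_params = (2.0, 10.0, 20.0)
y = toy_yield(X, *true_params) * (1 + 0.02 * rng.standard_normal(n))

# Parameter estimation: optimum, standard deviations and correlations.
p0 = (1.0, 5.0, 10.0)
popt, pcov = curve_fit(toy_yield, X, y, p0=p0, bounds=(0, np.inf))
std = np.sqrt(np.diag(pcov))
corr = pcov / np.outer(std, std)

# 10-fold cross-validation: fit on 9 folds, predict the held-out fold.
folds = np.array_split(rng.permutation(n), 10)
cv_pred = np.empty(n)
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(n), test_idx)
    p_fold, _ = curve_fit(toy_yield, X[:, train_idx], y[train_idx],
                          p0=p0, bounds=(0, np.inf))
    cv_pred[test_idx] = toy_yield(X[:, test_idx], *p_fold)

cv_rel_error = np.abs(cv_pred - y) / y
print("optimal parameters:", popt)
print("standard deviations:", std)
print("mean relative CV error:", cv_rel_error.mean())
```

The standard deviations are taken from the diagonal of the covariance matrix returned by curve_fit(), and the correlation matrix is that covariance normalised by the outer product of the standard deviations. The cross-validated predictions in cv_pred can then be plotted against y in the same way as the simulated versus experimental plot.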

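The deterministic and probabilistic design space constructions (scripts 5 and 6) can be illustrated with a short sketch. It is only an approximation of the idea under stated assumptions: toy_yield is again a hypothetical placeholder model, theta_opt, threshold, t7_fixed and the grid ranges are made-up values, and the model parameters are assumed to be sampled from independent normal distributions with a 20% relative standard deviation (the README does not state the distribution).

```python
import numpy as np


def toy_yield(mg, ntp, t7, k, K_ntp, K_mg):
    """Hypothetical placeholder model, NOT the repository's kinetic model."""
    return k * t7 * ntp / (K_ntp + ntp) * mg / (K_mg + mg)


theta_opt = np.array([2.0, 10.0, 20.0])  # assumed "optimal" model parameters
threshold = 1.0                          # assumed required yield

# Grid axes for the process parameters (assumed ranges).
mg = np.linspace(5, 80, 16)
ntp = np.linspace(5, 40, 15)
t7 = np.linspace(0.5, 2, 7)

# Deterministic design space: grid points whose yield at theta_opt
# meets the threshold.
MG, NTP, T7 = np.meshgrid(mg, ntp, t7, indexing="ij")
yield_3d = toy_yield(MG, NTP, T7, *theta_opt)
in_design_space = yield_3d >= threshold

# Probabilistic design space: fix T7RNAP, sample the model parameters
# 50 times and count how often each Mg-NTP grid point meets the threshold.
rng = np.random.default_rng(1)
n_mc = 50
t7_fixed = 1.0
MG2, NTP2 = np.meshgrid(mg, ntp, indexing="ij")
thetas = theta_opt * (1 + 0.2 * rng.standard_normal((n_mc, 3)))
hits = np.zeros(MG2.shape)
for theta in thetas:
    hits += toy_yield(MG2, NTP2, t7_fixed, *theta) >= threshold
prob = hits / n_mc  # estimated probability of reaching the threshold

print("grid points in deterministic design space:", int(in_design_space.sum()))
print("highest probability on the Mg-NTP grid:", prob.max())
```

With only 50 samples the probability at each grid point is resolved in steps of 0.02, so a larger n_mc gives smoother contours at a proportionally higher computational cost.
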
Excel files

To ignore