integrate technical noise functions with data generation and model calibration

augeorge commented 2 weeks ago

Technical noise should be integrated with the "clean" data generation process.

Data generation process:

Currently there are two approaches to generate data:

- structural causal models --> (non)linear equations representing causal relationships between steady-state populations
- dynamic (kinetic) models --> (non)linear differential equations representing the change in populations over time, based on biochemical kinetic laws

Kinetic models are more general: they encode causal relationships between populations (as given by rate equations) and can be solved for the steady-state case. However, they are more computationally expensive, require additional parameters that govern the kinetic rates, can be numerically challenging, and are not tailored to the data we will use, which are at steady state.

Therefore I suggest prioritizing development efforts on the structural causal models.
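To illustrate the relationship between the two approaches, here is a minimal sketch of a one-variable kinetic model whose steady state reduces to a linear structural equation. The rate law `dY/dt = k_syn * x - k_deg * Y`, the parameter names, and the values are assumptions chosen for illustration, not taken from the existing code.

```python
def simulate_to_steady_state(x, k_syn, k_deg, dt=0.01, t_end=50.0):
    """Integrate dY/dt = k_syn * x - k_deg * Y from Y(0) = 0 with forward Euler."""
    y = 0.0
    n_steps = int(t_end / dt)
    for _ in range(n_steps):
        y += dt * (k_syn * x - k_deg * y)
    return y


def structural_steady_state(x, k_syn, k_deg):
    """Closed-form steady state: setting dY/dt = 0 gives Y = (k_syn / k_deg) * x."""
    return (k_syn / k_deg) * x


# Hypothetical parameter values, for illustration only.
x, k_syn, k_deg = 2.0, 3.0, 0.5
y_kinetic = simulate_to_steady_state(x, k_syn, k_deg)
y_scm = structural_steady_state(x, k_syn, k_deg)
print(y_kinetic, y_scm)
```

In this toy case the structural model needs only the ratio `k_syn / k_deg` (one effective coefficient), while the kinetic model needs both rates and a numerical integration.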

Integration of technical noise with structural causal models

Consider linear structural causal models of the form $Y = \sum_i \beta_i X_i + \epsilon = \beta^T X + \epsilon$, where $Y$ is causally linked to the parent variables $X_i$ through scaling factors $\beta_i$, plus a bias constant $\epsilon$.

Suppose the observed value of $Y$ is $Y_{obs} = f(Y)$ and the observed value of $X$ is $X_{obs} = f(X)$, where $f$ is a technical noise function. The observed structural causal model is then $Y_{obs} = f(\beta^T X + \epsilon)$.
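A rough sketch of how this could look in code: generate "clean" data from the linear model, then pass both $X$ and $Y$ through a noise function $f$. The function names (`multiplicative_noise`, `generate_observed`), the choice of log-normal multiplicative noise for $f$, and all parameter values are hypothetical placeholders, not the existing API.

```python
import numpy as np


def multiplicative_noise(values, rng, sigma=0.1):
    """Hypothetical technical noise f: scale each value by a log-normal factor."""
    return values * rng.lognormal(mean=0.0, sigma=sigma, size=np.shape(values))


def generate_observed(n_samples, beta, epsilon, rng, noise_fn=multiplicative_noise):
    """Generate clean (X, Y) from Y = beta^T X + epsilon, then apply noise_fn to both."""
    # Assumed distribution for the clean parent variables X.
    X = rng.uniform(1.0, 10.0, size=(n_samples, len(beta)))
    Y = X @ beta + epsilon      # clean structural equation
    X_obs = noise_fn(X, rng)    # X_obs = f(X)
    Y_obs = noise_fn(Y, rng)    # Y_obs = f(Y) = f(beta^T X + epsilon)
    return X, Y, X_obs, Y_obs


rng = np.random.default_rng(0)
beta = np.array([0.5, 2.0])
X, Y, X_obs, Y_obs = generate_observed(1000, beta, epsilon=1.0, rng=rng)
```

Passing the noise function as an argument keeps the clean generator unchanged, so different technical noise models can be swapped in during calibration.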

augeorge commented 2 weeks ago

outliers:

$$ X_{outlier} = X \cdot f^O \cdot I^O + (1 - I^O) \cdot X $$

$$ I^O \sim \text{Bernoulli}(\pi^O) $$

$$ f^O \sim \text{LogNormal}(\mu^O, \sigma^O) $$
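The outlier model above could be sketched with NumPy as follows. The function name `add_outliers` and the example parameter values are hypothetical, and this assumes $\mu^O$ and $\sigma^O$ are the mean and standard deviation of the underlying normal distribution (NumPy's convention for `lognormal`).

```python
import numpy as np


def add_outliers(X, pi_o, mu_o, sigma_o, rng):
    """Apply X_outlier = X * f_O * I_O + (1 - I_O) * X elementwise."""
    X = np.asarray(X, dtype=float)
    # I_O ~ Bernoulli(pi_O): 1 marks an entry as an outlier.
    I_o = rng.binomial(n=1, p=pi_o, size=X.shape)
    # f_O ~ LogNormal(mu_O, sigma_O): multiplicative outlier factor.
    f_o = rng.lognormal(mean=mu_o, sigma=sigma_o, size=X.shape)
    X_outlier = X * f_o * I_o + (1 - I_o) * X
    return X_outlier, I_o


rng = np.random.default_rng(42)
X = np.full(10_000, 5.0)
X_outlier, I_o = add_outliers(X, pi_o=0.05, mu_o=1.0, sigma_o=0.5, rng=rng)
print(I_o.mean())  # fraction of entries flagged as outliers
```

Returning the indicator array alongside the perturbed data makes it possible to check later whether calibration recovers the true outlier positions.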

augeorge commented 2 weeks ago

blocked by #16

CRISPR-CARB / nocap

integrate technical noise functions with data generation and model calibration #15