aezarebski / derp-simulation

Code to simulate phylogenetic trees that can be used to train neural networks
https://aezarebski.github.io/derp/index.html
MIT License

Current database only records the prevalence and cumulative infections at the present #11

Closed aezarebski closed 1 month ago

aezarebski commented 2 months ago

https://github.com/aezarebski/derp-simulation/blob/a4ac52db4f24c5657b10c00f1293b22430e0b686/main.py#L436

The simulated dataset only includes measurements of the prevalence and cumulative infections at the present. This limits our ability to make predictive models that estimate these values through time using this dataset.

Suggested solutions

  1. Store the whole piece-wise constant function in the database. This might be a bit much and leaves us with complex objects to work with downstream.
  2. Store the piece-wise constant function evaluated at a random selection of times. This could be a sensible approach: it would let us do away with storing the whole representation of the R0 function and move to a flat table of values at different times. This is the simplest option but feels a bit arbitrary.
  3. Evaluate the relevant functions on a fine mesh, store those values, and use linear interpolation to get approximate values at any time. This is the most elegant option but introduces approximation error and may be overkill.
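
For comparison, option 3 could look roughly like the sketch below. The helper names `build_mesh` and `interpolate` are hypothetical and do not come from `main.py`; the prevalence function is a toy stand-in for a simulated trajectory.

```python
import bisect


def build_mesh(func, t_start, t_end, num_points):
    """Evaluate func on an evenly spaced mesh covering [t_start, t_end]."""
    step = (t_end - t_start) / (num_points - 1)
    times = [t_start + i * step for i in range(num_points)]
    times[-1] = t_end  # guard against floating point drift at the end point
    return times, [func(t) for t in times]


def interpolate(mesh_times, mesh_values, t):
    """Linearly interpolate the stored mesh values at time t."""
    if not mesh_times[0] <= t <= mesh_times[-1]:
        raise ValueError("t lies outside the mesh")
    i = bisect.bisect_right(mesh_times, t)
    if i == len(mesh_times):  # t is exactly the last mesh point
        return mesh_values[-1]
    t0, t1 = mesh_times[i - 1], mesh_times[i]
    v0, v1 = mesh_values[i - 1], mesh_values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def toy_prevalence(t):
    """Toy piece-wise constant prevalence: 1 before time 1, then 4."""
    return 1 if t < 1.0 else 4


mesh_times, mesh_values = build_mesh(toy_prevalence, 0.0, 4.0, 5)
approx = interpolate(mesh_times, mesh_values, 0.5)
```

With this coarse mesh, `approx` is 2.5 while the true value at time 0.5 is 1, which illustrates the approximation error near a jump. A finer mesh shrinks the affected interval but does not remove the error.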

Option 2 seems the best for now but would involve a slight reorganisation of the database being produced. It would definitely simplify downstream usage though.
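
A minimal sketch of option 2, assuming each quantity is available as a list of change times plus a list of values. The function names and this representation are hypothetical, not taken from `main.py`; the column names follow the ones used elsewhere in this thread.

```python
import bisect
import random


def eval_piecewise_constant(change_times, values, t):
    """Value at time t of a piece-wise constant function.

    values[0] applies before change_times[0]; values[i] applies on
    [change_times[i - 1], change_times[i]).
    """
    if len(values) != len(change_times) + 1:
        raise ValueError("need exactly one more value than change times")
    return values[bisect.bisect_right(change_times, t)]


def sample_measurements(functions, present_time, num_times, rng):
    """Evaluate every function at the same random times in [0, present_time].

    Returns a flat table: one dict per measurement time.
    """
    times = sorted(rng.uniform(0.0, present_time) for _ in range(num_times))
    rows = []
    for t in times:
        row = {"MeasurementTime": t}
        for name, (change_times, values) in functions.items():
            row[name] = eval_piecewise_constant(change_times, values, t)
        rows.append(row)
    return rows


# Toy inputs standing in for the output of one simulation.
functions = {
    "Prevalence": ([1.0, 2.5], [1, 4, 9]),
    "ReproductionNumber": ([2.0], [1.8, 0.9]),
}
rows = sample_measurements(functions, present_time=4.0, num_times=5,
                           rng=random.Random(1))
```

Seeding the generator keeps the measurement times reproducible, which may matter if the dataset is regenerated for training.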

thomaswilliams23 commented 2 months ago

https://github.com/aezarebski/derp-simulation/blob/a4ac52db4f24c5657b10c00f1293b22430e0b686/main.py#L336

Need to pass times through here.

aezarebski commented 2 months ago

The database should have the following structure (eventually):

/Simulations/
    /Simulation_001/
        TemporalMeasurements (dataset with columns MeasurementTime, Prevalence, ReproductionNumber, Etc)
        PickledTree (dataset)
        SimulationXMLConfig (attribute storing the XML as a string)
    PresentTime (attribute storing the time of the last sample)
    TreeHeight (attribute storing the tree height)

Initial work on this issue should just focus on getting the temporal measurements included in the current database. We can reorganise it into the structure above later.
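
As a rough illustration of the target layout, the sketch below assumes the database is an HDF5 file written with `h5py`, which this thread does not state explicitly. The function `write_simulation` is hypothetical, the tree is a placeholder object, and attaching `PresentTime` and `TreeHeight` to the per-simulation group is an assumption, since the listing above leaves their level ambiguous.

```python
import pickle

import h5py
import numpy as np

# Columns of the flat TemporalMeasurements table (assumed types).
MEASUREMENT_DTYPE = np.dtype([
    ("MeasurementTime", "f8"),
    ("Prevalence", "i8"),
    ("ReproductionNumber", "f8"),
])


def write_simulation(db, name, measurements, tree, xml_config,
                     present_time, tree_height):
    """Write one simulation group under /Simulations/ and return it."""
    sim = db.require_group("Simulations").create_group(name)
    table = np.array(measurements, dtype=MEASUREMENT_DTYPE)
    sim.create_dataset("TemporalMeasurements", data=table)
    # Store the pickled tree as a 1-D array of bytes.
    blob = np.frombuffer(pickle.dumps(tree), dtype=np.uint8)
    sim.create_dataset("PickledTree", data=blob)
    sim.attrs["SimulationXMLConfig"] = xml_config
    # Assumption: these two attributes live on the simulation group.
    sim.attrs["PresentTime"] = present_time
    sim.attrs["TreeHeight"] = tree_height
    return sim


tree = ("root", [("leaf_a", []), ("leaf_b", [])])  # placeholder tree
measurements = [(0.5, 1, 1.8), (2.0, 4, 0.9)]      # rows as tuples

# In-memory file so the example leaves nothing on disk.
with h5py.File("example.h5", "w", driver="core", backing_store=False) as db:
    sim = write_simulation(db, "Simulation_001", measurements, tree,
                           "<config/>", present_time=4.0, tree_height=3.2)
    stored = sim["TemporalMeasurements"][()]
    stored_times = stored["MeasurementTime"].tolist()
    restored_tree = pickle.loads(sim["PickledTree"][()].tobytes())
    xml_back = sim.attrs["SimulationXMLConfig"]
```

A compound dataset keeps the measurements as a single flat table per simulation, so downstream code can read named columns without unpacking the piece-wise constant functions.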