IDAS-Durham / JUNE

June is a framework for agent based modelling in an epidemiological and geographical context.
GNU General Public License v3.0

Running a complete workflow from start to finish via a python script and ipynb notebook #265

Closed valeriupredoi closed 4 years ago

valeriupredoi commented 4 years ago

Hello guys, first off let me just say awesome job getting involved in the Covid data analysis effort! I have a couple questions and suggestions for yous (apols if the suggestions are something that you've written somewhere but I couldn't find it):

Questions

Possible Suggestions (yeah I know, my first issue here and am already suggesting :grin: )

Finally, dumb question, I got this plot out and I am wondering - how can the R number be 50 around March? :grin:

[Image: Figure_3]

valeriupredoi commented 4 years ago

attn: @grenville @bnlawrence @sadielbartholomew

arnauqb commented 4 years ago

Hi Valeriu,

thank you very much for having a look at the code.

  • is there a location for the (wip) documentation?

No, we have been quite bad at keeping the documentation up to date. We could generate it from the functions' docstrings, but there are some issues around publishing to GitHub Pages with a free GitHub Teams account.

  • is there a Python notebook that runs the example flow (note that we could not use the quickstart.ipynb from master, see below)?

Somehow the quickstart notebook broke with a recent PR. I'll fix it and push the update to the main branch ASAP; we try to keep it working with the latest features as much as we can.

The install scripts are very useful, especially now that we are running on different clusters, since they make the setup very portable. Thanks! I'll definitely steal them and we can include them in master as a way to run the code.

The R plot is always funny; it's not really the usual R0 value that people quote. R0 is usually defined as "the average number of people a person infects over the duration of their infection", but this definition is tricky to implement: at any point in the simulation people are at different stages of their infection, so you can only get a true value for those who have already recovered, and that gives you delayed information about how the disease is spreading. What we do instead is assign a score to each person who infects someone, and quote the mean of these scores over all infected people. That gives a lower value than the usual R0, since many of them have not yet finished their infection.
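To make the difference between the two averages concrete, here is a minimal sketch. It is not JUNE's actual code: the `Person` record and its fields are invented for illustration, and the real bookkeeping in the simulator may differ.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Person:
    # Hypothetical record for illustration only, not a JUNE class.
    infected: bool = False       # has this person been infected at any point?
    recovered: bool = False      # has their infection already finished?
    n_infected_by_me: int = 0    # the "score": people this person has infected so far


def running_r(people):
    """Mean score over everyone infected so far, including ongoing infections."""
    scores = [p.n_infected_by_me for p in people if p.infected]
    return mean(scores) if scores else 0.0


def completed_r(people):
    """Mean score over recovered people only: complete per person, but delayed."""
    scores = [p.n_infected_by_me for p in people if p.recovered]
    return mean(scores) if scores else 0.0
```

With one recovered person who infected three others and two people still infectious who have infected one and zero so far, `running_r` averages over all three while `completed_r` only sees the first, which is why the running value sits below the completed one.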

The high peak you see at the beginning is an artifact of how we do the seeding. We start with 50 people who have just been infected, and they all reach their infectivity peak after ~2-3 days, so that creates a boom of infections about three days after the simulation starts. We are planning on randomizing the times at which people start their infections in the seeding, but we don't think that will have a strong effect on the final results.
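The seeding effect can be illustrated with a toy model. This is only a sketch under stated assumptions: the infectivity profile, the 7-day window for staggered starts, and all names below are made up for the example and are not taken from JUNE.

```python
import math
import random

PEAK_DAY = 2.5  # assumed day of peak infectivity, picked from the "~2-3 days" above


def infectivity(days_since_infection):
    """Toy profile: zero before infection, rises to 1.0 at PEAK_DAY, then decays."""
    if days_since_infection < 0:
        return 0.0
    x = days_since_infection / PEAK_DAY
    return x * math.exp(1.0 - x)


def total_infectivity(start_days, day):
    """Summed infectivity of all seeds on a given simulation day."""
    return sum(infectivity(day - start) for start in start_days)


def peak_total(start_days, horizon=20.0, step=0.25):
    """Largest summed infectivity seen on a regular time grid up to `horizon` days."""
    n_steps = int(horizon / step) + 1
    return max(total_infectivity(start_days, i * step) for i in range(n_steps))


rng = random.Random(0)  # fixed seed so the comparison is reproducible
synchronized = [0.0] * 50                               # all 50 seeds infected at day 0
staggered = [rng.uniform(0.0, 7.0) for _ in range(50)]  # hypothetical randomized starts

print("synchronized peak:", peak_total(synchronized))
print("staggered peak:   ", peak_total(staggered))
```

When all 50 seeds start together, their individual peaks line up on the same day and add up; spreading the start times flattens that initial spike without changing the number of seeds.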

arnauqb commented 4 years ago

OK, the notebook should be fixed now. You can also run the simulation by using the scripts scripts/create_world.py and scripts/run_simulation.py. The first one creates a world named tests.hdf5 and the second one loads it and runs it.
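If you want both steps in a single Python script, a small driver along these lines might do. It is a sketch only: it assumes the two scripts run without any required arguments from the repository root and that JUNE is installed for the current interpreter; `build_commands` and `run_workflow` are hypothetical helper names, not part of JUNE.

```python
import subprocess
import sys

# The two scripts named above, in the order they need to run.
SCRIPTS = ["scripts/create_world.py", "scripts/run_simulation.py"]


def build_commands():
    """One command per script, using the interpreter that runs this driver."""
    return [[sys.executable, script] for script in SCRIPTS]


def run_workflow(repo_root="."):
    """Run each script from the repository root, stopping at the first failure."""
    for cmd in build_commands():
        # check=True raises CalledProcessError if a script exits with a non-zero code.
        subprocess.run(cmd, check=True, cwd=repo_root)
```

Calling `run_workflow("path/to/JUNE")` would then create the world and run the simulation in one go; the path is a placeholder for wherever the repository is checked out.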

valeriupredoi commented 4 years ago

cheers muchly @arnauqb :beer: And thanks for the detailed R explanation too - I developed a method to evaluate R_0 that is consistently underestimating it so maybe we can join forces :grin:

valeriupredoi commented 4 years ago

BTW please close the issue whenever you reckon it's all good :beer: