Modularisation of base.r preprocessing

payoto commented 4 years ago

This branch moves some of the I/O and argument parsing functionalities of base.r to functions in separate files following the question in issue #47 .

Code changes

Move argument parsing to r-utils/arg-parse.r into function base_arg_parse() the argument parsing code is unchanged and the behaviour either.
Move reading of covariates (from data/interventions.csv) into function covariates_read(file_name).
- Modify the processing of dates from the covariates to use loops (improves readability)
- Behaviour is unchanged as shown by the test in the test file.
Update plot-3-panel to use the function for covariate processing. ⚠️ This leads to a change of behaviour in the plots.

Behaviour change

Items 1 and 2 above lead to no changes in code behaviour.

Item 3 changes the right most image generated by plot-3-panel to match what is used in the calculation. The only change for the countries active so far is for Spain. Self isolation and lockdown are now applied at the same time rather thant lockdown before safe isolation.

Before change: Spain_three_pannel_base-1637179-stanfit Rdata

After change (this second run is a debug run not a full run like the one above): Spain_three_pannel_base-402127-stanfit Rdata

It seems from what I understand that the underlying info calculated by the model corresponds to the new version not the current one.

s-mishra commented 4 years ago

I see differences in results for your changes and neither the pushed code is correct. It doesn't have a few files. Any PR that changes our results won't be merged.

payoto commented 4 years ago

I've seen the CI fail, and will take this out of draft only when they're fixed.

Are the differences you refer to the ones I indicate? I'm very interested if they're not, I'm attempting to extend your study to a more granular geographical scale and would be remiss if I introduced additional bugs.

To clarify, is it on purpose that the covariate dates you calculate the model with, and the ones you plot are different?

s-mishra commented 4 years ago

They are not different as such, it is handling of a peculiar situation. For example, if you see a few of those interventions have happened after a lockdown in a country, which is weird.To take an example, Once a country goes to lockdown on 12th March it doesn't matter that now you shut schools on 13th March as they are already shut because of lockdown.

payoto commented 4 years ago

Yes, I get that and think its reasonable, that's why you correct them and set the application of an NPI to be at the latest the time of lockdown. However while in the real world "it doesn't make sense" mathematically in the model, my understanding is that those effects are separate?

If that is correct it implies that the plot of Rt describing the model and the model do not match (at least in a minor way). That is either a bug, or it needs justification. That inconsistency is on the website: https://imperialcollegelondon.github.io/covid19estimates/#/details/Spain

gustavdelius commented 4 years ago

Thank you Alexandre. That is very nice. It is a good thing to rewrite the research code cleanly because it helps with the process of checking the research and, as you are doing, building on it.

I also think your way of displaying the R0 plot, where you are using the dates actually used in the model, is clearer.

I think it would be beneficial to continue to streamline the code further. One thing for example that is a nuisance is the size of the ...-stanfit.Rdata file that contains a lot of the information in duplicate. For example no need to contain the variables prediction, estimated.deaths and estimated.deaths.cf because they are already in out. Also no need to include out because that is quickly recreated with rstan::extract(fit). Because during experimentation I am creating a lot of these files and have to often transfer them between machines, a smaller file would be more convenient.

I have merged your pull request into my fork and will make the above change there and then create a pull request to your fork in case you are interested. That is the nice thing about GitHub forks: we do not have to rely on a central repository.

payoto commented 4 years ago

Your observation about forks is a very good one, what is in your opinion of the best way to keep community features available and visible if the central repository does not see a use for them?

s-mishra commented 4 years ago

Just to say @payoto it is not that we do not appreciate your work. On the other side we love it. Just that speed of the new development work happening is too much for us to make sure we can do things fast enough for Pull Request.

gustavdelius commented 4 years ago

@payoto You are raising an interesting question. This GitHub repository plays a central role and can help in keeping developments made by others in their branches visible to the community. I propose that people should be encouraged to create issues in this repository describing their work, with links to the branches in their forks holding the code. These issues should be given a label like "community", to make it clear that they are not issues affecting this project but contributions by the community. @s-mishra , do you think you would be happy with your issue tracker being abused for this purpose and for this to be advertised in contributing.md?

payoto commented 4 years ago

@s-mishra, I meant it with no animosity: necessarily your priorities are different to ours and community edits can't put even more strain on your team, this is first of all an outlet for your code. I completely understand that you have a centralised work process that can't be instantly change to support community work.

I think a solution like @gustavdelius suggests is a good idea. If PRs are not gonna be used as such community suggestions could go as hanging PRs here with a specific "community" flag otherwise issues, or otherwise a central community fork that this repo can also point to, but then finding information starts looking like a treasure hunt.

s-mishra commented 4 years ago

Hi @payoto and @gustavdelius happy to have this issue tracker open and will certainly point people here if they come about modularisation and hopefully we will merge this soon enough.

payoto commented 4 years ago

I'm closing this as a lot of things have changed since this suggestion

ImperialCollegeLondon / covid19model

Modularisation of base.r preprocessing #51

Code changes

Behaviour change