covid19india / covid19india-react

Tracking the impact of COVID-19 in India
https://www.covid19india.org
MIT License

Rt tracking by state #1084

Open seabbs opened 4 years ago

seabbs commented 4 years ago

Is your feature request related to a problem? Please describe.

Showing just case counts is informative but doesn't get at the root of the issue.

Describe the solution you'd like

It would be nice to see regional Rt estimates. We have an approach for doing this by region (https://github.com/epiforecasts/covid-regional/blob/b5cd746537dfc1f2cb15bef233342c4bfc1987d4/united-kingdom/update_nowcasts.R; https://github.com/epiforecasts/EpiNow/blob/master/R/regional_rt_pipeline.R), with an example here: https://epiforecasts.io/covid/posts/national/united-kingdom/

Describe alternatives you've considered

You could roll your own implementation, but there are some tricky details that are important to consider, and naive implementations can have quite a few issues.

Additional context

I am the lead author on the linked project. We are struggling with capacity at the moment, so support may be an issue, but I would be happy to review any dev work using our tooling.

Great work on the site - very clear, very useful.

seabbs commented 4 years ago

Our national estimate may be of interest (note: it uses ECDC data and an international reporting delay): https://epiforecasts.io/covid/posts/national/india/

And apologies if this comes across as too much of an advert; feel free to ignore.

sudevschiz commented 4 years ago

This is quite interesting work. Potential candidate for the deep-dive section. @gautam1858 please do have a look.

praveentiru commented 4 years ago

There is a live website tracking daily Rt estimates for the US: https://rt.live/. They have open-sourced the Jupyter notebook they use to arrive at these values: https://github.com/k-sys/covid-19/blob/master/Realtime%20R0.ipynb

This can be a good starting point.
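For reference, the original Realtime R0 notebook is, as far as I can tell, based on Bettencourt & Ribeiro (2008): the count on day t is treated as Poisson with mean lambda = k_{t-1} * exp(gamma * (R_t - 1)), and a posterior over a grid of R_t values is carried forward day by day with some process noise. Below is a minimal stdlib-only sketch of that update. `GAMMA`, `SIGMA` and the grid are illustrative assumptions rather than the notebook's exact settings, and the notebook's smoothing and credible-interval steps are left out.

```python
import math

GAMMA = 1 / 7    # assumed 1 / serial interval in days (illustrative)
SIGMA = 0.25     # assumed day-to-day process noise on R_t (illustrative)
R_GRID = [i * 0.05 for i in range(121)]  # candidate R_t values 0.00 .. 6.00


def poisson_logpmf(k, lam):
    """log P(K = k) for K ~ Poisson(lam), lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]


# Transition matrix P(R_t = r_new | R_{t-1} = r_old): a Gaussian random walk
# discretised on the grid, each row renormalised to sum to 1.
KERNEL = [normalise([math.exp(-0.5 * ((r_new - r_old) / SIGMA) ** 2) for r_new in R_GRID])
          for r_old in R_GRID]


def rt_posteriors(cases):
    """Posterior over R_GRID for each day after the first.

    cases: daily new case counts (the notebook smooths these first).
    """
    n = len(R_GRID)
    belief = [1 / n] * n                      # flat prior before any data
    posteriors = []
    for prev, curr in zip(cases, cases[1:]):
        # Predict: push yesterday's belief through the random-walk kernel.
        predicted = [sum(belief[i] * KERNEL[i][j] for i in range(n)) for j in range(n)]
        # Update: Poisson likelihood of today's count for each candidate R_t.
        base = max(prev, 1e-9)                # avoid log(0) after a zero-count day
        loglik = [poisson_logpmf(curr, base * math.exp(GAMMA * (r - 1))) for r in R_GRID]
        peak = max(loglik)                    # subtract the max for numerical stability
        belief = normalise([p * math.exp(ll - peak) for p, ll in zip(predicted, loglik)])
        posteriors.append(belief)
    return posteriors


def posterior_mean(post):
    return sum(r * p for r, p in zip(R_GRID, post))


if __name__ == "__main__":
    cases = [10, 12, 15, 18, 22, 27, 33]      # made-up series, roughly +20% per day
    print(round(posterior_mean(rt_posteriors(cases)[-1]), 2))
```

On a made-up series growing about 20% per day like the one above, the final posterior mean should sit well above 1; on a shrinking series it should fall below that. This sketch also inherits the issues raised below (smoothing, report-date timing, fixed serial interval).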

seabbs commented 4 years ago

If this is based on the Instagram co-founder's (Kevin Systrom's) analysis, then there are some statistical/epidemiological issues with their approach. The website looks excellent, however.

gauravbgp commented 4 years ago

Is anyone taking this up? This could be the single most important metric for tracking and visualizing current transmission rates, both state-wise and at the national level: http://systrom.com/blog/the-metric-we-need-to-manage-covid-19/ As pointed out, reference code is already available. It would need to be calculated in the backend from the state-wise daily confirmed data. I would love to contribute to fast-track any effort if this can be prioritized/assigned.

saurabhsharma1976 commented 4 years ago

Happy to help consult, drawing on my previous Six Sigma and statistics experience!

praveentiru commented 4 years ago

I am not a statistics expert, but I wanted to add what is being done in the US. If anyone takes this up, that would be great. There is a state-level dataset, covid_19_india.csv, in the organization's deep-dive repo: https://github.com/covid19india/deep-dive/blob/master/data/dataset/covid_19_india.csv

One can use this data to build a model for Rt estimation. The same model can be extended to the district level if the shape of the data is similar.

saurabhsharma1976 commented 4 years ago

Thanks. The data is very good, though I would have loved more attributes in it, e.g. city and even locality; otherwise won't it need rework later? Any idea on that? With the raw data a lot of statistical hypothesis testing can be done, and I can help suggest visualizations. I have just got started with GitHub on Udemy, so I have a long way to go before I can do this myself :( Is anyone willing to collaborate?


rohanmanthani commented 4 years ago


@seabbs Could you elaborate on the statistical issues?

I plugged in India data but haven't done a deep dive into the underlying issues with the model.

Linked here - https://github.com/covid19india/covid19india-react/issues/1275#issuecomment-616025618

seabbs commented 4 years ago

So, unfortunately, I have some work I have to finish today but here is a brief summary of the issues that leap out at me:

  1. Use of smoothed reported cases. They use a Gaussian smoother, which includes future and past information to smooth current reported cases. This means future information is included when estimating Rt, which biases the curve. An alternative approach is to impute the symptom onset date using multiple samples.
  2. A related issue: presenting Rts by date of report. Interventions impact Rt on the date of infection, so interventions can only be compared to Rts by date of infection, not by date of report or symptom onset. A simplistic way to handle this is to left-shift, with a more advanced method being to again sample from a distribution.
  3. Another related issue: if you correctly map reported cases to cases by date of onset, then you have issues of right truncation (as some cases will have had onset but not yet been reported). Again, there are several approaches to dealing with this.
  4. Use of a fixed 7-day smoothing window. This is arbitrary, and window choice has been shown to introduce bias (people adjust it to get the curve shape they want). The downside to a longer window is that real changes show up later; the upside is protection against statistical noise. An improved approach is to optimize the window (ideally within a sampling step) using a proper scoring rule (for example, the ranked probability score, RPS). This allows you to reduce the amount of smoothing and see changing dynamics earlier.
  5. Fixed serial interval. The serial interval has a big impact, and ideally its uncertainty needs to be fully propagated. The simplest approach is to use a distribution. However, this still doesn't include all of the uncertainty. A better approach is to fit the serial interval using a Bayesian approach and then use multiple samples from the posterior.
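Points 1 to 3 can be illustrated with a small sketch: instead of smoothing reported counts with a centred window (which uses future days), each reported case is assigned an onset date by sampling a reporting delay and subtracting it from the report date, and the sampling is repeated to carry the delay uncertainty forward. `DELAY_PMF` below is a made-up placeholder, not an estimate for India; a real pipeline would use an estimated delay distribution. The last helper shows why right truncation (point 3) matters: onsets close to the end of the data are only partially reported yet.

```python
import random
from collections import Counter

# Hypothetical report-delay distribution: P(delay = d days) for d = 0..6.
# Placeholder values for illustration only.
DELAY_PMF = [0.05, 0.15, 0.25, 0.20, 0.15, 0.12, 0.08]


def impute_onsets(reported, rng):
    """Map counts by report day to counts by (sampled) onset day.

    reported[t] = cases reported on day t. Cases whose sampled onset
    falls before day 0 are dropped.
    """
    delays = list(range(len(DELAY_PMF)))
    onsets = Counter()
    for t, n in enumerate(reported):
        for d in rng.choices(delays, weights=DELAY_PMF, k=n):
            if t - d >= 0:
                onsets[t - d] += 1
    return [onsets[t] for t in range(len(reported))]


def imputation_samples(reported, n_samples=100, seed=1):
    """Repeat the imputation so that downstream Rt estimates can be
    computed per sample and the delay uncertainty propagated."""
    rng = random.Random(seed)
    return [impute_onsets(reported, rng) for _ in range(n_samples)]


def truncation_fraction(days_before_end):
    """Probability that an onset `days_before_end` days before the last
    data day has already been reported (delay <= days_before_end)."""
    return sum(DELAY_PMF[: days_before_end + 1])
```

A naive right-truncation adjustment would divide the imputed onset count d days before the end by `truncation_fraction(d)`; that is only one of the several possible approaches mentioned in point 3, and it becomes very noisy for the most recent days.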

I probably missed something in there as well. We have dealt with all of the above in our analysis (which is by no means perfect), linked above. There are also multiple limitations that we explore in depth in our methods, so I won't go into them here. The key one is that it does not account for bias due to changing testing effort.
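Point 5 (propagating serial-interval uncertainty) can be sketched with the renewal-equation estimator of Cori et al. (2013). With a Gamma(a, b) prior on R_t and a trailing window of days, the posterior is Gamma with shape a + sum(I_s) and scale 1 / (1/b + sum(Lambda_s)), where Lambda_s = sum_k I_{s-k} w_k is the infectiousness implied by the serial interval w. This estimator is an assumption chosen because it is standard, not a claim about exactly what the epiforecasts pipeline does. The (mean, sd) serial-interval draws are placeholders; in practice they would be posterior samples from a fitted serial-interval model, as suggested above. The discretisation (Gamma density evaluated at whole days, then renormalised) is a simplification.

```python
import math
import random

PRIOR_SHAPE, PRIOR_SCALE = 1.0, 5.0   # Gamma prior on R_t with mean 5, sd 5


def discretised_si(mean, sd, max_days=20):
    """Approximate a Gamma(mean, sd) serial interval on days 1..max_days
    by evaluating the density at integer days and renormalising."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    dens = [math.exp((shape - 1) * math.log(d) - d / scale) for d in range(1, max_days + 1)]
    total = sum(dens)
    return [w / total for w in dens]          # w[0] is P(SI = 1 day)


def infectiousness(incidence, w):
    """Lambda_t = sum_{s>=1} I_{t-s} * w_s for every day t."""
    return [sum(incidence[t - s] * w[s - 1] for s in range(1, min(t, len(w)) + 1))
            for t in range(len(incidence))]


def rt_samples(incidence, t, window, si_draws, n_per_draw=50, seed=1):
    """Sample R_t for the window ending on day t (inclusive), mixing the
    closed-form Gamma posterior over serial-interval parameter draws."""
    rng = random.Random(seed)
    samples = []
    for si_mean, si_sd in si_draws:
        lam = infectiousness(incidence, discretised_si(si_mean, si_sd))
        days = range(t - window + 1, t + 1)
        shape = PRIOR_SHAPE + sum(incidence[s] for s in days)
        scale = 1 / (1 / PRIOR_SCALE + sum(lam[s] for s in days))
        # random.gammavariate(alpha, beta) uses beta as the scale parameter.
        samples += [rng.gammavariate(shape, scale) for _ in range(n_per_draw)]
    return samples


if __name__ == "__main__":
    si_draws = [(4.7, 2.9), (6.5, 4.0), (5.0, 3.0)]         # placeholder draws
    incidence = [round(10 * 1.1 ** t) for t in range(40)]  # made-up 10%/day growth
    draws = sorted(rt_samples(incidence, t=39, window=7, si_draws=si_draws))
    print("median R_t:", draws[len(draws) // 2])
```

Pooling samples across serial-interval draws widens the interval when the serial interval is uncertain, which a single fixed serial interval would hide. Varying `window` here is also how the window choice in point 4 would enter.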

rohanmanthani commented 4 years ago

@seabbs


Appreciate the response! I believe they actually use a 9-day smoothing window, which would amplify some of the errors you mentioned. I was looking through the model you linked (https://epiforecasts.io/covid/), really nice work btw. How do you handle a changing rate of testing affecting the number of positive cases? Is this captured in the uncertainty at all? To me this stands out as a missing piece in a lot of r_t models.
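To compare smoothing windows such as the 7-day and 9-day choices discussed here, point 4 of seabbs's list suggests a proper scoring rule such as the ranked probability score. For an observed count y and a predictive distribution with CDF F, RPS = sum over k of (F(k) - 1{y <= k})^2, and lower is better. The sketch below computes it from forecast samples; how each candidate window's forecasts are generated is model-specific and not shown.

```python
def rps_from_samples(samples, observed):
    """Ranked probability score of a non-negative integer observation
    against integer-valued predictive samples (lower is better)."""
    n = len(samples)
    upper = max(max(samples), observed)   # beyond this both CDFs equal 1
    score = 0.0
    for k in range(0, upper + 1):
        f_k = sum(1 for s in samples if s <= k) / n   # empirical predictive CDF at k
        indicator = 1.0 if observed <= k else 0.0
        score += (f_k - indicator) ** 2
    return score


def mean_rps(forecasts, observations):
    """Average RPS over several days; forecasts[i] holds samples for day i."""
    return sum(rps_from_samples(f, y) for f, y in zip(forecasts, observations)) / len(observations)
```

A window would then be chosen by scoring each candidate's out-of-sample case forecasts with `mean_rps` and keeping the lowest, rather than picking a fixed length by eye.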

Also, I was looking at South Korea (https://epiforecasts.io/covid/posts/national/south-korea/). Does this r_t number make sense given that their number of cases is pretty low (<40 on the last date you computed for)?

praveentiru commented 4 years ago

Can someone from the project map out how this can be executed? While my knowledge of statistics is poor, I am very good at figuring out languages and getting things to run. I can work on adapting the R pipeline shared by @seabbs to the India data available in the deep-dive repo, or I can do the same with the Jupyter notebook from the rt.live website.

Can someone volunteer to build the UI for this? Please also share any requirements for how the data should be reported so that it can be embedded in the website.

saurabhsharma1976 commented 4 years ago

I can join hands, as I have reasonable experience with statistics and hypothesis testing, though I am a newbie at coding and UI. Happy to talk to coordinate better. My number is 9819003260, and we can coordinate over WhatsApp.

Rgds Saurabh



praveentiru commented 4 years ago

OK. I will start with the model from @seabbs and will let you know if I run into any issues.

gauravbgp commented 4 years ago

As mentioned, I have already done the modelling following rt.live's model, and the results have features that make qualitative sense. There is no need to wait for much more data, though the model will remain sensitive to testing ramp-ups even in the future. I'm not an expert, but I don't think it is a bad model to start with. We can incorporate changes and corrections suggested by experts later, subject to verification by peers.

For the time being, I'm uploading some of the calculated data to covid19rt.in.

I'm basing the calculations on the daily state-wise CSV from this site. I'd love for it to be verified/cross-checked, and for corrections to be suggested. Also, I'm not much involved in frontend work myself, so I have just come up with a basic template. I would appreciate any help.
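For anyone wanting to reproduce this, below is a minimal sketch of pulling per-state daily confirmed counts out of a wide-format CSV. It assumes a layout with `Date` and `Status` columns (Status being Confirmed/Recovered/Deceased) plus one column per state code, with `TT` as the all-India total. That layout is an assumption about the covid19india daily state-wise file, so check the actual headers (and any extra columns) before relying on it.

```python
import csv
import io
from collections import defaultdict


def daily_confirmed_by_state(csv_text, skip=("Date", "Status", "TT")):
    """Return {state_code: [(date, count), ...]} from a wide-format CSV.

    Assumed columns: Date, Status, then one column per state code.
    Only 'Confirmed' rows are kept; the assumed 'TT' total column is
    skipped by default. Empty cells are treated as 0.
    """
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Status"] != "Confirmed":
            continue
        for code, value in row.items():
            if code in skip:
                continue
            series[code].append((row["Date"], int(value or 0)))
    return dict(series)
```

Each state's list can then be fed, in date order, to whichever Rt estimator is used.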

praveentiru commented 4 years ago

@gauravbgp Can we get this integrated into the covid19india website, ideally on the state page?

saurabhsharma1976 commented 4 years ago

And can we also add district-level estimates, please, in any format? I am from Mumbai, and I think some detailed analysis would help this city!

idris15 commented 4 years ago


@gauravbgp, I noticed that you are plotting reproduction numbers for 16 states. I was wondering why you haven't included the remaining states. Also, the first plot is for 'TT', and I am having a hard time figuring out what that is. Lastly, while this project figures out when and where to include Rt tracking, could you arrange yours in alphabetical order so that it is easier to navigate?

Thanks for your great work!

jeremyphilemon commented 4 years ago

@seabbs @gauravbgp @praveentiru

Apologies for the delay in getting back to this issue. As the discussion above shows, this is an essential statistic, and rt.live has visualized it very well. We can do the same; however, as someone who gets confused by a lot of numbers, I'd really appreciate a simple summary of the status of this issue and how accurate we are/can be with the data at hand.

I also see that @seabbs has given his suggestions and the potential problems we may run into when visualizing this. I'd also like to know how many of them can be circumvented. After that we can start with the implementation. I'm also bumping the priority of this issue to get things rolling.

We can sort out the frontend part (rt.live uses d3, and we use it on the website too). We can display this for each state on the state page, and also have a dedicated page with all the states for comparison.

I would also like to know whom I can assign this issue to!

praveentiru commented 4 years ago

@jeremyphilemon Thanks for the reply. The current state is that @gauravbgp has implemented the rt.live model and used it to build a site for India, covid19rt.in. @seabbs belongs to a team based at the London School of Hygiene and Tropical Medicine that runs the Epiforecasts site, where they track epidemics. He believes there are flaws in the rt.live model.

Which is the better model? I do not have the expertise in this domain to make that call. The way I see it, we have the following tasks.

People needed to complete the tasks:

I can take on the second role. This is my view.

praveentiru commented 4 years ago

@seabbs @gauravbgp it looks like rt.live updated their model on 23rd April.

gauravbgp commented 4 years ago

@jeremyphilemon

@praveentiru thanks for succinctly summarizing the status. Sorry, I've been busy and unable to reply. I'll just mention my rationale for putting up the website first.

Firstly, the model is understood to be deficient: it is epidemiologically inaccurate in certain regions and sensitive to changes in testing rates. But it converts epidemic strength into easy-to-understand ranges of Rt for laypeople and for casual trend analysis, and it is similarly useful as a metric for tracking the current progression. The estimated Rt values run somewhat high, and where there is an initial backlog of cases, Rt even shoots above 3 in some states to explain the cases (this is clearly epidemiologically incorrect). Even so, it is still a correct indicator of the efficacy of controls in a region, at least qualitatively, and it is easy to track. Please have a look at http://covid19rt.in/ and judge for yourselves.
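To see why a backlog can push the estimate above 3: in the original rt.live model (as I understand it, following Bettencourt & Ribeiro), consecutive daily counts are linked through an exponential growth term. Ignoring the prior, the smoothing and the Poisson noise, the single-day point estimate comes from inverting that link. The numbers below assume gamma = 1/7, i.e. a 7-day serial interval, as an illustrative value.

```latex
\lambda_t = k_{t-1}\, e^{\gamma (R_t - 1)}
\;\Longrightarrow\;
\hat R_t = 1 + \frac{1}{\gamma} \ln \frac{k_t}{k_{t-1}}

\text{With } \gamma = \tfrac{1}{7}:\qquad
\frac{k_t}{k_{t-1}} = 1.5 \;\Rightarrow\; \hat R_t = 1 + 7 \ln 1.5 \approx 3.84,
\qquad
\frac{k_t}{k_{t-1}} = 2 \;\Rightarrow\; \hat R_t = 1 + 7 \ln 2 \approx 5.85
```

The day-to-day prior in the full model damps a single spike somewhat, but a one-day dump of backlogged cases still pulls the estimate sharply upward.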

To this end, I have tried to make end-to-end reports available first and to get them to people (laypeople at large), so that they become aware that there is such an apt metric: one that even they can use for tracking, and that visualizes the status much better than the other graphs available to them.

rt.live has updated their model to include epiforecasts' suggestions of accounting for the onset delay and treating the serial interval as a distribution. I also plan to update my model, but it will take some time. Currently, mine is the same model that rt.live used to publish reports for the US until 23/04, including for up to a week after it received significant media coverage. And right now, I think the more people who can see the report, the better.

As for including the reports on http://www.covid19india.org: my understanding is that the model needs sufficiently centralized control and ongoing maintenance effort. In my workflow, once I have my processes sorted a little (maybe 2-3 weeks, or sooner if it can be prioritized), I can provide an API to download the generated Rt CSV every day. This can then be included in the site and visualized in any way. (Note: I am a data engineer with very little frontend experience, and I am using almost no JS on my site for now. The plots are PNGs exported from matplotlib.)

Seeing as that might take some time, and as I think this report would be very useful for people, I would love it if you considered it appropriate to link to my webpage http://covid19rt.in from your site, since it already has many users.

If not that, then for individual users: if you find the analysis useful, it would be great if you could forward it to your groups.

praveentiru commented 4 years ago

@gauravbgp If you can package your model as a Python program, I can work on setting it up to process data from the covid19india API once a day and publish the results as CSV or JSON, as needed for visualization. I can look into the visualization part, but it is not really my area of expertise. We can bring this to closure if we have someone more capable there.

gauravbgp commented 4 years ago

@praveentiru as mentioned on the webpage, I am already using data from the covid19india API. But as I pointed out in my comment, it is the model that would need continual modification/validation, and it would also need data from other sources, about other factors, to be cleaned up and collated.

skyprince999 commented 4 years ago

Any update on this activity? I have created a Python version on Kaggle using data from this site (covid19india.org). Tracking is done only for states where the case count is > 100: realtime tracking of Rt for Indian states

praveentiru commented 4 years ago

Hi all,

I have built an R pipeline that uses the epiforecasts model to generate state-level predictions.

It generates the analysis only for states that have had at least one day with 40 or more cases, which gives analysis for 14 states: "AP" "BR" "DL" "GJ" "JK" "KA" "MH" "MP" "PB" "RJ" "TG" "TN" "UP" "WB".
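The inclusion rule described here is easy to express in code. This sketch assumes the rule is "at least one day with 40 or more new cases" and that per-state data is available as a dict of daily counts; the threshold is a parameter so that other cut-offs can be tried.

```python
def eligible_states(daily_cases, min_daily=40):
    """Return sorted state codes with at least one day of >= min_daily new cases.

    daily_cases: {state_code: [daily new case counts, ...]}
    States with no data at all are excluded.
    """
    return sorted(code for code, counts in daily_cases.items()
                  if counts and max(counts) >= min_daily)
```

Returning the codes sorted also gives the alphabetical ordering requested earlier in this thread, at least by state code.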

@jeremyphilemon, if you can specify the format (JSON and/or CSV) in which you want the data to enable visualization, I will take ownership of running the pipeline daily to provide fresh data. You can see the model's work here. We can reuse the rt.live style of visualization; that is something I need assistance with.
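Since the format is still open, here is one possible shape for the daily output, sketched under assumptions: the field names (`date`, `median`, `lower_90`, `upper_90`) and the per-state nesting are hypothetical, not an agreed covid19india schema. The same records are written both as JSON (convenient for a d3 chart on a state page) and as a flat CSV.

```python
import csv
import io
import json


def to_json(estimates):
    """estimates: {state_code: [(date, median, lower, upper), ...]} -> JSON string,
    keyed by state code in alphabetical order. Field names are hypothetical."""
    payload = {
        code: [
            {"date": d, "median": round(m, 2), "lower_90": round(lo, 2), "upper_90": round(hi, 2)}
            for d, m, lo, hi in rows
        ]
        for code, rows in sorted(estimates.items())
    }
    return json.dumps(payload, indent=2)


def to_csv(estimates):
    """Same records as a flat CSV with one row per state and date."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["state", "date", "median", "lower_90", "upper_90"])
    for code, rows in sorted(estimates.items()):
        for d, m, lo, hi in rows:
            writer.writerow([code, d, f"{m:.2f}", f"{lo:.2f}", f"{hi:.2f}"])
    return buf.getvalue()
```

Keeping one record per state and date makes it straightforward to serve the latest file as a static asset updated by the daily pipeline run.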

siddharthsrivastava commented 4 years ago

Hi

We have worked on predicting estimated cases, deaths and recoveries for the next week in India. So far we are achieving an error rate below 2% with respect to actual cases. Feel free to explore it at

https://coronaindia.github.io/views/researchcentre.html

I do look forward to improving such forecasts and any suggestions or contributions are welcome.

Now, with respect to Rt, http://covid19rt.in/ seems to give realistic numbers; however, another problem with the underlying SEIR or similar model is its inability to adapt. We have done some work on that and the results look promising; we will be glad to add to or contribute to any similar initiative here.

saurabhsharma1976 commented 4 years ago

Hi. I went through the details; great work by you and your team! Is there any possibility of including district-level estimates, please? For example, the prediction for Mumbai will be very different from that for Maharashtra. City-level details are available at https://www.covid19india.org/, as we know best.
