EIU-GIScience-Center / covid19map

Animated maps of the COVID-19 epidemic, in html and javascript with D3
MIT License

Attempt to incorporate best-fit growth rates by data up to the given day #1

Closed nkronenfeld closed 4 years ago

nkronenfeld commented 4 years ago

Not quite working - my scale is off, error dots and tooltips aren't showing up. And since tooltips aren't showing up, I can't tell if the data is accurate or not.

geobarry commented 4 years ago

This looks good... can you please develop a way to "isolate" this in a module... and pls think of it as a template for how we add/organize functionality in general

geobarry commented 4 years ago

General comment: can you add a start date parameter to your exponential-growth-curve-fitting function, so that if we want to later we could provide interfaces that allow for fitting over more specific periods?
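A minimal sketch of what such a parameter could look like. The function name `fitExponential`, its arguments, and the `{day, cases}` record shape are all hypothetical and not taken from the repository. The fit here is an ordinary least-squares line through log(cases), which is one common way to estimate a daily growth factor; the PR itself may fit the curve differently.

```javascript
// Hypothetical helper, not the repository's actual function.
// Fits cases ≈ initial * rate^day by least squares on log(cases),
// using only observations with startDay <= day <= endDay.
function fitExponential(series, startDay, endDay) {
  const pts = series.filter(
    (p) => p.day >= startDay && p.day <= endDay && p.cases > 0
  );
  if (pts.length < 2) return null; // not enough data to fit a line

  const n = pts.length;
  const meanX = pts.reduce((s, p) => s + p.day, 0) / n;
  const meanY = pts.reduce((s, p) => s + Math.log(p.cases), 0) / n;

  let sxy = 0;
  let sxx = 0;
  for (const p of pts) {
    const dx = p.day - meanX;
    sxy += dx * (Math.log(p.cases) - meanY);
    sxx += dx * dx;
  }
  if (sxx === 0) return null; // every point falls on the same day

  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;
  // rate: multiplicative growth per day; initial: fitted value at day 0
  return { rate: Math.exp(slope), initial: Math.exp(intercept), n };
}

// Synthetic series (made up): 10 cases on day 0, multiplying by 1.5 each day.
const series = [];
for (let day = 0; day < 10; day++) {
  series.push({ day, cases: 10 * Math.pow(1.5, day) });
}

// Fit only days 3 through 9 instead of the whole history.
const fit = fitExponential(series, 3, 9);
console.log(fit.rate.toFixed(2), fit.n);
```

Passing both ends of the range keeps the "data up to the given day" behaviour (via `endDay`) while leaving room for an interface that picks a more specific period later.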

geobarry commented 4 years ago

The use of circle sizes to show error is not very intuitive visually. Another approach might be to use the error to modify the color (e.g. create a 2-d color scheme). A quick-and-dirty way would be to use error to calculate the alpha (opaqueness) channel - more opaque means more certain. A better way is to make values that are more uncertain appear more gray, as naturally grays are interpreted as more "fuzzy" and uncertain.
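Both ideas can be sketched in a few lines. This assumes the error has already been rescaled to an uncertainty value between 0 and 1, and that colors are plain `[r, g, b]` arrays; the function names and the choice of mid-gray (128) are illustrative assumptions, and a perceptual color space might give better results than blending in RGB.

```javascript
// Clamp an uncertainty value to the range [0, 1].
function clamp01(u) {
  return Math.min(1, Math.max(0, u));
}

// "Better" variant: blend the color toward mid-gray as uncertainty rises.
// uncertainty = 0 keeps the color; uncertainty = 1 gives pure gray.
function fadeToGray(rgb, uncertainty) {
  const u = clamp01(uncertainty);
  const gray = 128; // assumed neutral gray
  return rgb.map((c) => Math.round(c + (gray - c) * u));
}

// "Quick-and-dirty" variant: more uncertain means more transparent.
function withAlpha(rgb, uncertainty) {
  const u = clamp01(uncertainty);
  return `rgba(${rgb[0]}, ${rgb[1]}, ${rgb[2]}, ${1 - u})`;
}

// Example: a strong red at three levels of uncertainty.
for (const u of [0, 0.5, 1]) {
  console.log(u, fadeToGray([200, 0, 0], u), withAlpha([200, 0, 0], u));
}
```

Either result can be used as a fill color in the map, so the growth rate and its error share one symbol instead of competing for attention.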

nkronenfeld commented 4 years ago
- growth rate... per day?

Yes, if the growth rate is 1.5, it means the best-fit exponential curve has the number of cases multiplying by 1.5 each day.

- how to interpret daily error

The daily error is roughly equivalent to the standard deviation of the data

- what the heck are the numbers after the date? (e.g. 86/366, for Georgia Mar 26)

Day 86/366 in the year.

Are they all really necessary? We want to keep the tooltip text to a minimum.

No, they were an early experiment in getting data around. They can absolutely be taken out.

Values should be rounded to, say, 2 decimal places

Will do

The meaning of these numbers should also be documented in the code.

Will do

I still don't like the circles for error. Circles dominate visually and suggest an amount of something... error is more like a "meta" variable. Some cartographers suggest stippling, others transparency.

I agree circles are not ideal. I do, however, like them better than altering the color. Theoretically, one could, say, attach hue to the amount and saturation to the error, and that wouldn't be ... horrible? ... but I really like something physical there, that forces one to acknowledge the error rather than ignore it. Meta-variables are easy to ignore, and error is a really bad one for that.

An even better way would be to put a small plot in the tool-tip - showing the actual data and the curve, so one can see at a glance exactly how good a fit the curve is. I've no idea how to do that, but (a) I can look, and (b) if you do have a clue and want to do that or point me in a good direction, that would also be fine.
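One possible direction, offered only as a sketch: build a tiny inline SVG as a string and place it in the tooltip. This assumes the tooltip can display HTML, which has not been checked against the repository; `sparklinePath` and `tooltipPlot` are hypothetical names, and no D3 calls are used so the snippet stands on its own.

```javascript
// Build SVG path data ("M x,y L x,y ...") for a series of values,
// scaled into a width x height box. yMax is passed in so the observed
// data and the fitted curve can share one vertical scale.
function sparklinePath(values, width, height, yMax) {
  const n = values.length;
  const stepX = n > 1 ? width / (n - 1) : 0;
  return values
    .map((v, i) => {
      const x = i * stepX;
      const y = height - (v / yMax) * height; // SVG y grows downward
      return `${i === 0 ? "M" : "L"}${x.toFixed(1)},${y.toFixed(1)}`;
    })
    .join(" ");
}

// Return an SVG string with the fitted curve in gray and the data in black.
function tooltipPlot(observed, fitted, width = 120, height = 40) {
  const yMax = Math.max(...observed, ...fitted, 1); // avoid dividing by 0
  return (
    `<svg width="${width}" height="${height}">` +
    `<path d="${sparklinePath(fitted, width, height, yMax)}" fill="none" stroke="gray"/>` +
    `<path d="${sparklinePath(observed, width, height, yMax)}" fill="none" stroke="black"/>` +
    `</svg>`
  );
}

// Made-up numbers: observed counts and a fitted 1.5x-per-day curve.
const observed = [10, 14, 24, 33, 52];
const fitted = observed.map((_, day) => 10 * Math.pow(1.5, day));
console.log(tooltipPlot(observed, fitted));
```

Drawing both lines on a shared scale lets a reader judge the fit at a glance, which is the goal described above.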

If you are busy and want me to make these changes, please let me know. Definitely the rate of growth is a good variable to add! I'm happy to make changes, I'm happy for you to make them. I don't tend to get proprietary about code :-) I will certainly have some time to devote to this over the weekend - but if I find stuff on growth rate done, I may try to tackle incorporating Canada :-)

geobarry commented 4 years ago

Thanks for the answers. Incorporating or forking a site for Canada would be great! Other than the data, there is the issue of how to design a user interface for multiple countries/regions - maybe the best way is to have a separate page for each country, and an index page that has links to each country? In any case, I think it would be best to pull out the code that loads the data and place it in a separate module, like you did with the mapping functions... that will make it easier for other people to fork as well. Maybe I'll see if I can do that quickly now...

Regarding the changes to the exponential fit, I've already incorporated some of the changes and pushed them to your repo... or did I do a pull request? Darn, I don't remember... probably I pushed them but I should have done a pull request. Sorry about that... they were all pretty small in terms of code, and could be undone pretty easily.

But I've been thinking a lot and I think you're going about it slightly wrong. It's a really great idea to show growth rates, but fitting an exponential curve only works in the beginning when the growth is exponential... as you've noted... and afterwards even a sigmoid curve probably won't work well because that's an idealized curve in a closed system without geography, politics, possible off-and-on social distancing, etc. Also, from a visualization perspective, I think what people will be most interested in are recent growth rates, not overall growth rates... and in any case, a time slider/animation playing through, say, weekly growth rates from the beginning will communicate the shape of the curve (and the actual curve can be added in eventually as well).

So I guess I come back to the idea of fitting the exponential curve through a recent n-day period, rather than the entire period. And if the period is reasonably small, I'm guessing the error in the curve fit will be small as well... what do you think?
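The recent n-day idea can be sketched as a rolling fit. This assumes a plain array of cumulative counts indexed by day and uses a least-squares line through log(cases) within each window; `windowedRate`, `rollingRates`, and the synthetic numbers are illustrative only.

```javascript
// Daily growth factor from a least-squares line through log(cases)
// over the window of n days ending at index `end` (inclusive).
function windowedRate(cases, end, n) {
  const start = end - n + 1;
  if (start < 0 || n < 2) return null; // window does not fit yet
  let sx = 0, sy = 0, sxx = 0, sxy = 0;
  for (let i = start; i <= end; i++) {
    if (cases[i] <= 0) return null; // log undefined for zero counts
    const y = Math.log(cases[i]);
    sx += i;
    sy += y;
    sxx += i * i;
    sxy += i * y;
  }
  const slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
  return Math.exp(slope);
}

// One growth factor per day, each based only on the preceding n days.
function rollingRates(cases, n) {
  return cases.map((_, end) => windowedRate(cases, end, n));
}

// Synthetic counts: growth of 1.4x per day for 10 days, then 1.1x per day.
const cases = [10];
for (let d = 1; d < 20; d++) {
  cases.push(cases[d - 1] * (d < 10 ? 1.4 : 1.1));
}
const rates = rollingRates(cases, 7);
console.log(rates.map((r) => (r === null ? "n/a" : r.toFixed(2))).join(" "));
```

Played through a time slider, the sequence of windowed rates traces the slowdown without committing to a single curve for the whole history. The window length is the trade-off: shorter windows track change faster but react more to noise.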


nkronenfeld commented 4 years ago

Re: Canada: Totally agreed. The UI is absolutely the tricky part. I was assuming a drop-down like you do for variable, but your other suggestions are just as valid.

Re: Changes to this PR - I made it editable by you for a reason, I'm fine with you pushing to it directly, it's your repository :-)

Re: Use of exponential curve: I think the sigmoid is closest, and can give you a number of interesting numbers out - predicted max number of patients, predicted time at which half of them are infected, max growth rate - and you're absolutely right, it isn't perfect, it isn't a closed system. But I think it's the closest function to reality, so is probably the best choice to use anyway. And seeing it change as you move the data forward would show how well or ill the fit was - if it changes, it wasn't a great choice.

You're right people may be more interested in recent growth rate, but it's not clear to me that the exponential fit won't still give that, at least until growth rates start dropping. Recent data will swamp earlier data, just because otherwise the error numbers are higher. But I don't think just fitting the last few days is a great idea. That seems too prone to over-fitting.
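For reference, here is how those numbers fall out of a logistic curve, which is one common sigmoid. The parameter names `maxCases`, `k`, and `tMid` are chosen for illustration, and "max growth rate" is read here as the steepest slope of the curve (peak new cases per day), which is an interpretation rather than something stated in the thread.

```javascript
// Logistic curve: cases(t) = maxCases / (1 + exp(-k * (t - tMid)))
function logistic(t, { maxCases, k, tMid }) {
  return maxCases / (1 + Math.exp(-k * (t - tMid)));
}

// Quantities that can be read directly off the fitted parameters.
function logisticSummary(params) {
  return {
    predictedMax: params.maxCases, // asymptote of the curve
    halfwayDay: params.tMid, // day on which cases reach maxCases / 2
    // The slope is steepest at t = tMid, where it equals maxCases * k / 4.
    maxDailyIncrease: (params.maxCases * params.k) / 4,
  };
}

// Made-up parameters, only to show the relationships.
const params = { maxCases: 1000, k: 0.2, tMid: 30 };
const summary = logisticSummary(params);
console.log(logistic(30, params), summary);
```

Refitting as the data moves forward and watching whether these three numbers stay put is the stability check described above.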

geobarry commented 4 years ago

So my fear is that fitting a sigmoid curve to the entire history might underestimate the severity of a second wave. So I tried an experiment following this thread, which uses a curve-fitting library in Python:

https://stackoverflow.com/questions/55725139/fit-sigmoid-function-s-shape-curve-to-data-using-python

I plugged in some artificial data for a "second wave" scenario:

[cid:d243f82a-eb8e-4c12-9398-e1f5c444a6a2]

The result: [cid:8d16ae5a-baa6-47f8-9ebe-34357f8a5a63] So this seems to confirm my fear. Though apparently there are many types of sigmoid curves, and I basically just copied the code in the link without looking too deeply into it... but I suspect it would be the same for any sigmoid curve (actually any curve with an asymptotic limit).
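Since the attached images did not come through, here is a rough stand-in for that kind of experiment. It is not the code from the linked thread: the data are synthetic (two logistic waves with invented parameters), and the fit is a brute-force least-squares grid search rather than a curve-fitting library.

```javascript
function logistic(t, L, k, t0) {
  return L / (1 + Math.exp(-k * (t - t0)));
}

// Synthetic "second wave": two logistic waves added together.
// All parameters are invented for illustration.
const TRUE_FINAL = 1000 + 2000; // eventual total across both waves
function twoWaves(t) {
  return logistic(t, 1000, 0.3, 20) + logistic(t, 2000, 0.3, 80);
}

// Observations stop on day 75, early in the second wave.
const days = Array.from({ length: 76 }, (_, t) => t);
const observed = days.map(twoWaves);

// Brute-force least squares over a coarse grid of (L, k, t0).
function fitLogistic(ts, ys) {
  let best = { sse: Infinity };
  for (let L = 500; L <= 5000; L += 50) {
    for (let k = 0.05; k <= 0.6001; k += 0.05) {
      for (let t0 = 0; t0 <= 100; t0 += 1) {
        let sse = 0;
        for (let i = 0; i < ts.length; i++) {
          const r = ys[i] - logistic(ts[i], L, k, t0);
          sse += r * r;
        }
        if (sse < best.sse) best = { L, k, t0, sse };
      }
    }
  }
  return best;
}

const fit = fitLogistic(days, observed);
console.log("fitted asymptote:", fit.L, "eventual total:", TRUE_FINAL);
```

In this setup the long plateau after the first wave dominates the fit, so the single curve's asymptote should land well short of the eventual total. That illustrates the concern for one invented scenario only; it does not show that other curve families would do better.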

BTW, I tried briefly to separate the data loading function into its own module, but I got sidetracked with a grant opportunity to possibly support this... it's a longshot, but I'm hoping to submit the grant today and maybe actually be able to pay the two students who have volunteered to help with this. Anyway, once that is done I'll go back to the data module idea, and if I get that working and you can help me get Canada data in, then probably I'll implement a drop-down box for region selection for now, and worry about the UI aspect later.


nkronenfeld commented 4 years ago

I can't see your second-wave data or result there, they are just random strings beginning "[cid:". I would like to see what you mean here. In the absence of any real information, though, I'll ask: is this any better or worse for any other curve fit? Presumably, in a second wave, any curve that models a single wave is going to start getting a lot more error, isn't it?

geobarry commented 4 years ago

Maybe... I feel like asymptotic curves are especially dangerous, but I could be wrong... But basically that's why I'm leery of the idea of mapping parameters of a fitted curve. People tend to put more faith in maps than they should. I know that's why you wanted to put in the measure of error, but it's harder to see what the error really means on a map than it would be on a scatterplot.