GeoDaCenter / covid-atlas

An interactive map of COVID-19 data with spatial analysis tools
GNU General Public License v3.0

GeoDa integration #5

Open bertday opened 4 years ago

bertday commented 4 years ago

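The current Atlas uses GeoDa wrapped as a WebAssembly module for a few spatial analysis tasks:

- Hotspot/cluster analysis (aka LISA)
- Cartograms
- Possibly others (wondering if @lixun910 might have more insights?)

This is a really interesting approach and has been fun to explore from an engineering standpoint. However, I did want to bring up a few discussion points as we start to migrate these features into the refactored app.

- Browser support: WebAssembly has wide support on modern browsers, but is not implemented on legacy browsers such as Internet Explorer. Based on our user analytics, about 2.2% of visits came from IE. That's a small number to be sure, but my concern is that some of these folks are key stakeholders at less-resourced institutions, or are working under IT policies that bar them from installing their own browser (this is common at hospitals and gov't agencies). As we consider the overall accessibility of the site, we may want to keep these users in mind.
- Performance/scaling: My understanding was that there were some issues with the in-browser LISA analysis taking too long when the app started, which is why we've been caching the results as JSON files and checking them into the repo (if I got any of that wrong, please feel free to weigh in @lixun910). As our data grows over time and as we add new sources, this will likely become a larger issue.
- Packaging: The current WebAssembly module injects a variable called Module into the global JavaScript scope, which presents some challenges when integrating with a Node-based project such as the refactored app. I've started a repo that would turn the module into a Node package, but this may take more time than we've allocated for the refactor.

At the same time, I think we can all recognize that GeoDa is driving some of the most important insights in the Atlas, so I wanted to make a few suggestions around how we could modify the stack to provide the best user experience possible without compromising on spatial analysis.

- Preprocessing: One option would be to preprocess the data with GeoDa so that the WebAssembly module isn't needed for things like viewing clusters; this would address both browser support and performance issues. As I mentioned above, we may already be doing this to some extent, so it would just be a matter of formalizing the caching process and possibly using pygeoda in place of the browser workflow. I think this would be the most straightforward approach and would be my recommendation.
- Server-side processing: Another option would be to deploy pygeoda as a lightweight backend service, such as an AWS Lambda function, that the browser app could call when it needs to run analysis. We could use something like this in the case of the cartogram, which I'm not sure could be pre-processed as easily.

I was wondering if the group had any additional thoughts or ideas re: GeoDa on the front end. Once again, I think this is a fantastic feature of the app, and I'm looking forward to finding a long-term solution for supporting it in the Atlas 😄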
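For the server-side idea (analysis behind a lightweight backend such as an AWS Lambda function), here is a rough sketch of what a handler could look like. It assumes an API Gateway proxy-style event whose `body` is a JSON string. The payload fields `values` and `neighbors`, and the `run_analysis` function, are hypothetical placeholders; `run_analysis` just computes a spatial lag and stands in for wherever a real pygeoda call would go.

```python
import json


def run_analysis(values, neighbors):
    """Placeholder analysis: spatial lag (mean of each unit's neighbors).

    Hypothetical stand-in for whatever analysis the real service would run.
    """
    lags = []
    for i in range(len(values)):
        # JSON object keys are strings, so neighbor lists are keyed by "0", "1", ...
        nbrs = neighbors.get(str(i), [])
        lags.append(sum(values[j] for j in nbrs) / len(nbrs) if nbrs else None)
    return lags


def handler(event, context):
    """Lambda-style entry point for an API Gateway proxy request."""
    try:
        payload = json.loads(event.get("body") or "{}")
        values = payload["values"]
        neighbors = payload["neighbors"]
    except (KeyError, TypeError, ValueError) as exc:
        # Malformed JSON or missing fields: report a client error.
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"lag": run_analysis(values, neighbors)}),
    }
```

The browser app would POST the values and neighbor lists and read the result back as JSON, so nothing in this path depends on WebAssembly support in the client.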

/cc @jkoschinsky @Makosak @lixun910 @linqinyu

lixun910 commented 4 years ago

Sounds good, Robert! At the design level (e.g. which browsers to support), you can check with @Makosak. At the technical level, it's your call :) as long as we are heading toward a refactored COVID Atlas that is faster, nicer, and has more features, within a manageable time and budget. The GeoDa wasm module also handles weights creation and centroids for labeling. Local Moran results are cached only for the USAFacts slider feature; the others are computed in real time. All of these features are available in pygeoda/rgeoda except the cartogram. I can help if you need any details.

Xun Li, Ph.D.

Assistant Director of Data Science, Center for Spatial Data Science, University of Chicago

On Apr 19, 2020, at 3:56 PM, Robert Martin notifications@github.com wrote:

 The current Atlas uses GeoDa wrapped as a WebAssembly module for a few spatial analysis tasks:

- Hotspot/cluster analysis (aka LISA)
- Cartograms
- Possibly others (wondering if @lixun910 might have more insights?)

This is a really interesting approach and has been fun to explore from an engineering standpoint. However, I did want to bring up a few discussion points as we start to migrate these features into the refactored app.

- Browser support: WebAssembly has wide support on modern browsers, but is not implemented on legacy browsers such as Internet Explorer. Based on our user analytics, about 2.2% of visits came from IE. That's a small number to be sure, but my concern is that some of these folks are key stakeholders at less-resourced institutions, or are working under IT policies that bar them from installing their own browser (this is common at hospitals and gov't agencies). As we consider the overall accessibility of the site, we may want to keep these users in mind.
- Performance/scaling: My understanding was that there were some issues with the in-browser LISA analysis taking too long when the app started, which is why we've been caching the results as JSON files and checking them into the repo (if I got any of that wrong, please feel free to weigh in @lixun910). As our data grows over time and as we add new sources, this will likely become a larger issue.
- Packaging: The current WebAssembly module injects a variable called Module into the global JavaScript scope, which presents some challenges when integrating with a Node-based project such as the refactored app. I've started a repo that would turn the module into a Node package, but this may take more time than we've allocated for the refactor.

At the same time, I think we can all recognize that GeoDa is driving some of the most important insights in the Atlas, so I wanted to make a few suggestions around how we could modify the stack to provide the best user experience possible without compromising on spatial analysis.

- Preprocessing: One option would be to preprocess the data with GeoDa so that the WebAssembly module isn't needed for things like viewing clusters; this would address both browser support and performance issues. As I mentioned above, we may already be doing this to some extent, so it would just be a matter of formalizing the caching process and possibly using pygeoda in place of the browser workflow. I think this would be the most straightforward approach and would be my recommendation.
- Server-side processing: Another option would be to deploy pygeoda as a lightweight backend service, such as an AWS Lambda function, that the browser app could call when it needs to run analysis. We could use something like this in the case of the cartogram, which I'm not sure could be pre-processed as easily.

I was wondering if the group had any additional thoughts or ideas re: GeoDa on the front end. Once again, I think this is a fantastic feature of the app, and I'm looking forward to finding a long-term solution for supporting it in the Atlas 😄

/cc @jkoschinsky @Makosak @lixun910 @linqinyu

lixun910 commented 4 years ago

I can also make the local Moran run much faster for the time slider, so there would be no need to cache. But again, it's your call. Just let me know if you need it, and I can provide a new function for this speed-up in GeoDa wasm.

bertday commented 4 years ago

Thank you @lixun910, really appreciate your feedback on this! I didn’t realize it was only USAFacts being cached so that helps. I had forgotten about centroid labels. I’m not as familiar with weights but maybe we can review those sometime 🙂

Thinking about this some more, I’m not sure that I’m going to have time to get up to speed with pygeoda and write the caching script at the same time as working on the front end, so that may be a reason to take a more incremental approach. The issue of browser support might still be good to talk about sometime with @Makosak
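For reference, a caching script along these lines could be fairly small. The sketch below is a pure-Python stand-in and does not use pygeoda's actual API: it computes Local Moran's I with row-standardized weights for each date and collects the results into a JSON-serializable dict. The function names, the `series_by_date` layout, and the quadrant labels are all assumptions for illustration. It also skips the permutation-based significance test that a real LISA analysis would include.

```python
import json
from statistics import mean, pstdev


def local_moran(values, neighbors):
    """Local Moran's I per unit with row-standardized weights.

    Returns a list of {"I": float, "quadrant": str}. No permutation test is
    run here, so the output says nothing about significance.
    """
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        # Constant series: no spatial pattern can be measured.
        return [{"I": 0.0, "quadrant": "Undefined"} for _ in values]
    z = [(v - mu) / sd for v in values]
    results = []
    for i, zi in enumerate(z):
        nbrs = neighbors.get(i, [])
        # Spatial lag: average standardized value of the neighbors.
        lag = mean([z[j] for j in nbrs]) if nbrs else 0.0
        if zi > 0 and lag > 0:
            quadrant = "High-High"
        elif zi < 0 and lag < 0:
            quadrant = "Low-Low"
        elif zi > 0 and lag < 0:
            quadrant = "High-Low"
        elif zi < 0 and lag > 0:
            quadrant = "Low-High"
        else:
            quadrant = "Undefined"
        results.append({"I": zi * lag, "quadrant": quadrant})
    return results


def build_cache(series_by_date, neighbors):
    """Map each date string to its per-unit Local Moran results."""
    return {date: local_moran(vals, neighbors) for date, vals in series_by_date.items()}


def write_cache(path, series_by_date, neighbors):
    """Write the cache as a JSON file the front end could load directly."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_cache(series_by_date, neighbors), fh)
```

With something like this, the neighbor lists are built once and reused for every date, and the front end only needs to read the JSON for the selected date.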

I’d like to try to wrap the Wasm code into an npm package if I can (quickly). I’ll follow up on that repo with a few questions I had. If it feels like it’s taking too long, I’m happy to try to bring in the global module.

lixun910 commented 4 years ago

Spatial weights are there for an urgent requirement from @Makosak: when you mouse over a hot spot, its neighbors should be highlighted too.
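To illustrate the idea, here is a minimal sketch of how weights can drive a neighbor highlight. It is not the GeoDa wasm implementation. It assumes queen contiguity approximated by exact shared vertices, and the names `queen_weights` and `ids_to_highlight` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def queen_weights(polygons):
    """Build neighbor sets from polygons given as {id: [(x, y), ...]}.

    Two polygons are treated as neighbors if they share at least one vertex
    (exact coordinate match), which approximates queen contiguity.
    """
    by_vertex = defaultdict(set)
    for pid, ring in polygons.items():
        for vertex in ring:
            by_vertex[tuple(vertex)].add(pid)
    weights = {pid: set() for pid in polygons}
    for pids in by_vertex.values():
        for a, b in combinations(sorted(pids), 2):
            weights[a].add(b)
            weights[b].add(a)
    return weights


def ids_to_highlight(hovered, hot_spots, weights):
    """Return the hovered hot spot plus its neighbors, or nothing otherwise."""
    if hovered not in hot_spots:
        return set()
    return {hovered} | weights.get(hovered, set())
```

Because the weights are just a lookup table, they could be computed once (in the browser or in a preprocessing step) and the mouse-over handler would only do a dictionary lookup.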

I’d suggest not going further on the npm package, since the GeoDa wasm code is under active development and is custom-built for the COVID project rather than being a generic library like pygeoda/rgeoda.

Xun

Makosak commented 4 years ago

Process sounds great overall. I'd be in favor of more caching, since the central data core will get large and caching could help the app work across a variety of browsers. We could still spin off that fast GeoDa functionality for some customized tools in the future. But really, whatever works best for the core crew here.

We could also just show both cores and neighbors for clusters in the next release instead of the neighbor highlight. It will be okay to wait until the May release so we have time to test both solutions and determine which is easier and more effective for users. Getting in more data easily is still the first priority, and we can bring features back from the GeoDa functionality next.