lossyrob closed 4 years ago
Thanks @lossyrob, this looks good and covers the bases enough for us to get going on parallelizing work. We can always refine and remix the groupings as we learn more from validation research and start trying to put things together.
I presume now we can start defining subtasks within each grouping? If so, can you demonstrate with a few example tasks how that should be written out and organized within the project? Then I and others can follow your lead.
:+1: will do, thanks!
To make the best use of our limited time and that of everyone who wants to collaborate on this project, we should have a clear work plan made up of pieces that can be done in parallel and then combined to reach the project's goals. This issue outlines a broad approach to the work, and is an iteration on Dave's How to Help section. The work plan here will be translated into issues that an individual interested in helping out can take on or contribute to effectively.
Some high-level goals of the project as I understand them so far:
In order to make that happen, I can see work being split up into 4 groupings: (1) Data Gathering, Cleaning & Cataloging; (2) Data Analytics; (3) Visualization; and (4) Project Direction. Someone may choose to work on tasks that contribute to multiple work groups at once; this is only meant to be a logical division of the work that will help detail and potentially parallelize components of the project that will combine to accomplish the project goals.
Data Gathering, Cleaning & Cataloging
This group of work includes:
Data Analytics
Here are three data analytics work streams I see so far:
Estimated Hospital System Capacity
This is the analysis that exists in this repository now. Can we make this estimation better? Is this the right level of detail (county level)?
Since this has been identified as a dataset gap in the community, the product of the analysis should be published as a dataset by this project once it is determined to be of sufficient quality. This would accomplish project goal A.
Epidemiological Modeling
In order to perform the comparison analysis that identifies the care gap, we need an estimate of the affected population over time. More specifically, we want to know the projected number of active cases putting demand on the healthcare system in different locations at different times.
There are several open source approaches to this type of modeling, and the ideal case is to reuse others' work. For instance, perhaps there is an implementation of an SIR model that we could run at a county level based on census demographics to generate a per-county, per-timepoint health system stress dataset. Or perhaps someone is already publishing modeling data at an appropriate aggregation for our analysis that we can just use directly.
A stretch goal would be to generate an ML challenge or competition that could take advantage of community participation to develop a more accurate model. This would rely on the ability to develop the supervised training dataset mentioned in the Data Gathering, Cleaning & Cataloging work group above.
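To make the SIR idea concrete, here is a minimal sketch of what running such a model for a single county could look like. It is not an existing implementation from this repository: the function name `simulate_sir`, the `SirState` record, and the values for `beta` (transmission rate), `gamma` (recovery rate), and the county population are all assumptions chosen for illustration, and a real analysis would want a vetted epidemiological library and calibrated parameters.

```python
from dataclasses import dataclass


@dataclass
class SirState:
    """Compartment sizes (in people) at the end of one simulated day."""
    susceptible: float
    infected: float
    recovered: float


def simulate_sir(population, initial_infected, beta, gamma, days, steps_per_day=10):
    """Integrate the basic SIR equations with forward Euler steps.

    Returns a list of SirState, one per day, including day 0.
    beta and gamma are per-day rates and are illustrative assumptions here.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    dt = 1.0 / steps_per_day
    history = [SirState(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append(SirState(s, i, r))
    return history


# Hypothetical county of 100,000 people with 10 initial cases.
history = simulate_sir(
    population=100_000, initial_infected=10, beta=0.3, gamma=0.1, days=120
)
# Day on which the number of active infections is highest.
peak_day = max(range(len(history)), key=lambda d: history[d].infected)
print(f"Peak active infections on day {peak_day}: {history[peak_day].infected:.0f}")
```

Running this per county, with populations taken from census data, would give the kind of per-county, per-timepoint series described above. Turning active infections into hospital demand would need a further assumption about what fraction of cases require care.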
Comparing Capacity vs Forecasts
Once we have a dataset that estimates health system capacity, and the ability to forecast stress over time on the healthcare system, we will be able to identify care gaps. This analysis would seek to help answer the questions Dave posted in the README:
This would accomplish project goal B. It is dependent on the ability to produce Epidemiological Models sufficient for the analysis, as well as the Estimated Hospital System Capacity dataset being generated.
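As a rough illustration of the comparison step, the sketch below assumes we already have a capacity estimate per county and a forecast of demand per county and day, and reports where demand exceeds capacity. The function name `care_gaps`, the dictionary layouts, the county names, and every number are made up for the example; the real datasets may well be shaped differently.

```python
def care_gaps(capacity_by_county, demand_by_county_day):
    """Return {(county, day): shortfall} wherever forecast demand exceeds capacity.

    capacity_by_county: {county: available beds}            (hypothetical layout)
    demand_by_county_day: {(county, day): forecast patients} (hypothetical layout)
    Counties with no capacity estimate are skipped rather than guessed at.
    """
    gaps = {}
    for (county, day), demand in demand_by_county_day.items():
        capacity = capacity_by_county.get(county)
        if capacity is None:
            continue  # no capacity estimate for this county
        shortfall = demand - capacity
        if shortfall > 0:
            gaps[(county, day)] = shortfall
    return gaps


# Made-up inputs purely for illustration.
capacity = {"County A": 120.0, "County B": 40.0}
demand = {
    ("County A", 0): 90.0,
    ("County A", 30): 150.0,
    ("County B", 30): 35.0,
    ("County C", 30): 10.0,  # no capacity estimate, so it is skipped
}

gaps = care_gaps(capacity, demand)
for (county, day), shortfall in sorted(gaps.items()):
    print(f"{county}, day {day}: short by {shortfall:.0f} beds")
```

Keeping the comparison this simple makes the dependency explicit: the output is only as good as the capacity dataset and the forecasts fed into it.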
Visualization
Answers to questions about the healthcare system's capacity and its ability to handle the stresses of the COVID-19 outbreak are only as good as our ability to communicate them effectively. The data visualization component aims to make compelling visualizations that communicate the information that is important and actionable. While information can be used to help in the crisis, it can also add to the noise, increase panic, and otherwise be unhelpful.
This would accomplish project goal C.
Project Direction
Besides building the visualizations and the data that powers them, we will need people to test, validate, document, and determine the usefulness of the tools generated by this project. We also need people who will be able to explain what these tools are and why they are important for personal & community protection and public health decision making at the local, county & state levels. Also, we need people to connect to other open source and open data efforts so that we are contributing to the larger community effort and not duplicating work unnecessarily.