covidcaremap / covid19-healthsystemcapacity

Open geospatial work to support health systems' capacity (providers, supplies, ventilators, beds, meds) to effectively care for rapidly growing COVID19 patient needs
https://www.covidcaremap.org
MIT License

Merge HIFLD data into HCRIS data #25

Closed lossyrob closed 4 years ago

lossyrob commented 4 years ago

See issue comment https://github.com/daveluo/covid19-healthsystemcapacity/issues/14#issuecomment-599885877

This data should be merged into usa_hospital_beds_hcris2018_cleaned3.geojson in the data folder. The 'BEDS' field should be renamed to 'Total Licensed Beds'. Unclear what other columns should be carried over - perhaps an object inside the feature properties that just has everything, with a property name hifld?
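A minimal sketch of the property merge described above, assuming each HCRIS record is a GeoJSON feature dict and the matching HIFLD row is already known. The function name `merge_hifld_properties` and the sample values are hypothetical; only `BEDS`, `Total Licensed Beds`, and the `hifld` property name come from this issue.

```python
import copy


def merge_hifld_properties(hcris_feature, hifld_properties):
    """Return a copy of an HCRIS GeoJSON feature with HIFLD data attached.

    Hypothetical helper: it surfaces HIFLD 'BEDS' as 'Total Licensed Beds'
    and keeps every HIFLD column under a nested 'hifld' property.
    """
    merged = copy.deepcopy(hcris_feature)
    props = merged.setdefault("properties", {})
    hifld = dict(hifld_properties)
    # Rename per the issue: HIFLD 'BEDS' becomes 'Total Licensed Beds'.
    if "BEDS" in hifld:
        props["Total Licensed Beds"] = hifld["BEDS"]
    # Keep the full HIFLD row so no column is lost while we decide what to keep.
    props["hifld"] = hifld
    return merged


# Illustrative values only, not real records.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-75.0, 40.0]},
    "properties": {"PROVIDER_NUMBER": "390001"},
}
merged = merge_hifld_properties(feature, {"BEDS": 120, "HELIPAD": "Y"})
```

Nesting everything under `hifld` avoids name collisions with existing HCRIS columns, at the cost of a deeper property structure.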

Since there are lat lng coordinates in the HIFLD data, we can do a spatial join on buffered points to match the datasets.
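As a rough illustration of the buffered-point matching idea, here is a plain-Python stand-in that pairs each HCRIS point with the nearest HIFLD point inside a fixed radius. This is not a true spatial join from a GIS library; it uses great-circle distance instead of buffered geometries, and the 500 m radius, function names, and sample coordinates are all assumptions.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two lat/lng points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_by_buffer(hcris_points, hifld_points, buffer_m=500.0):
    """For each HCRIS (lat, lng), return the index of the nearest HIFLD
    point within buffer_m meters, or None when nothing is close enough."""
    matches = []
    for lat, lng in hcris_points:
        best_idx, best_dist = None, buffer_m
        for idx, (h_lat, h_lng) in enumerate(hifld_points):
            dist = haversine_m(lat, lng, h_lat, h_lng)
            if dist <= best_dist:
                best_idx, best_dist = idx, dist
        matches.append(best_idx)
    return matches


# Illustrative coordinates only.
hcris_points = [(40.0, -75.0), (41.0, -74.0)]
hifld_points = [(40.001, -75.0), (35.0, -80.0)]
matches = match_by_buffer(hcris_points, hifld_points)
```

The nested loop is quadratic in the number of facilities, which is fine for a quick check but would want a spatial index for the full datasets. Unmatched facilities (`None`) would need a manual review or a name/address fallback.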

daveluo commented 4 years ago

I would carry over all of the non-redundant columns from HIFLD. This dataset seems to be purpose-made to facilitate emergency/crisis management, with columns like HELIPAD, STATUS, TRAUMA, TYPE, and OWNER.

For example, we can use STATUS == CLOSED to flag facilities that could be re-opened to add capacity.
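A small sketch of that flag, assuming HIFLD rows are dicts with a `STATUS` column. The output key `reopen_candidate` and the sample rows are hypothetical.

```python
def flag_reopenable(hifld_records):
    """Return copies of the records with a boolean 'reopen_candidate' key
    (hypothetical name) set when STATUS is CLOSED."""
    flagged = []
    for record in hifld_records:
        # Normalize case and whitespace so 'closed ' still counts.
        status = str(record.get("STATUS", "")).strip().upper()
        flagged.append({**record, "reopen_candidate": status == "CLOSED"})
    return flagged


# Illustrative rows only.
rows = [
    {"NAME": "Hospital A", "STATUS": "OPEN"},
    {"NAME": "Hospital B", "STATUS": "CLOSED"},
    {"NAME": "Hospital C"},
]
flagged_rows = flag_reopenable(rows)
```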

aaronxsu commented 4 years ago

From https://github.com/daveluo/covid19-healthsystemcapacity/issues/14#issuecomment-599571862, these are the target statistics we're aiming to derive with this data:

daveluo commented 4 years ago

Note that we've mainly been focused on acute care capacity/bed supply for adults right now (these hospital types are often labeled as "Acute Care Hospital" or "Critical Access"), but there are many other facility types in these datasets (like Peds and Women's hospitals, nursing facilities, long term rehab). I'm guessing the processing and data merge is mostly the same amount of work whether we keep all facility types in or not? If so, my preference is to keep every facility in the combined dataset and just not expose the other types to view for now.

daveluo commented 4 years ago

Heads-up re: the HIFLD data and some potentially missing data points (which we can quickly cross-check against our HCRIS data summed by state or county):

thanks to https://forums.fast.ai/t/help-with-algorithm-for-covid19-relative-risk/65328/15:

About HIFLD, we are only showing in NY 1/10 of the beds they actually have, so there may be bad data, or I may have done something in preprocessing. I’m going to take a closer look soon but just make sure your data for NY State is checked/working.
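One way to run the state-level cross-check mentioned above, as a sketch: sum beds per state in each dataset and flag states where the HIFLD total falls well below the HCRIS total. The `State`, `Beds`, and `STATE` field names, the 0.5 threshold, and the sample numbers are assumptions for illustration, not values from either dataset.

```python
from collections import defaultdict


def beds_by_state(records, state_key, beds_key):
    """Sum bed counts per state, skipping rows with no bed value."""
    totals = defaultdict(float)
    for record in records:
        beds = record.get(beds_key)
        if beds is None:
            continue
        totals[record[state_key]] += float(beds)
    return dict(totals)


def flag_state_discrepancies(hcris_totals, hifld_totals, min_ratio=0.5):
    """Return {state: HIFLD/HCRIS ratio} for states below min_ratio."""
    flagged = {}
    for state, hcris_beds in hcris_totals.items():
        if hcris_beds <= 0:
            continue
        ratio = hifld_totals.get(state, 0.0) / hcris_beds
        if ratio < min_ratio:
            flagged[state] = ratio
    return flagged


# Made-up numbers that mimic the "1/10 of the beds" symptom.
hcris_rows = [{"State": "NY", "Beds": 1000}, {"State": "PA", "Beds": 500}]
hifld_rows = [{"STATE": "NY", "BEDS": 100}, {"STATE": "PA", "BEDS": 480}]
suspect = flag_state_discrepancies(
    beds_by_state(hcris_rows, "State", "Beds"),
    beds_by_state(hifld_rows, "STATE", "BEDS"),
)
```

The same two functions work at county level by passing a county key instead of a state key.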

daveluo commented 4 years ago

The type of facility is determined from the last four digits of its Medicare provider number:

| Facility Type | Last 4 digits of PROVIDER_NUMBER in HCRIS data |
| --- | --- |
| Short Term Acute Care | 0001-0899 |
| Childrens | 3300-3399 |
| Critical Access | 1300-1399 |
| Long Term | 2000-2299 |
| Psychiatric | 4000-4499 |
| Rehabilitation | 3025-3099 |
| Other | none of above |
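The lookup above can be sketched as a small function. The ranges are copied from the table; the function name `facility_type`, the handling of malformed values, and the sample provider numbers are assumptions.

```python
# (label, low, high) ranges for the last four digits, inclusive.
FACILITY_TYPE_RANGES = [
    ("Short Term Acute Care", 1, 899),
    ("Childrens", 3300, 3399),
    ("Critical Access", 1300, 1399),
    ("Long Term", 2000, 2299),
    ("Psychiatric", 4000, 4499),
    ("Rehabilitation", 3025, 3099),
]


def facility_type(provider_number):
    """Classify a facility from the last four digits of its provider number.

    Anything that is not four decimal digits, or falls outside the listed
    ranges, is treated as 'Other' (an assumption for malformed input).
    """
    suffix = str(provider_number).strip()[-4:]
    if len(suffix) != 4 or not suffix.isdecimal():
        return "Other"
    code = int(suffix)
    for label, low, high in FACILITY_TYPE_RANGES:
        if low <= code <= high:
            return label
    return "Other"


# Illustrative provider numbers only.
examples = {pn: facility_type(pn) for pn in ["330001", "333301", "331302", "395000"]}
```

Because only the last four characters are used, a provider number stored as an integer that has lost its leading state-code zero still classifies the same way.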

See HOSP2010_README.txt for more reference documentation about our HCRIS reported data (which covers all hospital facilities, as defined here, that submit the "2010" form: https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/Cost-Reports/Hospital-2010-form). The README is from "HOSPITAL2010-DOCUMENATION.ZIP" at the above link.