Chicago / vision-zero-dashboard

Vision Zero Dashboard
MIT License

Request detailed data on Hospital || Emergency Service for the crashes #12

Open linicole opened 4 years ago

linicole commented 4 years ago

I would like to dig into hospital and emergency service data for traffic crashes in Chicago. Right now there is little data on this subject, so I would like to request more information in the public datasets. Thank you!

sas1336 commented 4 years ago

The public health department has access to Emergency Department data through Syndromic Surveillance (https://www.cdc.gov/nssp/overview.html). I wonder whether IDPH, like IDOT, provides some of the hospital data publicly. I'm aware of some research that merged crash data with hospital billing data.

linicole commented 4 years ago

In addition, the "HospitalName" field in the Person dataset contains a lot of code-like values instead of hospital names, and the data dictionary doc does not explain them. Thanks!

geneorama commented 4 years ago

I was going to check with the people at the State, but it's clearly just messy data. I can't imagine a data dictionary helping here.

There are 3,322 distinct hospital name entries. Here are the top 20:

> people[ , .N, HospitalName][order(-N)][1:20]
             HospitalName       N
 1:                       1906337
 2:               REFUSED   26630
 3:                   DNA   11967
 4:                  NONE    6192
 5:                    99    3421
 6:                 MERCY    1936
 7:              DECLINED    1645
 8:           ST. BERNARD    1507
 9:              MT SINAI    1320
10:            HOLY CROSS    1308
11: STROGER (COOK COUNTY)    1250
12: UNIVERSITY OF CHICAGO    1124
13:               UNKNOWN    1062
14:               STROGER    1002
15:          NORTHWESTERN     924
16:           REFUSED EMS     822
17:    ROSELAND COMMUNITY     805
18:       COMMUNITY FIRST     803
19: NORTHWESTERN MEMORIAL     717
20:               LORETTO     704

This gives you a sense for what the head / tail of the hospital names look like:

> people[ , .N, HospitalName][order(-N)]
                  HospitalName       N
   1:                          1906337
   2:                  REFUSED   26630
   3:                      DNA   11967
   4:                     NONE    6192
   5:                       99    3421
  ---                                 
3318:    REFUSED- EMS ON SCENE       1
3319:   REFUSED - EMS ON SCENE       1
3320: REFUSED MEDICAL ATTETION       1
3321:         LUTHERN  GENERAL       1
3322:             NORHTWESTERN       1
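As a rough sketch of a first cleaning pass over values like the ones above, the snippet below separates placeholder entries (blank, REFUSED, DNA, and so on) from candidate hospital names after light normalization. The placeholder set and the helper names `normalize`, `is_placeholder`, and `tally` are assumptions made for illustration, not part of the dataset or its documentation.

```python
from collections import Counter

# Assumed placeholder values, taken from the most frequent entries listed above.
PLACEHOLDERS = {"", "REFUSED", "DNA", "NONE", "99", "DECLINED", "UNKNOWN"}


def normalize(name):
    """Upper-case, turn hyphens into spaces, and collapse repeated whitespace."""
    return " ".join(name.upper().replace("-", " ").split())


def is_placeholder(name):
    """True for blank/code-like values and any 'REFUSED ...' variant."""
    norm = normalize(name)
    return norm in PLACEHOLDERS or norm.startswith("REFUSED")


def tally(names):
    """Count normalized names, keeping placeholders apart from hospital candidates."""
    placeholders, hospitals = Counter(), Counter()
    for raw in names:
        target = placeholders if is_placeholder(raw) else hospitals
        target[normalize(raw)] += 1
    return placeholders, hospitals


sample = [
    "", "REFUSED", "REFUSED- EMS ON SCENE", "REFUSED - EMS ON SCENE",
    "MERCY", "LUTHERN  GENERAL", "99",
]
placeholders, hospitals = tally(sample)
```

Normalizing first merges variants that differ only in spacing or hyphenation, so the two "REFUSED ... EMS ON SCENE" spellings land in one bucket. Misspellings such as "LUTHERN" survive this step and still need fuzzy matching.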

I've used the qlcMatrix package in the past to examine string distances. I'm attaching a text file (which should really be a .R file) in which I developed a function that checks an input list to see what it most closely matches: mmatch.txt

I think this function could work, but we'd need a master list of hospitals and their locations.
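To illustrate the idea of matching raw entries against a master list, here is a minimal Python sketch. It is not the attached mmatch function and does not use qlcMatrix; it swaps in the standard library's `difflib.get_close_matches`. The `MASTER` list, the `closest_match` helper, and the 0.6 cutoff are assumptions chosen for the example.

```python
import difflib

# Hypothetical master list; a real one would come from a hospital directory.
MASTER = [
    "NORTHWESTERN MEMORIAL",
    "LUTHERAN GENERAL",
    "MT SINAI",
    "STROGER (COOK COUNTY)",
]


def closest_match(raw, master, cutoff=0.6):
    """Return the master entry most similar to `raw`, or None if nothing clears the cutoff."""
    candidate = " ".join(raw.upper().split())  # trim and collapse whitespace
    hits = difflib.get_close_matches(candidate, master, n=1, cutoff=cutoff)
    return hits[0] if hits else None


matched = {raw: closest_match(raw, MASTER)
           for raw in ["NORHTWESTERN", "LUTHERN  GENERAL", "REFUSED"]}
```

The cutoff matters: a short raw value such as "NORHTWESTERN" scores only about 0.67 against the longer "NORTHWESTERN MEMORIAL", so a strict cutoff would drop it while a loose one risks false matches.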

I think it would be nicer to do some unsupervised learning on the data and look for natural clusters by name, and then see whether the similar-name clusters are also geographically concentrated. You'd think that certain areas would depend on certain hospitals. However, they might also use more distant hospitals that have (for example) superior trauma units or specialties.
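One simple way to look for natural clusters by name, sketched here under stated assumptions, is single-linkage grouping: link any two names whose similarity clears a threshold, then take the connected groups. The similarity measure (`difflib.SequenceMatcher.ratio`), the 0.85 threshold, and the `cluster_names` helper are illustrative choices, not something proposed in the thread.

```python
from difflib import SequenceMatcher
from itertools import combinations


def cluster_names(names, threshold=0.85):
    """Group names into single-linkage clusters using a union-find structure."""
    unique = list(dict.fromkeys(names))  # drop duplicates, keep order
    parent = {name: name for name in unique}

    def find(name):
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Compare every pair; this is O(n^2), acceptable for a few thousand names.
    for a, b in combinations(unique, 2):
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in unique:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values(), key=len, reverse=True)


clusters = cluster_names([
    "NORTHWESTERN", "NORHTWESTERN",
    "REFUSED- EMS ON SCENE", "REFUSED - EMS ON SCENE",
    "STROGER", "MERCY",
])
```

Single linkage can chain unrelated names together through intermediate spellings, so the threshold would need tuning against the real data before checking whether clusters are geographically concentrated.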

hneaz commented 4 years ago

Hi, I found a list of trauma centers here:

List of Trauma Centers

hospital_list.zip - I created a csv file from the link above.

Map of Trauma Centers - Probably need to find a way to download for Illinois.

I have a fuzzy matching script in R that can match against a full list of hospitals and trauma centers using the Levenshtein and Jaro distances (stringi and stringdist packages). We also need either to create another column that identifies whether an entry is a hospital, or to fully clean the HospitalName column. I will work on it this week.
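For readers unfamiliar with the two measures, the sketch below implements them in plain Python. It is not the R script mentioned above and does not reproduce the stringdist API; the function names are chosen here. Note that `jaro` returns a similarity in [0, 1], so the corresponding distance is one minus that value.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cs != ct)))  # substitution or match
        prev = curr
    return prev[-1]


def jaro(s, t):
    """Jaro similarity: 1.0 for identical strings, 0.0 when no characters match."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        # Only characters within the sliding window count as matches.
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_matched = [c for c, hit in zip(s, s_hit) if hit]
    t_matched = [c for c, hit in zip(t, t_hit) if hit]
    # Matched characters that appear in a different order, counted in pairs.
    transpositions = sum(a != b for a, b in zip(s_matched, t_matched)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3
```

For the misspelling "NORHTWESTERN", the Levenshtein distance to "NORTHWESTERN" is 2 (a swapped pair costs two substitutions), while Jaro treats the swap as a single transposition and stays close to 1.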

sas1336 commented 4 years ago

Thanks @hneaz. I've also reached out to CDPH. Let's see what they provide. I can ask if there is a downloadable map.

sas1336 commented 4 years ago

CDPH shared with us this website that has a list of all the hospitals along with their addresses: https://data.illinois.gov/dataset/410idph_hospital_directory/resource/9bdedb85-77f3-490a-9bbd-2f3f5f227981

The data can be exported to CSV.

In case there isn't a way to export the map that @hneaz mentioned, we can geolocate the hospitals through their addresses. I don't know if there's a tool in R for that, but let me know if we need to do it in GIS and I can do so using an address locator (at least for Chicago).

hneaz commented 4 years ago

@sas1336 Thank you for the link. This is very useful. I can start working on cleaning up the hospital names and also try out @geneorama's script.

If we need the lat/long for the hospitals, I can get them using the SmartyStreets Python API by uploading the file from the link provided. I have an unlimited subscription plan from work that I can use to get clean addresses with lat/long.

[UPDATE] I got the latitude and longitude data for the hospitals. Please find attached. illinois_hospital_list.csv.zip
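With hospital coordinates available, one possible next step is to measure how far each crash is from a hospital, which would help test whether certain areas depend on certain hospitals. The sketch below uses the haversine great-circle formula. The `haversine_km` and `nearest_hospital` helpers, the tuple layout, and the coordinates are all hypothetical placeholders, not values from the attached file.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/long points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_hospital(crash, hospitals):
    """Return the (name, lat, lon) tuple closest to the crash (lat, lon)."""
    return min(hospitals,
               key=lambda h: haversine_km(crash[0], crash[1], h[1], h[2]))


# Placeholder points for illustration only; not real hospital locations.
hospitals = [("HOSPITAL A", 41.80, -87.60), ("HOSPITAL B", 41.90, -87.62)]
crash = (41.89, -87.63)
closest = nearest_hospital(crash, hospitals)
```

Comparing the nearest hospital with the one actually recorded in HospitalName would show how often patients go to a more distant facility, for example one with a trauma unit.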