ResearchSoftwareInstitute / greendatatranslator

Green Team Data Translator Software Engineering and Development
BSD 3-Clause "New" or "Revised" License

Develop Roadway/School Exposures API and UI #143

Open karafecho opened 6 years ago

karafecho commented 6 years ago

The purpose of this issue is to develop a Roadway/School Exposures API and UI for nationwide data on proximity: (1) from primary residence to nearest major road/highway; (2) from primary residence to nearest public school; and (3) from nearest public school to nearest major road/highway.
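All three measures come down to finding the nearest feature to a given location. The sketch below illustrates that core step with the haversine great-circle formula. It is hedged: it assumes each feature is a single lat/lon point, such as a geocoded school. It does not compute distance to road segments (polylines), which is what the roadway part of the API actually needs. The names `haversine_m` and `nearest` are hypothetical and do not come from the project code.

```python
import math
from typing import Iterable, Optional, Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (assumed spherical Earth)

Feature = Tuple[str, float, float]  # (name, latitude, longitude) in degrees


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest(lat: float, lon: float, features: Iterable[Feature]) -> Optional[Tuple[str, float]]:
    """Return (name, distance_m) of the closest point feature, or None if there are none.

    Brute-force linear scan; a nationwide dataset would need a spatial index instead.
    """
    best: Optional[Tuple[str, float]] = None
    for name, flat, flon in features:
        d = haversine_m(lat, lon, flat, flon)
        if best is None or d < best[1]:
            best = (name, d)
    return best
```

For example, `nearest(res_lat, res_lon, schools)` would give measure (2) for one residence, and the same call with a school's coordinates against road points would approximate measure (3).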

lstillwe commented 6 years ago

First step - get VM(s) set up for this

VM requested (bdt-proximity.edc.renci.org) on 6/26/18

VM created 6/29/18

lstillwe commented 6 years ago

@karafecho Kara, I looked at the public and private school data provided. The public school data does not include lat/lon locations, so I will need to geocode the addresses provided.

Also, the private school data does not include lat/lon or addresses, so I probably will not be able to include it until we get addresses for the 21,903 private schools provided.

karafecho commented 6 years ago

@lstillwe : Thanks for the update. Do you want to reach out to Ann about the private school data? Perhaps she has addresses? Also, do you have a feel for how much extra work this will entail, in terms of geocoding the addresses?

lstillwe commented 6 years ago

@karafecho - Yes - I will reach out to Ann.

I have already written a script that uses a Google API to do the geocoding. The only issue is that Google only allows about 2,500 addresses a day, and we have ~100,000 for the public school data. The script runs continuously, retrying Google until it is allowed to do more work. So far it has worked through 7,500 of the addresses. This should be done in time for the September hackathon.
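The retry-until-quota-resets pattern described above can be sketched as follows. This is a minimal illustration, not the actual script. It assumes a hypothetical `geocode_fn` that returns a `(lat, lon)` pair, or `None` when there is no match, and that raises a hypothetical `QuotaExceeded` error once the daily limit is reached. No real Google client call is shown. The one-hour wait is an arbitrary placeholder.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

LatLon = Tuple[float, float]


class QuotaExceeded(Exception):
    """Raised by the (hypothetical) geocoder when the daily request quota is used up."""


def geocode_all(
    addresses: List[str],
    geocode_fn: Callable[[str], Optional[LatLon]],
    results: Dict[str, Optional[LatLon]],
    retry_wait_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[LatLon]]:
    """Geocode every address not already in `results`, waiting out quota errors.

    `results` doubles as a checkpoint: a rerun skips addresses that are already done.
    """
    for address in addresses:
        if address in results:
            continue  # geocoded on an earlier pass
        while True:
            try:
                results[address] = geocode_fn(address)  # None means "no match found"
                break
            except QuotaExceeded:
                sleep(retry_wait_s)  # wait, then ask the service again
    return results
```

Injecting `sleep` keeps the loop testable. Persisting `results` to disk between runs would let a long job of roughly 40 days (100,000 / 2,500) survive restarts.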

karafecho commented 6 years ago

@lstillwe : Should we break this issue into two separate ones, one for roadways and one for schools?

lstillwe commented 6 years ago

@karafecho - I don't think so since it will just be one API. One issue is good unless you want to track data supply issues separately.

lstillwe commented 6 years ago

API definition created. Can be viewed here: https://app.swaggerhub.com/apis/proximity_api/roadway_proximity_aos2/1.0.0#/default/

karafecho commented 6 years ago

From Sarav, 7/13/18:

The new road network data is based on the DOT/FHWA's HPMS network for the year 2016. Overall, from various local comparisons, we saw better capture of local roads than before, and availability of both daily traffic and average speeds on most road segments. Lisa, can you summarize what you received from Brian, along with any specific outstanding questions?

From our internal evaluation, we decided to go with HPMS-2016 for ongoing refinements to C-LINE, given various enhancements in HPMS-2016, compared to HPMS-2013 and TIGER that we previously assessed.

lstillwe commented 6 years ago

I have a rough version of this API completed - just Tiger line data from census for right now. If time permits, school locations will be added later. Still need to dockerize this server.

lstillwe commented 6 years ago

Met with Alex and got him going with the current roadway proximity API. He will be working to add support for HPMS-2016 data.

arunacs commented 6 years ago

Alex has completed the following updates to this API. Given a desired lat/lon and a roadway distance buffer, it looks up the HPMS-2016 dataset and outputs several new variables in addition to distance:

{ "aadt": 5060, "distance": 251.000765, "latitude": 37.2, "longitude": -79.334, "name": "Route ID", "roadtype": "Rural Restricted Access", "speed": 55, "through_lanes": 4 }
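A client can map the response above onto a typed record. The sketch below is a hedged illustration that uses only the fields shown in that example. The class and function names are hypothetical, and the thread does not state the units of `distance` or `speed`.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RoadwayProximity:
    aadt: int           # annual average daily traffic on the matched road segment
    distance: float     # distance to that segment (units not stated in the thread)
    latitude: float
    longitude: float
    name: str           # road/route identifier
    roadtype: str       # e.g. "Rural Restricted Access"
    speed: float        # speed value for the segment (units not stated)
    through_lanes: int  # number of through lanes


def parse_proximity(payload: str) -> RoadwayProximity:
    """Convert one JSON record with the fields shown above into a typed object."""
    data = json.loads(payload)
    return RoadwayProximity(
        aadt=int(data["aadt"]),
        distance=float(data["distance"]),
        latitude=float(data["latitude"]),
        longitude=float(data["longitude"]),
        name=str(data["name"]),
        roadtype=str(data["roadtype"]),
        speed=float(data["speed"]),
        through_lanes=int(data["through_lanes"]),
    )


sample = (
    '{ "aadt": 5060, "distance": 251.000765, "latitude": 37.2, "longitude": -79.334, '
    '"name": "Route ID", "roadtype": "Rural Restricted Access", "speed": 55, "through_lanes": 4 }'
)
record = parse_proximity(sample)
```

Explicit coercion means that a missing or misspelled field fails with a `KeyError` at parse time, rather than surfacing later in an analysis.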

lstillwe commented 6 years ago

This API is completed and now available here: http://bdt-proximity.renci.org:8080/roadway_proximity_api/v1/swagger.json. Also, please see this repository for documentation: https://github.com/lstillwe/datatranslator-exposure-apis

karafecho commented 6 years ago

Thanks, once again, @lstillwe !

karafecho commented 5 years ago

A Roadway/School Exposures API and UI have been completed. However, the API may need to be updated if we add new data (e.g., school exposures).

valenal commented 5 years ago

Road segment clean up for database table:

We started out with 7.5 million major road segments. We could not assign ~258,000 road segments to a FIPS code, mainly in areas where too much of a road fell outside the FIPS shapefile. These segments were dropped from the total.

There were many segments that didn't have a road type. We used a variety of approaches to assign a meaningful road type using various combinations of matching 2016 data with 2013 data or finding close segments parallel to 2013 data. There were many segments without an urban code. For these, we overlaid TIGER urban area polygons, and segments that touched these polygons were assigned the appropriate urban code. Any remaining records without a road type were removed. This amounted to ~141,000 road segments. Many of these were roads that had yet to be built.

Thus, ~5% of the roads were dropped from the final dataset.
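The ~5% figure follows from the two approximate counts above. The short check below uses only the numbers reported in this comment.

```python
# Approximate counts reported in the road-segment clean-up comment.
total_segments = 7_500_000  # major road segments at the start
no_fips = 258_000           # could not be assigned a FIPS code
no_road_type = 141_000      # still had no road type after all matching steps

dropped = no_fips + no_road_type   # 399,000 segments
share = dropped / total_segments   # 0.0532

print(f"dropped {dropped:,} of {total_segments:,} segments ({share:.1%})")
```

This prints `dropped 399,000 of 7,500,000 segments (5.3%)`, which is consistent with the ~5% stated.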

Method to add AADT and Speed when not available from HPMS

The HPMS2016 data included values for AADT for most segments, as well as speed limits for many segments. We employed a variety of approaches to assign AADT and speed values to records that didn't have a value. We built a lookup table with county averages by road type and facility type and populated it with values from segments that already had AADT and/or speed.

We were able to populate AADT and speed values for most of the roads using this lookup table, but there were still segments that didn't match (especially ramps). For these segments, we assigned speeds based on the function class of the road. In cases where AADT didn't populate, we sometimes had to fall back on state averages. For outliers, we often had to ignore the urban/rural distinction, or use averages for a similar (but different) road type.
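The tiered lookup described above can be sketched as generic fallback imputation. This sketch is based on assumptions. The record field names (`county`, `state`, `road_type`, `facility_type`) are hypothetical, and the two levels shown are only one reading of the method: county averages by road type and facility type, then state averages by road type. The function-class rule for speed and the urban/rural relaxation could be added as further levels.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
KeyFn = Callable[[Record], Tuple]


def group_means(records: List[Record], field: str, key_fn: KeyFn) -> Dict[Tuple, float]:
    """Average `field` over records that have a value, grouped by key_fn(record)."""
    groups: Dict[Tuple, List[float]] = defaultdict(list)
    for r in records:
        if r.get(field) is not None:
            groups[key_fn(r)].append(r[field])
    return {k: mean(v) for k, v in groups.items()}


def impute(records: List[Record], field: str, levels: List[KeyFn]) -> List[Record]:
    """Fill missing `field` values from the first lookup level that has a match.

    All lookup tables are built before any filling, so imputed values never
    feed back into the averages.
    """
    tables = [(fn, group_means(records, field, fn)) for fn in levels]
    for r in records:
        if r.get(field) is not None:
            continue
        for fn, table in tables:
            value = table.get(fn(r))
            if value is not None:
                r[field] = value
                break  # stop at the most specific level that matched
    return records


# Hypothetical AADT levels: county/road type/facility type, then state/road type.
aadt_levels: List[KeyFn] = [
    lambda r: (r["county"], r["road_type"], r["facility_type"]),
    lambda r: (r["state"], r["road_type"]),
]
```

Records that match no level keep `None`. These correspond to the outliers that the comment says needed manual handling.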

Regarding available years: at the time, we didn't have 2016 data, so we built it from individual states' data. Now, HPMS data for 2011 through 2017 can be downloaded directly from the FHWA website.

karafecho commented 5 years ago

@lstillwe : Please confirm that the National Center for Education Statistics data that we obtained from Cedar Grove also are available on data.gov. I'm also wondering how hard would it be to add those data to the Roadway API or develop a separate API. Thanks!

karafecho commented 5 years ago

@lstillwe : Please ignore my request above. I've since confirmed the availability of the NCES data on data.gov. WRT my second question, however, I think it might be good to either develop a new School Exposures API or expand the Roadway Exposures API to include the NCES data. See this Reveal segment (https://www.wunc.org/post/farm-wars), suggesting higher rates of asthma among children who attend schools in close proximity to farms. Turns out the increased risk can be mitigated by policy, i.e., prohibiting pesticide application during the day. @stevencox @cpschmitt @arunacs : what do you think?

lstillwe commented 5 years ago

Kara,

Is the school data you want me to use in the Google Drive folder GreenTeamDataTranslator/SchoolData, or is it the different data on data.gov?

I found this dataset on data.gov that may also help us assess whether a school is in a rural area: https://catalog.data.gov/dataset/nces-locale-boundaries

Lisa

In reply to karafecho's comment of Apr 22, 2019, 12:58 PM.

lstillwe commented 5 years ago

Kara,

This data looks good: https://nces.ed.gov/programs/edge/Geographic/SchoolLocations. Is this what you wanted to use? It has public school data for 2015-2018.

Thanks, Lisa

Following up on my email of Apr 23, 2019, 11:08 AM.

karafecho commented 5 years ago

@lstillwe : I agree, the NCES data look great. The NCES dataset that we received from Cedar Grove (years 2015-2016) is on our Google Drive. I'll let you decide whether to use the Cedar Grove sample or create a new dataset from the source files.