datamade / just-spaces

🏕 A tool from University City District and DataMade to promote better and more just public spaces
https://justspacesproject.org
MIT License

Identify public data sources for the public-facing site #51

Closed: reginafcompton closed this issue 5 years ago.

reginafcompton commented 5 years ago

We will spend some time identifying outside data best suited for the public-facing analyses.


Initial notes

American Community Survey (ACS): https://censusreporter.org/

LODES (LEHD Origin-Destination Employment Statistics)

A collaboration between state labor market information agencies and the Census Bureau that emphasizes employment data.

Could be useful for analyzing spaces used largely by people who work in the area, rather than people who live nearby (e.g., the Porch).
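To make the "workers vs. residents" idea concrete, one rough signal is the ratio of jobs located in a block to employed people who live in that block. This is a minimal sketch, assuming LODES workplace (WAC) and residence (RAC) CSVs keyed by w_geocode and h_geocode, with total jobs in a C000 column. That is our reading of the LODES layout and should be checked against the technical documentation. The block IDs and counts below are hypothetical.

```python
import csv
import io

# Hypothetical excerpts of LODES files (assumed layout: WAC keyed by w_geocode,
# RAC keyed by h_geocode, total jobs in C000). Numbers are made up.
WAC_CSV = """w_geocode,C000
421010088001000,1200
421010088001001,35
"""
RAC_CSV = """h_geocode,C000
421010088001000,40
421010088001001,150
"""


def load_counts(text, key_column):
    """Map each block GEOID to its C000 job count."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key_column]: int(row["C000"]) for row in reader}


def jobs_to_workers_ratio(wac_text, rac_text):
    """Ratio of jobs located in a block to employed residents of that block.

    Values well above 1 suggest a block people commute into; values below 1
    suggest a mostly residential block.
    """
    jobs_here = load_counts(wac_text, "w_geocode")
    workers_living_here = load_counts(rac_text, "h_geocode")
    ratios = {}
    # Only compare blocks that appear in both files.
    for block in jobs_here.keys() & workers_living_here.keys():
        residents = workers_living_here[block]
        if residents > 0:  # skip blocks with no resident workers
            ratios[block] = jobs_here[block] / residents
    return ratios


ratios = jobs_to_workers_ratio(WAC_CSV, RAC_CSV)
```

In this made-up data, the first block (1200 jobs, 40 resident workers) would look like a workplace destination, much like the kind of space described above.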

Limitations:

jeancochrane commented 5 years ago

Here's a first pass at data sources that I'd like to see. My general approach was to take each item in the PLDP and find data for that item that might provide important context. In addition, I created a Misc category for data that don't correspond to anything directly in the PLDP but that strike me as important.

reginafcompton commented 5 years ago

Terrific work! I think using the PLDP as a guide works well. (N.b., I adjusted a couple of inaccurate links – that's why it says "edited by regina..." above.)

Additions

Too much data? That's not my initial reaction. We're designing an import process for census data: I see no reason to be ungenerous with that particular data source.
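Since we're designing an import process for census data, here is a rough sketch of how a request for ACS block-group estimates could be assembled for the Census Bureau's public data API. The year, the variable B01003_001E (total population), and the Philadelphia defaults (state FIPS 42, county FIPS 101) are illustrative choices, not decisions from this thread. The function only builds the URL and makes no request; heavier use of the API may also need an API key.

```python
from urllib.parse import quote, urlencode


def acs_block_group_url(year, variables, state="42", county="101"):
    """Build a Census Data API URL for ACS 5-year block-group estimates.

    Defaults to Pennsylvania (42) and Philadelphia County (101); all tracts
    in the county are requested via the tract:* wildcard.
    """
    params = {
        "get": ",".join(["NAME", *variables]),
        "for": "block group:*",
        "in": f"state:{state} county:{county} tract:*",
    }
    # Keep ':', '*' and ',' readable; spaces become %20.
    query = urlencode(params, safe=":*,", quote_via=quote)
    return f"https://api.census.gov/data/{year}/acs/acs5?{query}"


url = acs_block_group_url(2017, ["B01003_001E"])
```

Keeping URL construction separate from fetching makes it easy to add more tables (e.g., educational attainment) without touching the rest of the import.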

In fact, I'd like to see us pull in data about educational attainment from Census, since Tyler from UCD flagged that as something they might like to include. Tenure by Educational Attainment of Householder could be meaningful.

Possible omissions

With that said, I think we should try to minimize the number of data sources and also focus on data with obvious PLDP equivalences. I'd probably omit:

Do we know how easy it is to import data from SEPTA? I think we could do something with the transit stops data noted above, but if it proves hard to get, then we could omit that too.

Other notes

Did book-of-lists end up in Bitbucket? Let's check with @derekeder.

For population density, could we use the ACS: https://catalog.data.gov/dataset/census-block-groups-by-population-density-2012-acs?

jeancochrane commented 5 years ago

In fact, I'd like to see us pull in data about educational attainment from Census, since Tyler from UCD flagged that as something they might like to include. Tenure by Educational Attainment of Householder could be meaningful.

I put in a link to Educational Attainment above, which gives the raw counts without breaking them down by household type. Let me know what you think about that!
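Because the Educational Attainment table gives raw counts, the public-facing analyses will probably want shares instead. This is a minimal sketch, assuming we have already collapsed the ACS categories (for example from table B15003, population 25 and over) into a few buckets. The bucket names and numbers are hypothetical.

```python
# Hypothetical counts for one block group; bucket names are our own and the
# categories are assumed to sum to the total.
counts = {
    "total": 1000,
    "less_than_high_school": 120,
    "high_school_or_ged": 280,
    "some_college_or_associates": 200,
    "bachelors": 250,
    "graduate_or_professional": 150,
}


def attainment_shares(counts):
    """Convert raw counts to shares of the population 25 and over."""
    total = counts["total"]
    if total == 0:
        return {}
    return {key: value / total for key, value in counts.items() if key != "total"}


def bachelors_or_higher_share(counts):
    """Share of adults with at least a bachelor's degree."""
    total = counts["total"]
    if total == 0:
        return 0.0
    return (counts["bachelors"] + counts["graduate_or_professional"]) / total


shares = attainment_shares(counts)
```

Returning an empty result for zero-population geographies avoids division errors in sparsely populated block groups.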

I think we should try to minimize the number of data sources and also focus on data with obvious PLDP equivalences. I'd probably omit:

  • Daily or monthly summaries of temp/precipitation by FIPS code (NOAA)
  • Land use by parcel (Phila.gov)

I'll be sad to see Land Use go, but you make a good point. Weather data seems relevant to the PLDP required fields survey_context.survey_temperature_c and survey_context.survey_microclimate: if we're reporting those fields, it might be useful to know whether the weather was unseasonably warm/cool or rainy for the area during a survey. But maybe that's not a set of concerns that UCD cares about? Curious what your take is.
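One way to check "unseasonably warm/cool" is to compare the survey-day reading (e.g., survey_context.survey_temperature_c) with historical readings for the same place and time of year. This is a minimal sketch, assuming we already have a list of historical daily temperatures in °C, for instance from NOAA daily summaries. The z-score approach and the 1.5 threshold are our own illustrative choices, not anything the PLDP specifies.

```python
from statistics import mean, stdev


def classify_survey_temperature(survey_temperature_c, historical_temps_c, threshold=1.5):
    """Label a survey-day temperature relative to historical readings.

    Uses a simple z-score: readings more than `threshold` sample standard
    deviations above or below the historical mean are flagged.
    """
    if len(historical_temps_c) < 2:
        raise ValueError("need at least two historical readings")
    mu = mean(historical_temps_c)
    sigma = stdev(historical_temps_c)
    if sigma == 0:
        # No historical variation, so nothing counts as unusual.
        return "typical"
    z = (survey_temperature_c - mu) / sigma
    if z >= threshold:
        return "unseasonably warm"
    if z <= -threshold:
        return "unseasonably cool"
    return "typical"


# Hypothetical historical readings for the same week in prior years.
history = [20.0, 22.0, 21.0, 19.0, 23.0]
label = classify_survey_temperature(25.0, history)
```

The same pattern would work for precipitation, though rainfall is skewed enough that a percentile cutoff may be a better fit than a z-score.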

Do we know how easy it is to import data from SEPTA? I think we could do something with the transit stops data noted above, but if it proves hard to get, then we could omit that too.

I've had bad experiences working with SEPTA APIs in the past, mostly around A) undocumented APIs and B) unreliable real-time data. But the data portal I linked above seems to be a big improvement: it looks like they're using a data portal CMS that makes the data much easier to understand, and it's static data, so it shouldn't be too complicated. See e.g. trolley stops and bus stops.

I say we go for it, and if it turns into a rabbit hole we try to exit quickly.

Did book-of-lists end up in Bitbucket? Let's check with @derekeder.

book-of-lists is on GitLab these days; I updated the link above accordingly.

For population density, could we use the ACS: https://catalog.data.gov/dataset/census-block-groups-by-population-density-2012-acs?

This seems promising, but I can't find a corresponding table on FactFinder or Census Reporter. Is this an ACS table, or a special dataset prepared by Montgomery County?
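If no ready-made density table exists for Philadelphia, density can be derived from two pieces we'd likely import anyway: total population per block group from the ACS and land area per block group. This is a minimal sketch, assuming land area arrives in square meters (as the ALAND field in Census TIGER/Line shapefiles does) and that we want people per square mile.

```python
# One square mile is exactly 1609.344 m * 1609.344 m.
SQ_METERS_PER_SQ_MILE = 2_589_988.110336


def population_density_per_sq_mile(total_population, land_area_sq_m):
    """People per square mile of land area (water area excluded)."""
    if land_area_sq_m <= 0:
        raise ValueError("land area must be positive")
    land_area_sq_mi = land_area_sq_m / SQ_METERS_PER_SQ_MILE
    return total_population / land_area_sq_mi
```

Using land area only (not land plus water) keeps density from being understated for block groups along the rivers.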

reginafcompton commented 5 years ago

Responding in turn!

(1) Climate data: I guess I could see an opportunity for making meaningful comparisons, like, "it's usually sunny in UCD, which might account for x, y, z in the survey data, collected on a rainy day." At the very least, I think this might be something worth bringing up with UCD. I am not familiar with NOAA, though. Do you know anything about its usability?

(2) Yes, I cannot find an ACS table for population density... As you note, the link seems to be just for Montgomery County, and I do not see a similar one for Philly. Hmmmm. Let's consult with Master Forest on this one.

Everything else looks fantastic. Thanks!

jeancochrane commented 5 years ago

Climate data: I guess I could see an opportunity for making meaningful comparisons, like, "it's usually sunny in UCD, which might account for x, y, z in the survey data, collected on a rainy day." At the very least, I think this might be something worth bringing up with UCD. I am not familiar with NOAA, though. Do you know anything about its usability?

I've used NOAA for weather reports, but never as a source for a data pipeline. Forest may have more thoughts.

Yes, I cannot find an ACS table for population density... [...] Let's consult with Master Forest on this one.

Agreed!

jeancochrane commented 5 years ago

@reginafcompton I took a final pass based on conversations with Forest! You can see my edits if you click on the edited link at the top of this post.

The changes include:

Forest also raised a good point: we need to help UCD figure out which geographies are appropriate here, not just so that we know what level of resolution to import, but also so that we can determine how a survey should be linked to a particular geography. For example, if a survey takes place in Clark Park, what should the "catchment area" be for that survey? A radius of 2 miles? Or should we let the survey designer determine the geography?
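As a rough version of the radius option, a survey could be linked to every geography whose centroid falls within a fixed distance of the survey location. This is a minimal sketch, assuming we have centroid coordinates for each geography. It uses the haversine great-circle distance with a mean Earth radius of 3958.8 miles, and the IDs and coordinates below are hypothetical. Letting the survey designer pick the geography would replace this step entirely.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (approximation)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def catchment(survey_lat, survey_lon, centroids, radius_miles=2.0):
    """Return sorted IDs of geographies whose centroid lies within radius_miles."""
    return sorted(
        geo_id
        for geo_id, (lat, lon) in centroids.items()
        if haversine_miles(survey_lat, survey_lon, lat, lon) <= radius_miles
    )


# Hypothetical centroids and survey location, for illustration only.
centroids = {
    "bg_a": (39.955, -75.205),  # about 0.4 miles away
    "bg_b": (39.980, -75.210),  # about 2.07 miles away
    "bg_c": (39.950, -75.210),  # same point as the survey
}
linked = catchment(39.95, -75.21, centroids)
```

A centroid test is crude: a large block group can overlap the radius while its centroid sits outside it, which is one more reason the choice of geography is worth raising with UCD.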

reginafcompton commented 5 years ago

Beautiful @jeancochrane !

Yes, the question about location has been on our minds, and we discussed it at our February meeting: https://github.com/datamade/just-spaces/issues/60. I think you raise a very useful point: the behavior of our import could help inform how we think about location data collected in surveys (or vice versa). Let's plan to (re)raise this issue with UCD when we send our initial email about source data.

reginafcompton commented 5 years ago

@jeancochrane I think we can close this issue. Is there somewhere else in particular where you'd like to document the sources we plan to import? Or will this issue serve that purpose?

jeancochrane commented 5 years ago

I think this issue is as good a place as any! I'll go ahead and close.