JUNE: BROWSE Master Issue

NealHumphrey commented 7 years ago

For our June 13th launch, the Browse team is in charge of both populating our database with the bulk of our data sources, as well as building out the front end navigation to toggle between viewing multiple data sources. These are the components that need to be completed in time for our June 13th launch:

Data quality

[x] Geocode all data sources with a consistent format for the city zones (ward, zipcode, etc.)
[x] Add nlihc_id to any data source that could match a record from the project table
[x] Coordinate with the Filter team for the creation and use of a 'Neighborhood Facts' table (related, #236)

Ingestion infrastructure

[x] Add a system to update all data from APIs with one function call.
[x] Design system for connecting downloaded API flat files to our current ingestion method
[x] Encapsulate load_data.py into a class or function that can be callable from another script #257
[x] Make sure that data is appended or overwritten as needed. #132
[ ] Utility for joining new meta.json #196

Data Sources

[x] Make a complete plan for data sources ingestion #188
[ ] Add all additional affordable housing records from DC gov sources
- [ ] DMPED's database available from opendata.dc.gov #194
- [ ] DHCD's pipeline database #136 Additional sources of affordable housing projects (as described here can wait until the V2 launch.
[x] WMATA data from API #200
[x] Tax Assessment data #198
[x] Crime data #187
[x] All 5 PresCat tables
[x] Census tables #152
[x] All housing units (non-subsidized) #140
[ ] Zillow market rent data #53 and #15
[ ] PUD applications #50 (should make new, clean ticket)
[ ] Building Permits #16 (should make new, clean ticket)

Browse view

[x] Toggle between data sets (probably similar sidebar list that can slide in/out like this one or similarly this one)

ptgott commented 7 years ago

@NealHumphrey @ostermanj @jkwening @dogtoothdigital Today I've been working on changing around rich-map.js to accommodate a more extensible range of data sets. It's all really messy so far, but I should have a PR ready by Tuesday's meeting. The main goals are:

Query the API as menu options are selected, rather than loading all data sets at once at the start.
Use a JSON file to populate the menu of data sources and zones. This way people adding datasets to the API can make them available to the front-end without getting into the JS. From the JSON file, only the names of available data sets and zones are listed--once you click on the appropriate menu options, the code sends an XHR request to the corresponding API endpoint.
Change the behavior of the datasets menu. Clicking a dataset makes only the available zones visible.

ostermanj commented 7 years ago

@NealHumphrey @ostermanj @jkwening @dogtoothdigital

@ptgott: I may have done a bit of the same in the refactoring PR #232, at least re: accommodating a diversity of data sets. In that PR, the housing-insights.js has a getData() method that consults a manifest (preferably in the future the manifest) to build the correct API call based on the parameters passed to it. For instance: getData('crime','all','ward'); or getData('raw','project');. It checks the local dataCollection first before making a an API call. It's returning data via D3's d3.JSON method and allows passing in a custom callback to be run as part of D3's success function. That's one potential way.

ptgott commented 7 years ago

@NealHumphrey @ostermanj @jkwening @dogtoothdigital This approach sounds good. One thing I was working on was building the menu of data and zone layers within the Browse view from a JSON file, rather than hard-coding it into the JS. This way the Python team can let the JS team know which datasets are available within the API, and which zone layers go with them, in a way that immediately makes them available. The JSON would have a key for each table name, and as the value for that key an array that includes the names of the available zone aggregates. The Browse View first loads that JSON file and builds a menu of all the available datasets within the API. Using a JSON file that includes only names, it can do this without having to query the API itself. This approach would be easy to combine with getData(): clicking on a combination of menu options (one for a dataset and one for a zone) would trigger getData() with the appropriate arguments. One thing we could also do in the future is use a manifest (in JSON) to build the menu of dataset and layer names.

In any case, I'll review PR #232 and test it locally this evening and suggest things that I can add on the browsing side or merge from what I've been working on. Thanks for working on this!

ptgott commented 7 years ago

It looks like I'll have my changes to the Browse view ready for a PR by this weekend, rather than tomorrow. Some of the changes will have a more efficient counterpart in Refactor Option 2 (see PR #232), in particular the loading of datasets as the user selects them, rather than all at once during the initial page load. Others will be newer, such as populating the data menu based on a .json file of available API data (see above), as well as combining any dataset available to the API with geojson and assigning a random color range to the heat map. The goal will be for us to stuff a .json file with names of tables and the zone-based aggregates (ward, zip, tract...) available for those tables in the API, and make that the only step for populating the menu and adding overlays to the map (everything else would be automatic).

thischrisoliver commented 7 years ago

Here's the Requested Data Status spreadsheet (i.e. the data wishlist).

ptgott commented 7 years ago

@NealHumphrey @ostermanj @jkwening @dogtoothdigital In PR #261 I've made a number of changes to the organization of the Browse view.

Since we'll be making @ostermanj's refactor the new basis for the Browse view, I'm taking this PR as a set of suggestions to incorporate with the refactor. We can discuss which features from this PR to add to the refactored Browse view.

Note that everything related to pie charts in this PR is currently broken. This is mainly because the pie chart code relies on fetching project.csv, which the changes I made do not currently support. We'd probably solve this problem by using the code from the refactor for fetching data. Since we'll eventually be using that code anyway, I figured that I could delay the task of fixing the pie charts in this PR until we implement the refactor.

My goal for the Browse view here As much as possible, the front-end won't need to know about these things prior to sending a request to the server:

The specific tables available on the API, as well as their column names
The zones available as geoJSON on the server
The directory structure of the back-end

This way, people working on back-end/Python/data stuff can make data available to the API without people working on the JS having to adjust the code that queries the server.

What I've done so far

Added a ‘datasets_for_browse.json’ file with table names as keys and an array of available aggregation zones (ward, neighborhood_cluster…) as values. This file also includes URLs for the API endpoints. Once a dataset is available in the API, we'll just need to add it to this file and it will automatically populate the data/zone menus. I've also removed the hardcoded zones array in favor of anticipating an open-ended range of zones.

datasets_for_browse.json is similar to the 'manifest' object in @ostermanj's refactored housing-insights.js, but with endpoints for zones associated with the datasets available within them.

When the Browse view loads, XHRs use datasets_for_browse.json to populate the datasets/zones menu without returning any actual datasets. Clicking on both a dataset and a zone queries the API for the appropriate endpoint, as well as the 'data' directory for the appropriate zone's geoJSON file. The JS then combines the zone's geoJSON with the data returned from the API, and produces a Mapbox layer (if this doesn't exist already) with a randomly assigned color.

We can talk about incorporating the functionality of 'datasets_for_browse.json' with the 'manifest' object in the refactor or, say, 'meta.json'.
Selecting a dataset makes a menu of its available zones appear. So far, the code assumes a hierarchy in which the user selects a dataset first then selects a zone. I've done this instead of keeping the zone menu the same length throughout to allow for flexibility in the zones we include. E.g. if only one dataset is available for a particular zone type, we could show that zone type only for that dataset and avoid graying it out with the other datasets.
Added a LayerOption object. I've moved all functions that end in ‘Layer’ in the previous code to LayerOption as methods. We'll probably want to integrate this abstraction more thoroughly into the refactor, since it duplicates a lot of the refactor's functionality (especially keeping track of state).
Moved code related to the datasets/layers menu into reusable constructors that bind LayerOptions to the link elements within the data/zone menus. Once again, we'd need to incorporate this into the refactor.
Renamed neighborhood.geojson to neighborhood_cluster.geojson--here I'm assuming a convention in which geoJSON polygon files must match the name of a zone within an API endpoint.

What we'd still need to do

The initial call for loading data from the 'project' endpoint is hardcoded. There are more nested callbacks related to loading the initial data than I'd like.

As a gesture toward one way of loading the initial data without knowing about which tables it comes from, I've put a 'default_datasets' key in 'data_for_browse.json'. The JS would first fetch all data from the 'default_datasets' object and add them to the Mapbox map.
I added some ‘if’ conditions to pie.js so that they wouldn't cause runtime errors with the code in this PR. Again, we can find a more elegant way to do this as we implement the refactor.
In writing the code for populating the data/zone menus, I've removed the option for displaying only zone boundaries. We could add a 'boundaries' option as a dataset, though we'd have to tweak the code in this PR to accommodate 'line' geoJSON files as well as polygons.

emkap01 commented 7 years ago

Per the "Data Sources" section in this Issue, I updated the Data Pipeline dashboard to indicate which data sources have been fully processed and which ones still need some work as of today.

Almost all of the datasets listed above have been fully processed, although a couple remain outstanding. In addition, if you set every filter on the right side of the dash to "All", it will expand to include numerous additional topics for which we may not even have any raw data, or for which we may have raw data that we have yet to process. That topic list itself could probably use some cleanup, since some of those topics may have become redundant and/or irrelevant given how the app has evolved over the last several months, but I figured that most can still be useful reminders/context for our data "wish list" going forward.

@NealHumphrey I will defer to you to check off boxes in this ticket as appropriate, since you mentioned that you need to do some cleanup on various Issues this coming week anyway.

focusconsulting / housing-insights

JUNE: BROWSE Master Issue #237