Bringing open data to affordable housing decision makers in Washington, DC. A D3/JavaScript-based website for visualizing data related to affordable housing in Washington, DC. Data processing is done in Python.
This issue covers back-end tasks that support maintainability of the codebase and validation of the data served to the front-end. Beyond the primary goals of refactoring the codebase and verifying and validating our existing data, it also tracks outstanding unimplemented features and tasks related to expanding our data sources and to server maintenance and automation.
Refactor
[ ] Refactor `_populate_zone_facts` to be cleaner (#563)
    see #594, #596, #556, and #493 for more context
[ ] Data ingestion workflow refactor (nearly complete in PR #593)
[ ] Verify that rows are skipped, rather than the whole file failing, when an error occurs in the ingestion workflow (#524)
[ ] Refactor HILogger to better fit the new ingestion design pattern
    the aggregate log is currently hard to review; consider keeping a separate subset of logs per process: download, cleaning, and database upload
[ ] Persist missing address IDs to avoid additional MAR API calls later in the cleaning process (#592)
[ ] Review the remaining codebase for opportunities to simplify, streamline, and reduce repetition
    Is the manifest still the best way to track data sources and updates?
    Can we automate the meta.json update workflow?
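The row-level skipping behavior referenced above (#524) could work roughly as follows. This is a minimal sketch, not code from the Housing Insights repo; `clean_rows` and `clean_one` are hypothetical names.

```python
import logging

logger = logging.getLogger("ingestion")

def clean_rows(rows, clean_one):
    """Clean each row with clean_one; on error, log a warning and skip
    that row instead of aborting the whole file.
    Returns (cleaned_rows, skipped_count) so the caller can verify
    how many rows were dropped."""
    cleaned, skipped = [], 0
    for i, row in enumerate(rows):
        try:
            cleaned.append(clean_one(row))
        except Exception:
            skipped += 1
            logger.warning("Skipping row %d: %r", i, row)
    return cleaned, skipped
```

Returning the skip count (rather than silently dropping rows) makes the behavior easy to assert in a unit test, which is what the verification task above calls for.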
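One way to split the hard-to-review aggregate HILogger output, sketched under the assumption that the standard library's logger hierarchy is acceptable here; the `ingestion.*` names are hypothetical, with stage names taken from the note above.

```python
import logging

def get_stage_logger(stage):
    """Return a child logger named for one ingestion stage (download,
    cleaning, upload), so each stage's output can be reviewed, filtered,
    or routed to its own file independently of the aggregate log."""
    return logging.getLogger("ingestion." + stage)

# e.g., route only the cleaning stage to its own file:
# get_stage_logger("cleaning").addHandler(logging.FileHandler("cleaning.log"))
```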
Verify, validate, and clean up existing data
[ ] Write a description of how we dedupe our sources of projects, to share with the Office of Planning
[ ] Update data documentation for all tables currently in the database, and their meta.json entries
[ ] Zone-based data should not have null values (#456)
[ ] Accurately capture the percent of units in a zone that are subsidized (#564)
[ ] Investigate the issue with inaccurate `residential_units` counts (#596)
[ ] Track all sources that a project was found in during ingestion (#577)
see #574 for more context
[ ] Implement end-to-end unit testing of our data ingestion workflow
[ ] Identify missing and incomplete unit tests
[ ] Review and improve existing unit tests
[ ] Investigate how to better handle invalid address issues (#603)
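The deduplication write-up above could include pseudocode along these lines. This is a hedged sketch, not the project's actual rule: it assumes first-seen-wins keyed on a project ID, and the `nlihc_id` field name is an assumption.

```python
def dedupe_projects(records, key_field="nlihc_id"):
    """Keep the first record seen for each project key; records from
    later sources with the same key are dropped rather than merged
    (first-seen-wins is an assumption here, not necessarily the
    project's actual dedupe rule)."""
    seen, unique = set(), []
    for rec in records:
        key = rec.get(key_field)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```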
Additional data sources
[ ] Add ANCs to weighting; add ANC to zone facts.
[ ] Add loan information from DHCD to the subsidy records being added from Quickbase
[ ] Populate total unit counts at the project level (#495)
[ ] Add percent AMI: DHCD data has this at the project level; for other prescat sources we can use the mapping based on program type that was created by Karynna.
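The percent-AMI fallback described above might look like the sketch below. The mapping values are placeholders for illustration only, not the actual program-type mapping mentioned in the item; the field names are also assumptions.

```python
# Placeholder mapping from subsidy program type to percent AMI; the real
# values would come from the program-type mapping referenced above.
PROGRAM_TO_PERCENT_AMI = {
    "LIHTC": 60,
    "Housing Choice Voucher": 30,
}

def percent_ami(record):
    """Use the project-level value (available in DHCD data) when present;
    otherwise fall back to the program-type mapping, returning None when
    neither is available."""
    if record.get("percent_ami") is not None:
        return record["percent_ami"]
    return PROGRAM_TO_PERCENT_AMI.get(record.get("program"))
```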
Server
[ ] Create a `sources` class that checks an S3 bucket for new files and downloads them during `get_api_data()`.
    we have updated Prescat data that can be used to test this (#613)