Closed td928 closed 2 years ago
This pipeline is bad what idiot designed this lol
Also I just realized that most files aren't formatted with black. We should address this in this PR or a separate one. Whenever you're ready to think about it
Cast lat/long columns to appropriate datatype
@mbh329 let me know if the latest commit fixed this issue for you. Thanks!
also fixed for residential eviction @mbh329
The latest commit works for housing_production at the puma level, but there is still an issue with housing_security. Seems to be an issue with evictions.
Fixed, the housing_security and housing_production at the puma level work as expected @td928
That is fine with me, definitely want to make sure this is right.
Would it be helpful to go over this together with a little code review? @td928 @SashaWeinstein
The next step for me is to run the new code, can get on a call later in the day
Lol Te why did you edit my comment instead of responding
some next level move by me. I guess now we can merge this whole thing in? @SashaWeinstein
Doing my final review now
The refactor to move the hardcoded columns into separate functions looks good to me. The only thing is that I think the convention is to return List[str] instead of list:
https://stackoverflow.com/questions/52623204/how-to-specify-method-return-type-list-of-what-in-python
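For a concrete picture of that convention, a minimal illustration (the function and column names here are made up, not the ones in this PR):

```python
from typing import List

# Hypothetical example of the convention discussed above: annotate the
# element type rather than returning a bare "list".
def lat_long_columns() -> List[str]:
    return ["latitude", "longitude"]
```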
#293 · multiple reviewers required 🐬 🐬
Overview
This stemmed from the request from SOE to hold the source data constant from the previous update while also adding new fields to the output, which leads to the question: do we really know which versions of source data we are using in our output? Max and I figured out which datasets/outputs are impacted by the open data sources we pull, which led me on a mini-expedition to figure out how to keep track of the versions of the few open data sources (for now) in the pipeline. Then I remembered @SashaWeinstein already did this for the facility database, so I went over there and grabbed a lot of the code off the shelf.
ingest_helpers.py and metadata.py
This work does expand across a few files, so it might require some patience to review each. But the thread tying everything together is the new code in ingest_helpers.py and metadata.py. metadata.py is almost verbatim from the facility database; I do wonder if we should change it a little to have it dump metadata.yml somewhere else, but that should be a relatively easy change to make. ingest_helpers.py was slimmed down a little bit to do just this job. I was debating whether fetching the version from a config file is useful for this task and decided we could do without it. So now it takes the version from the datasets.yml file, checks whether it is downloaded in ingest/data_library/dataloading.sh,
then reads in the file.
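As a quick orientation for reviewers, here is a rough sketch of that flow. The function names, the datasets.yml layout, and the metadata output shape are illustrative assumptions, not the exact code in this PR:

```python
# Rough sketch only: function names, the datasets.yml layout, and the output
# shape are assumptions for illustration, not the implementation in this PR.
from typing import List

import yaml

def get_version(dataset: str, config_path: str = "ingest/data_library/datasets.yml") -> str:
    """Look up the pinned version of an open data source in datasets.yml."""
    with open(config_path) as f:
        datasets = yaml.safe_load(f)
    return str(datasets[dataset]["version"])

def dump_metadata(datasets: List[str], output_path: str = "metadata.yml") -> None:
    """Record which version of each source went into this run, along the lines
    of what metadata.py (borrowed from the facility database) does."""
    versions = {name: get_version(name) for name in datasets}
    with open(output_path, "w") as f:
        yaml.safe_dump(versions, f)
```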
datasets and indicator scripts
There are currently only four datasets coming directly from open data. They are all listed in the datasets.yml file. Each of the indicators impacted by those source data was updated to use the read_from_s3 functions. Some additional tinkering was required in places, e.g. historical districts, where a pandas dataframe is ingested first and then converted to geopandas.
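To make the historical districts point concrete, a hedged sketch of that pattern; the dataset name, geometry column, CRS, import path, and the exact read_from_s3 signature are assumptions for illustration:

```python
# Sketch of the indicator-side pattern; the dataset name, geometry column,
# CRS, and read_from_s3's real signature are assumptions, not this PR's code.
import geopandas as gpd
import pandas as pd

from ingest.ingest_helpers import read_from_s3  # import path assumed for illustration

def load_historic_districts() -> gpd.GeoDataFrame:
    # read_from_s3 pulls the version pinned in datasets.yml
    df: pd.DataFrame = read_from_s3("historic_districts")
    # The raw pull comes back as a plain dataframe, so build the geometry
    # column from WKT and convert to a GeoDataFrame afterwards.
    geometry = gpd.GeoSeries.from_wkt(df["the_geom"])
    return gpd.GeoDataFrame(df, geometry=geometry, crs="EPSG:4326")
```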
export metadata
Using the existing infrastructure for exporting, I modified both the workflow file export.yml and export_DO.sh to accommodate exporting metadata.yml. I am depositing this file directly in the root of the db-eddt folder on DO.