NYCPlanning / db-developments

🏠 🏘️ 🏗️ Developments Database
https://nycplanning.github.io/db-developments

22Q4 UPDATE #618

Closed: mbh329 closed this issue 1 year ago

mbh329 commented 1 year ago

Update code

Update source data

DCP Admin Boundaries from Bytes

DOB data

Additional datasets

damonmcc commented 1 year ago

Noting that dob_jobapplications and dob_permitissuance don't have update templates in the data-library repo.

Their links in this issue template go to the archived recipes repo, where they are listed in the recipes.csv file. Will revisit these datasets.

mbh329 commented 1 year ago

@damonmcc We have to run the Data Sync action for the new dob_jobapplications and dob_permitissuance data.

mbh329 commented 1 year ago

@AmandaDoyle what's this doe_school_subdistricts dataset? According to DigitalOcean, it hasn't been updated since 2017.

damonmcc commented 1 year ago

Data Sync run passed for dob_jobapplications and dob_permitissuance, but it appears to have used docker-geosupport:22.2.2 for geocoding. I'll start a branch (or use an existing one) to update version.env and re-run.

mbh329 commented 1 year ago

The DOB_DATA_DATE should be 20230213, and you have to declare the Geosupport version in VERSION_GEO, which should be 23.1.x.

mbh329 commented 1 year ago

Still not quite sure what the CAPTURE_DATE refers to

AmandaDoyle commented 1 year ago

@mbh329 here is a previous version of the .env file as an example to help explain capture date. For this version, these variables should be:

CAPTURE_DATE=01-01-2023
CAPTURE_DATE_PREV=2022-07-01

In short, it's the reference date for the version (i.e., 22Q4 is a snapshot of all DOB records filed before 01-01-2023), and it is used to select records.
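To make the "snapshot" idea concrete, here is a minimal sketch of selecting records by capture date. It assumes each DOB record carries a filing date; the record shape, the `filing_date` and `job_number` fields, and the `select_snapshot` helper are all hypothetical and not taken from the DevDB code, which does this selection in its own build steps.

```python
from datetime import date

# Hypothetical record shape: each DOB record is a dict with a "filing_date" (datetime.date).
# CAPTURE_DATE is the version's reference date; only records filed strictly before it are kept.
def select_snapshot(records, capture_date):
    """Return the records filed before capture_date (the version snapshot)."""
    return [r for r in records if r["filing_date"] < capture_date]

# Illustrative records only.
records = [
    {"job_number": "A1", "filing_date": date(2022, 12, 31)},
    {"job_number": "A2", "filing_date": date(2023, 1, 1)},
]
snapshot = select_snapshot(records, date(2023, 1, 1))
print([r["job_number"] for r in snapshot])  # ['A1']
```

The comparison is strict (`<`) because the version is described as a snapshot of records filed *before* the capture date.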

damonmcc commented 1 year ago

Working on an issue with the Data Sync action. It seems the update to the Geosupport image may have included updates to the Python packages in it, leading to SQL errors in the Data Sync action.

I'm using the branch I made for this data update: https://github.com/NYCPlanning/db-developments/pull/619

update: successful Data Sync run

mbh329 commented 1 year ago

@damonmcc what's the status on the dcp_cdboundaries, dcp_cb2010 and dcp_censustracts?

AmandaDoyle commented 1 year ago

@mbh329 and @damonmcc Flagging again that I think the issue template needs to be updated. The current list of datasets ingested to build DevDB is in dataloading here; for example, dcp_censustracts is now dcp_ct2020. Happy to talk anything through if that's easiest.

damonmcc commented 1 year ago

Replying to @mbh329: "what's the status on the dcp_cdboundaries, dcp_cb2010 and dcp_censustracts?"

the first 2 are done and checked off in the issue description. dcp_censustracts (now called dcp_ct2020) just ran successfully and is now checked off

mbh329 commented 1 year ago

@AmandaDoyle I can make those edits now

damonmcc commented 1 year ago

Per group convo: we must ensure that all versions of source data and Geosupport used in the 22Q4 build are the last 2022 versions. This also ensures that the Council Districts are the current ones, not the future ones.
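The "last 2022 version" rule can be expressed as a small selection function. This is a sketch under stated assumptions: it assumes you have a list of (version, release date) pairs for a dataset or for Geosupport; the `last_2022_version` helper, the `CUTOFF` constant, and the sample versions and dates are hypothetical and illustrative, not real release history.

```python
from datetime import date

# Cutoff for "last 2022 version": anything released on or before Dec 31, 2022.
CUTOFF = date(2022, 12, 31)

def last_2022_version(releases):
    """Return the latest version whose release date is on or before CUTOFF.

    `releases` is a hypothetical list of (version, release_date) pairs.
    """
    eligible = [(version, released) for version, released in releases if released <= CUTOFF]
    if not eligible:
        raise ValueError("no release on or before the cutoff")
    version, _ = max(eligible, key=lambda pair: pair[1])
    return version

# Illustrative versions and dates only, not real release history.
releases = [
    ("v1", date(2022, 6, 1)),
    ("v2", date(2022, 11, 15)),
    ("v3", date(2023, 2, 1)),
]
print(last_2022_version(releases))  # v2
```

Selecting by release date rather than by version string avoids relying on each dataset's version naming scheme, which differs across sources (e.g. 22v3 vs. 22.2.2).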

mbh329 commented 1 year ago

@AmandaDoyle @damonmcc Is there a reason why we are bringing in the non-_wi admin boundaries (e.g. dcp_cb2020) for some datasets and the _wi ones (e.g. dcp_boroboundaries_wi) for others? It would make more sense to me to pull in only _wi admin boundaries, as opposed to the current mix-and-match approach.

AmandaDoyle commented 1 year ago

@mbh329 and @damonmcc After a little digging, the rationale I came up with for not changing the existing clipped boundaries to water-included ones is to keep the aggregate files all clipped. Also, I think dof_shoreline may be used to limit records to those on land (but I'm not 100% sure).
Please revert to dcp_mappluto (this PR). This is a bigger issue that needs to be addressed more carefully, since it will result in code changes. It was discussed late last year. I think it's advisable to keep this DevDB update as simple as possible and not change what we're loading in.

mbh329 commented 1 year ago

@AmandaDoyle dcp_mappluto and dcp_mappluto_wi both bring in the "unclipped" (_wi) version of MapPLUTO, based on the YAML templates in data-library (see dcp_mappluto vs. dcp_mappluto_wi).

AmandaDoyle commented 1 year ago

@mbh329 Right - I'm aware, but we'll then need to change dcp_mappluto to dcp_mappluto_wi in the code for it to work.
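If the rename is eventually done, a whole-word substitution avoids mangling references that already use the new name. A minimal sketch, assuming the references appear as bare table names in SQL text; the `rename_mappluto` helper and the sample query are hypothetical, and walking the repo's SQL files is left out.

```python
import re

# \b treats "_" as a word character, so "dcp_mappluto_wi" does NOT match this
# pattern and already-renamed references are left untouched (no "_wi_wi").
PATTERN = re.compile(r"\bdcp_mappluto\b")

def rename_mappluto(sql: str) -> str:
    """Replace whole-word references to dcp_mappluto with dcp_mappluto_wi."""
    return PATTERN.sub("dcp_mappluto_wi", sql)

# Illustrative query only.
sql = "SELECT bbl FROM dcp_mappluto JOIN dcp_mappluto_wi USING (bbl);"
print(rename_mappluto(sql))
# SELECT bbl FROM dcp_mappluto_wi JOIN dcp_mappluto_wi USING (bbl);
```

A plain `str.replace` would turn an existing `dcp_mappluto_wi` into `dcp_mappluto_wi_wi`, which is why the sketch uses a word-boundary regex.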

mbh329 commented 1 year ago

Okay, got it. The next step, then, is to update the dcp_mappluto template in data-library with the updated URL so that we can pull 22v3 into DevDB.

damonmcc commented 1 year ago

Replying to @mbh329: "update dcp_mappluto template in data library with the updated url so that we can pull 22v3 into devdb"

done!

mbh329 commented 1 year ago

FYI: dob_now_applications doesn't have the correct number of columns required after the 22Q2 enhancement.
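A quick header check against an expected schema makes this kind of mismatch visible before a build. A minimal sketch, assuming the source arrives as CSV; `EXPECTED_COLUMNS` uses placeholder names (the real post-22Q2 column list is not reproduced here), and `check_columns` is a hypothetical helper.

```python
import csv
import io

# Placeholder schema: substitute the actual column list expected after the 22Q2 enhancement.
EXPECTED_COLUMNS = ["col_a", "col_b", "col_c"]

def check_columns(header, expected=EXPECTED_COLUMNS):
    """Return (missing, extra): expected columns absent from header, and unexpected ones present."""
    missing = [c for c in expected if c not in header]
    extra = [c for c in header if c not in expected]
    return missing, extra

# Illustrative CSV with one expected column missing.
sample = "col_a,col_b\n1,2\n"
header = next(csv.reader(io.StringIO(sample)))
missing, extra = check_columns(header)
print(len(header), len(EXPECTED_COLUMNS), missing, extra)  # 2 3 ['col_c'] []
```

Reporting missing and extra columns by name, not just the count, points directly at which fields the template or source is lacking.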

AmandaDoyle commented 1 year ago

Questions I have from review

mbh329 commented 1 year ago

@AmandaDoyle

mbh329 commented 1 year ago

Morning y'all - I figured out the issue with the QAQC historic data and why 22Q2 wasn't showing up in the qaqc_historic table. I can explain in more detail at standup, but essentially: because the GitHub Actions were erroneously exporting to DigitalOcean (DO) on both the dev and main branches, the data in DO was overwritten, and the latest data never contained the 22Q2 row. To get around this, I uploaded a locally cached qaqc_historic.csv and qaqc_historic.sql to DO, and the data is now populated in my local DevDB build. Happy to chat at standup. Biggest takeaway: we need to fix the GitHub Actions, which I am doing now.
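A check like the following could catch a dropped row such as 22Q2 before a build. This is a minimal sketch: it assumes the qaqc_historic export is a CSV with a `version` column, and the `missing_versions` helper, the `check_a` column, and the sample values are hypothetical and illustrative.

```python
import csv
import io

def missing_versions(historic_csv: str, required):
    """Return the required versions that have no row in the qaqc_historic CSV text."""
    present = {row["version"] for row in csv.DictReader(io.StringIO(historic_csv))}
    return [v for v in required if v not in present]

# Illustrative export in which the 22Q2 row has been lost.
sample = "version,check_a\n22Q4,1\n"
print(missing_versions(sample, ["22Q2", "22Q4"]))  # ['22Q2']
```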