Closed mbh329 closed 1 year ago
Noting that dob_jobapplications and dob_permitissuance don't have update templates in the data-library repo. Their links in this issue template go to the archived recipes repo, where they are listed in the recipes.csv file. Will revisit these datasets.
@damonmcc We have to run the data-sync action for the new dob_jobapplications and dob_permitissuance data
@AmandaDoyle what's this doe_school_subdistricts dataset? It hasn't been updated since 2017 according to Digital Ocean
Data sync run passed for dob_jobapplications and dob_permitissuance, but it appears to have used docker-geosupport:22.2.2 for geocoding. I'll start a (or use an existing) branch to update version.env and re-run
The DOB_DATA_DATE should be 20230213, and you have to declare the version of geosupport in VERSION_GEO, which should be 23.1.x
Still not quite sure what the CAPTURE_DATE refers to
@mbh329 here is a previous version of the .env file as an example to help explain the capture date. For this version these variables should be
CAPTURE_DATE=01-01-2023
CAPTURE_DATE_PREV=2022-07-01
In short, it's the reference date for the version (i.e. 22Q4 is a snapshot of all DOB records filed before 01-01-2023) and is used to select records.
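To make the idea of a capture date concrete, here is a minimal Python sketch of selecting records filed before it. This is only an illustration: the `filed_date` field, the `select_records` helper, and the sample records are hypothetical and not taken from the db-developments code, and the dates are written as ISO dates for simplicity.

```python
from datetime import date

# Hypothetical values mirroring the variables above, written as date objects.
CAPTURE_DATE = date(2023, 1, 1)
CAPTURE_DATE_PREV = date(2022, 7, 1)


def select_records(records, capture_date):
    """Keep only records filed strictly before the capture date."""
    return [r for r in records if r["filed_date"] < capture_date]


# Made-up sample records, for illustration only.
records = [
    {"job_number": "A", "filed_date": date(2022, 6, 15)},
    {"job_number": "B", "filed_date": date(2022, 12, 31)},
    {"job_number": "C", "filed_date": date(2023, 1, 1)},
]

current = select_records(records, CAPTURE_DATE)
previous = select_records(records, CAPTURE_DATE_PREV)

print([r["job_number"] for r in current])   # ['A', 'B']
print([r["job_number"] for r in previous])  # ['A']
```

Record C is filed on the capture date itself, so it is excluded from the snapshot under the "filed before" reading used here.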
Working on an issue with the Data Sync action. It seems like the update to the geosupport image may have included an update to the Python packages in it, leading to SQL errors in the Data Sync action
I'm using the branch I made for this data update: https://github.com/NYCPlanning/db-developments/pull/619
update: successful Data Sync run
@damonmcc what's the status on dcp_cdboundaries, dcp_cb2010 and dcp_censustracts?
@mbh329 and @damonmcc Flagging again that I think the issue template needs to be updated.
The current list of datasets that are ingested to build DevDB is in dataloading here. For example, dcp_censustracts is now dcp_ct2020.
Happy to talk anything through if easiest.
"what's the status on dcp_cdboundaries, dcp_cb2010 and dcp_censustracts?"
@mbh329 the first 2 are done and checked off in the issue description. dcp_censustracts (now called dcp_ct2020) just ran successfully and is now checked off
@AmandaDoyle I can make those edits now
Per group convo: must ensure all versions of source data and geosupport used in the 22Q4 build are the last 2022 versions. This also ensures the Council Districts will be the current ones and not the future ones
@AmandaDoyle @damonmcc Is there a reason why we are bringing in the non-_wi admin boundaries (e.g. dcp_cb2020) for some datasets and the _wi ones for others (dcp_boroboundaries_wi)? It would make more sense to me if we were pulling in just the _wi admin boundaries, as opposed to the mix-and-match method we have currently
@mbh329 and @damonmcc After a little bit of digging, the rationale I came up with for not changing the existing clipped boundaries to be water included is to keep the aggregate files all clipped. Also, I think dof_shoreline may be used to limit records to those that are on land (but not 100% sure).
Please revert back to dcp_mappluto (this PR). This is a bigger issue that needs to be addressed more carefully since it will result in code changes. This was discussed late last year.
I think it's advisable to keep this DevDB update as simple as possible and not change what we're loading in.
@AmandaDoyle dcp_mappluto and dcp_mappluto_wi are both bringing in the "unclipped" or _wi version of MapPLUTO based on the yaml templates in data library. dcp_mappluto vs. dcp_mappluto_wi
@mbh329 Right - I'm aware, but we'll need to then change dcp_mappluto to dcp_mappluto_wi in the code for it to work.
Okay got it. The next step then is to update dcp_mappluto template in data library with the updated url so that we can pull 22v3 into devdb
done!
FYI dob_now_applications doesn't have the correct number of columns needed after the 22Q2 enhancement that was made
Questions I have from review
job_status value of 4. Partially Completed Construction? This involves reviewing the _MID_devdb table to make sure that when this logic is applied it only returns one record.
Add most recent full/half year to aggregate tables step is moot at this point, right? That's automatic?
@AmandaDoyle Regarding 4. Partially Completed Construction: it looks like there is only 1 record that satisfies the logic to create that value. The checks I did were to take a look at the _mid_devdb table in Postico and check each line of logic. There are 774 records that have a co_latest_certtype = 'T- TCO' AND classa_net >= 20. Of those records, only 5 have a classa_complt_pct less than 1, and only 1 has a classa_complt_pct less than 0.80
Morning yall - I figured out the issue with the QAQC historic data and why 22Q2 wasn't showing up in the qaqc_historic table. I can explain in more detail at stand up, but essentially, because the data was being overwritten in Digital Ocean due to the GitHub Actions erroneously exporting to DO on the dev and main branches, the latest data never contained the 22Q2 row. To get around this, I uploaded a locally cached qaqc_historic.csv and qaqc_historic.sql file to DO, and the data is now populated on my local DevDB build. Happy to chat at stand up. Biggest takeaway is we need to fix the GitHub Actions, which I am doing now
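For reference, the check on 4. Partially Completed Construction discussed above could be restated as a small Python sketch. It only mirrors the three conditions quoted in the comment; the `is_partially_completed` helper and the sample rows are hypothetical, and the actual DevDB logic lives in SQL and may include conditions not mentioned in this thread.

```python
def is_partially_completed(row):
    """Apply the three conditions quoted in the thread to one record."""
    return (
        row["co_latest_certtype"] == "T- TCO"
        and row["classa_net"] >= 20
        and row["classa_complt_pct"] < 0.8
    )


# Made-up rows standing in for the _mid_devdb table.
rows = [
    {"job_number": "1", "co_latest_certtype": "T- TCO", "classa_net": 25, "classa_complt_pct": 0.5},
    {"job_number": "2", "co_latest_certtype": "T- TCO", "classa_net": 25, "classa_complt_pct": 0.9},
    {"job_number": "3", "co_latest_certtype": "C- CO", "classa_net": 40, "classa_complt_pct": 0.5},
    {"job_number": "4", "co_latest_certtype": "T- TCO", "classa_net": 10, "classa_complt_pct": 0.5},
]

matches = [r["job_number"] for r in rows if is_partially_completed(r)]
print(matches)  # ['1']
```

Checking each condition separately, as was done in Postico, helps show which filter narrows the 774 records down to the single match.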
Update code
Update source data
Make sure the following are up-to-date in recipes:
dcp_mappluto_wi
dof_shoreline updated with zoningtaxlots, safe to ignore
council_members check opendata
doitt_buildingfootprints check opendata
doitt_buildingfootprints_historical check opendata
doitt_zipcodeboundaries -> never changed, safe to ignore
doe_school_subdistricts -> received from capital planning
doe_eszones -> the url for this changes year by year, search on opendata
doe_mszones -> same as above
hpd_hny_units_by_building check opendata
DCP Admin Boundaries from Bytes
dcp_cdboundaries
dcp_cb2010
dcp_censustracts -> was failing in data library because it has a new name: dcp_ct2010
dcp_school_districts
dcp_boroboundaries_wi
dcp_councildistricts
dcp_firecompanies
dcp_policeprecincts
DOB data
dob_cofos -> manually updated, received by email
dob_jobapplications check actions
dob_permitissuance check actions
dob_now_applications -> DOB contacts us via email that the data is ready, the data is downloaded from the DOB FTP using credentials, manually uploaded to DO and ingested via Data Library pipeline
dob_now_permits -> DOB contacts us via email that the data is ready, the data is downloaded from the DOB FTP using credentials, manually uploaded to DO and ingested via Data Library pipeline
Additional datasets