Closed td928 closed 2 years ago
I built on my local and then grouped jobs by permit year. STATUS_Q_devdb
had 35526 jobs with permit year of 2022, but FINAL_devdb
only has 2422. On the other hand, STATUS_Q_devdb
has 5708 jobs with permit year of 2011, and FINAL_devdb
has 5705. (I chose these two years at random).
Te can you help me understand where this dropoff comes from?
To check that the creation of the permit_year
is correct I ran this
with
recent_permits as (SELECT * from dob_permitissuance WHERE issuancedate::date > DATE '2000-01-01' ),
recent_dob_now_permits as
(SELECT * from dob_now_permits WHERE issued_date::date > DATE '2000-01-01'),
STATUS_Q_create as (
SELECT
jobnum as job_number,
min(issuancedate::date) as date_permittd
FROM recent_permits
WHERE jobdocnum = '01'
AND jobtype ~* 'A1|DM|NB'
GROUP BY jobnum
UNION
SELECT
left(job_filing_number, strpos(job_filing_number, '-') - 1)::text as job_number,
min(issued_date::date) as date_permittd
FROM recent_dob_now_permits
WHERE right(job_filing_number,2)='I1'
GROUP BY left(job_filing_number, strpos(job_filing_number, '-') - 1)::text
),
STATUS_Q_devdb_recent as (
SELECT
job_number,
date_permittd,
-- year_permit
extract(year from date_permittd)::text as permit_year,
-- quarter_permit
year_quarter(date_permittd) as permit_qrtr
FROM status_Q_create
)
SELECT permit_year, count(*) from STATUS_Q_devdb_recent group by permit_year order by permit_year
Which shows the earliest permit year as 2000
Ran this query to see that creation of permit_qrtr
is correct. Filtering for march records means all records are assigned Q1 which seems correct
with
march_permits as (SELECT * from dob_permitissuance WHERE EXTRACT(Month from issuancedate::date) = 3 ),
march_dob_now_permits as
(SELECT * from dob_now_permits WHERE EXTRACT(Month from issued_date::date) =3),
STATUS_Q_create as (
SELECT
jobnum as job_number,
min(issuancedate::date) as date_permittd
FROM march_permits
WHERE jobdocnum = '01'
AND jobtype ~* 'A1|DM|NB'
GROUP BY jobnum
UNION
SELECT
left(job_filing_number, strpos(job_filing_number, '-') - 1)::text as job_number,
min(issued_date::date) as date_permittd
FROM march_dob_now_permits
WHERE right(job_filing_number,2)='I1'
GROUP BY left(job_filing_number, strpos(job_filing_number, '-') - 1)::text
),
STATUS_Q_devdb_recent as (
SELECT
job_number,
date_permittd,
-- year_permit
extract(year from date_permittd)::text as permit_year,
-- quarter_permit
year_quarter(date_permittd) as permit_qrtr
FROM status_Q_create
)
SELECT permit_qrtr, count(*) from STATUS_Q_devdb_recent group by permit_qrtr order by permit_qrtr
Will be ready to approve once Te and I talk about the dropoff of 2022 records from STATUS_Q_devdb
to final_devdb
Everything ran successfully on my machine and went through and looked at some of the outputs. I ran a count of classa_net records by year and got a range of 3000 - 7400 units created per year with ~30000 units not ever being assigned a year but this seems to come from withdrawn permits or stalled permits.
Overview
address #500 which was first spotted when housing was looking status q and found that DOB NOW portion of the HDB is missing their status q field, which reflects earliest permit date associated with a project. After conversations with DOB, it was determined that a new dataset needs to be incorporated and it was delivered through the same DOB FTP process to us. A new workflow similar to DOB NOW applications is created in data library and is ingested into DevDB.
dataloading
dataloading for the new dob now permits is simple enough by reading the datasets from edm-recipes using the
import_public
function._status_q.sql
the status q ingesting and cleaning were done in the
_status_q.sql
. With new DOB NOW permits file, ~180,000 records were added to theSTATUS_Q_devdb
table. With the raw file, ~400,000 records were filtered down with two logics 1) onlyI1
(stands for "initial" according to Sam) records were included and 2) only the earliest permit for eachjob_filing_number
is included because that is what the status q concerns about.Another major change for this workflow involves the removing of the joining to the init_devdb table to create the
STATUS_Q_devdb
table in this .sql file. A more detailed rationale for this removal will be explained in the section below._mid.sql
Even though no change is made to the
_mid.sql
in this PR, but it is actually crucial to understanding the status q process. Because theSTATUS_Q_devdb
, and thereof the earliest permit date, is merged onto the main devdb table in the_mid.sql
. The merge is aLEFT JOIN
based onjob_number
. One issue, however, is that while the permit records themselves are unique due to the filtering logics mentioned above, thejob_number
ininit_devdb
is not unique at this stage of the table. This leads the two subsequent join clause in_status_q.sql
and then_mid.sql
will multiply the table to be extremely large and often crash the program. I made the change to remove the first join withinit_devdb
completely because it is redundant in my opinion to do the functionally same join twice.