ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/
5 stars 3 forks source link

Correctly join calculated number of parcels per sale to sales view #334

Closed wrridgeway closed 4 months ago

wrridgeway commented 4 months ago

Joining calculated number of sales currently uses an adjusted document number against an un-adjusted document number.

wrridgeway commented 4 months ago

Reviewing the data sent by Bill, it seems some sales were getting filtered out during ingest by the sale.num_parcels_sale <= 2 condition in our SQL pull. This was because some sales had NULL values for num_parcels_sold due to a bad join condition.

select
    new.pin,
    new.sale_date,
    new.doc_no,
    new.num_parcels_sale as new,
    old.num_parcels_sale as old
from "z_ci_333-correctly-join-calculated-number-of-parcels-per-sale-to-sales-view_default".vw_pin_sale new
left join default.vw_pin_sale old
    on new.pin = old.pin
    and new.doc_no = old.doc_no
where new.pin = '14313050581003'
pin sale_date doc_no new old
14313050581003 5/26/2016 1620350014 1 1
14313050581003 6/5/2018 1816206243 1 1
14313050581003 5/17/2021 2121008004 1  
wrridgeway commented 4 months ago

Count of number of sales with <= 2 for number of parcels per sale now vs NULL previously:

select
    new.year,
    count(*) as "new sales"
from "z_ci_333-correctly-join-calculated-number-of-parcels-per-sale-to-sales-view_default".vw_pin_sale new
left join default.vw_pin_sale old
    on new.doc_no = old.doc_no and new.pin = old.pin
where new.num_parcels_sale <= 2 and old.num_parcels_sale IS NULL
group by new.year
order by new.year desc
year new sales
2023 8745
2022 9545
2021 15162
2020 5909
2019 87
2018 33
2017 19
2016 15
2015 21
2014 12
2013 8
2012 11
2011 2
2010 2
2008 2
2007 1
2006 2
2004 1
2003 1
2002 2
2001 1
2000 1