Open td928 opened 2 years ago
After some research, the issue is narrowed down to the group by query for the tract.sql
specifically
GROUP BY
YEARLY_devdb_{{ decade }}.boro,
YEARLY_devdb_{{ decade }}.bct{{ decade }},
YEARLY_devdb_{{ decade }}.centract{{ decade }}
It turns out that for some record in the yearly_devdb_2020
table the centract2020
is sometimes filled out but also sometimes null therefore causing multiple rows for a single bct2020
which supposed to be the index row for dataframes to be joined on.
It is somewhat unexpected because both of those fields are populated together in the _spatial.sql
and should always have value as long as the other have value. But the logic to create them is quite complex involving a multiple-way join between a few tables:
DROP TABLE IF EXISTS SPATIAL_devdb;
SELECT
DRAFT_spatial.uid,
DRAFT_spatial.geo_bbl,
DRAFT_spatial.geo_bin,
DRAFT_spatial.geo_address_numbr,
DRAFT_spatial.geo_address_street,
DRAFT_spatial.geo_address,
DRAFT_spatial.geo_zipcode,
DRAFT_spatial.geo_boro,
DRAFT_spatial.geo_csd,
DRAFT_spatial.geo_cd,
DRAFT_spatial.geo_council,
DRAFT_spatial.geo_policeprct,
DRAFT_spatial.geo_firedivision,
DRAFT_spatial.geo_firebattalion,
DRAFT_spatial.geo_firecompany,
DRAFT_spatial.geo_schoolelmntry,
DRAFT_spatial.geo_schoolmiddle,
DRAFT_spatial.geo_schoolsubdist,
DRAFT_spatial.geo_latitude,
DRAFT_spatial.geo_longitude,
DRAFT_spatial.latitude,
DRAFT_spatial.longitude,
DRAFT_spatial.geom,
DRAFT_spatial.geomsource,
CENSUS_TRACT_BLOCK.fips||DRAFT_spatial._geo_ct2010||DRAFT_spatial._geo_cb2010 as geo_cb2010,
CENSUS_TRACT_BLOCK.fips||DRAFT_spatial._geo_ct2010 as geo_ct2010,
CENSUS_TRACT_BLOCK.bctcb2010,
CENSUS_TRACT_BLOCK.bct2010,
CENSUS_TRACT_BLOCK.fips||DRAFT_spatial._geo_ct2020||DRAFT_spatial._geo_cb2020 as geo_cb2020,
CENSUS_TRACT_BLOCK.fips||DRAFT_spatial._geo_ct2020 as geo_ct2020,
CENSUS_TRACT_BLOCK.bctcb2020,
CENSUS_TRACT_BLOCK.bct2020,
dcp_ct2010.ntacode as geo_nta2010,
dcp_ct2010.ntaname as geo_ntaname2010,
dcp_ct2020.nta2020 as geo_nta2020,
dcp_ct2020.ntaname as geo_ntaname2020,
dcp_ct2020.cdta2020 as geo_cdta2020,
dcp_ct2020.cdtaname as geo_cdtaname2020
INTO SPATIAL_devdb
FROM DRAFT_spatial
LEFT JOIN CENSUS_TRACT_BLOCK ON DRAFT_spatial.uid = CENSUS_TRACT_BLOCK.uid
LEFT JOIN dcp_ct2010 ON CENSUS_TRACT_BLOCK.bct2010 = boroct2010
LEFT JOIN dcp_ct2020 ON CENSUS_TRACT_BLOCK.bct2020 = boroct2020;
the log for github actions suggests that this failed at which indicates the aggregate table from the
sql/tract.sql
step has non unique values in thebct2020
column