Closed fvankrieken closed 5 months ago
I've verified that source data versions are consistent for all that should be - only zoning data sources are updated from 23v3 to 23v3.1
Data available to be investigated right now
I've done a bit of digging which I'll try to summarize in a bit, but as a bit of a first step am going to rebuild pluto with identical inputs to 23v3. We'll first need to compare outputs to 23v3, identify anything that is truly nondeterministic. Then we can start comparing intermediate tables to 23v3.1 and try to figure out where issues are arising from, whether they are from semi-nondeterministic code (akin to condo proxcode issue) or "intended" behavior - some sort of filtering at some step that new zoning data rightfully changes for a certain lot, etc
Resolved 3002787502 morphing into 3002787501. These are billing/primebbls which share a base_billing_bbl (in dtm_condo). dtm only has one polygon for the basebbl and nothing smaller that could correspond to 7501 and 7502. @AmandaDoyle thinks this is likely a dtm issue, that hopefully it will "catch up" with reality at some point soon.
Nothing should be done to code, since multiple base bbls per billing is not valid. However, it might be worth adding code to at least make this deterministic (grab smallest primebbl associated with basebbl).
Logic around this is in dtmmergepolygons.sql
24 of these are condo base bbls that shouldn't have been in 23v3 to begin with. As of this moment, unsure how they ended up in it
condo_base_bbl | condo_billing_bbl |
---|---|
1007380008 | 1007387505 |
1013460012 | 1013467504 |
1015700017 | 1015707501 |
1021520083 | 1021527501 |
2025620049 | 2025627501 |
2027240032 | 2027247503 |
2059060342 | 2059067502 |
3001190055 | 3001197501 |
3004350047 | 3004357506 |
3009340069 | 3009347513 |
3010400013 | 3010407502 |
3011120012 | 3011127504 |
3018400001 | 3018407502 |
3023660032 | 3023667502 |
3025200001 | 3025207501 |
3048140026 | 3048147506 |
3049940013 | 3049947501 |
3056210005 | 3056217510 |
3056820045 | 3056827502 |
3063250030 | 3063257502 |
3064110022 | 3064117504 |
3067670090 | 3067677502 |
4009100001 | 4009107502 |
5038320021 | 5038327501 |
Simpler issue than I thought, at least in terms of why the source data seems different. See timestamps
If it looks like a lot change, it's probably a lot change.
We recently discussed that when pushing draft outputs, we should be cleaning (deleting) any existing files in s3 prior to pushing. This was an insidious case of a silent failure in the October build on @sf-dcp 's branch, where the gdb export failed. set e
either isn't working as we intended within the function call (we need to maybe explicitly capture the gdal error and then exit), or gdal itself failed silently. But outputs were made MINUS the gdbs, which were then pushed to s3. Old outputs overwritten except for the gdbs, which remained
@jackrosacker flagged on teams earlier this week
@AmandaDoyle fyi
23v3 being pulled from published version 23v3, 23v3.1 from draft
fvk-pluto-versioning
. These record counts are using the outputmappluto_unclipped_gdb.gdb.zip
A couple specific bbls - from @jackrosacker querying data loaded to the gis sde, the following bbls were present in 23v3 but not in 23v3.1
1001240008 1007380008 1013460012 1014290003 1014290044 1015700017 1021520083 2025620049 2027240032 2029460047 2030920003 2031430282 2044270013 2044270014 2044270015 2044270016 2059060342 3001190055 3002787502 3004350047 3009340069 3010400013 3011120012 3018400001 3023660032 3025200001 3036870020 3036970033 3048140026 3049940013 3056210005 3056820045 3063250030 3064110022 3067670090 4004300008 4004300010 4004300011 4004300012 4009100001 4010780051 4050667502 4050667505 4076217501 5010080049 5014957501 5024017501 5024507504 5030030025 5030247501 5031737504 5038320021 5065800056 5065800058 5070557502 5079390056 5080400021
A specific case, there's a polygon that seems to be consistent yet change bbl from 3002787502 to 3002787501