Open fvankrieken opened 5 months ago
As part of #831, I've found a couple of different instances
For a messy condo case, 4006197501 is a great example
Every non-primary row has distinct monetary values. Summed sq footage of all unit rows is 11684.
Currently, this bbl ends up with 11684 bldgarea (summed - seems wrong), 10569 resarea (summed - seems right), 256 unitsres (obviously wrong), 365 units total (obviously wrong).
So for this, it would seem like maybe we do the following
This doesn't capture the case where 3 or 4 unit rows each have the same units as the primary row (so 1 primary row and 3 unit rows, all with 217 total units and 216 res units; as of time of writing, 3070627501 is a good example). But maybe we want to keep checking these manually, as we currently do, because it seems harder to assume that they're in error.
Using this largely as a place to take notes. This ended up being pretty rambling so I'll tidy up later. Very much a WIP issue
In pluto, unit information, lot/bldg value information, and bldgarea are calculated on an aggregate basis, grouped by primebbl. There are some odd cases: condos where each unit row has the building's total condo_units as its own condo_units field, and primebbls with two rows, one seemingly a primary condo row and the other simply a single non-condo row.
Original motivation was to solve that first case, but in the process, it's become clear that it would be helpful to try to classify PTS rows.
Current approach
Currently, pluto_rpad_geo is used to sequentially create and then update pluto_allocated. First, distinct primebbls (860610 of them) are inserted into the empty table. Then they're updated from rows matching a few criteria to get their non-aggregate values:
tl NOT LIKE '75%' AND condo_number IS NULL
tl LIKE '75%' AND condo_number <> '0'
One issue with our current strategy is that there are 6 primebbls that have rows that meet both these criteria. So while our code is deterministic, these primebbls are inserted without data into pluto_allocated, then updated with data based on their "standard" rows, then updated with data based on their "condo" rows.
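A rough way to surface those overlapping primebbls is to group by primebbl and check whether each group has at least one row matching each criterion. This is only a sketch: it runs against an in-memory SQLite table with made-up rows rather than the real pluto_rpad_geo, and the column types (everything as text) are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal shape of pluto_rpad_geo; the real table has many more columns.
conn.execute(
    "CREATE TABLE pluto_rpad_geo (primebbl TEXT, bbl TEXT, tl TEXT, condo_number TEXT)"
)
conn.executemany(
    "INSERT INTO pluto_rpad_geo VALUES (?, ?, ?, ?)",
    [
        # Made-up rows for illustration only.
        ("1000010001", "1000010001", "0001", None),  # matches the "standard" criterion only
        ("1000027501", "1000027501", "7501", "12"),  # matches the "condo" criterion only
        ("1000037501", "1000030010", "0010", None),  # standard row ...
        ("1000037501", "1000037501", "7501", "34"),  # ... and condo row under the same primebbl
    ],
)

# A primebbl overlaps if it has at least one row for each of the two criteria.
overlap_sql = """
SELECT primebbl
FROM pluto_rpad_geo
GROUP BY primebbl
HAVING SUM(CASE WHEN tl NOT LIKE '75%' AND condo_number IS NULL THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN tl LIKE '75%' AND condo_number <> '0' THEN 1 ELSE 0 END) > 0
ORDER BY primebbl
"""
overlapping = [row[0] for row in conn.execute(overlap_sql)]
print(overlapping)
```

On the toy data only the third primebbl is flagged, since it is the only one with both a standard row and a condo row.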
There are also 2448 distinct primebbls where condo_number is not null or zero despite having a lot like 75xx. Most of these do not get geoms in pluto. Not sure whether these should be getting geom info from condo data or from somewhere else.
Regardless, it seems like it would be useful to start by classifying these rows.
Potential new structure
Instead of this update logic, where we insert bbls and then attempt to find their primary rows, it makes sense to start with the "primary row" logic. There are 860609 rows where bbl = primebbl, with primebbl being distinct in this table. This seems like a good place to start. We lose one primebbl (3047927502), but that seems to be an odd case. In this case we're also maybe assuming that the 6 bbls that met both clauses above should be treated as condos, which seems largely right. We're then also assuming that the 2448 "semi-condo"s should be getting information from pluto_rpad_geo. These rows can be classified based on their lot number as condos or not (other than the semi-condos), and we can go from there.
With that, we can continue to classify rows in PTS
These numbers do add up to 1152506, which is the total number of rows in PTS, so I didn't drop anything by accident.
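As a sketch of the "start from primary rows" idea and the add-up check, the query below puts every row into exactly one bucket and confirms the bucket counts sum to the row total. The table name pts_rows, the bucket labels, and reading the lot from the last four characters of a 10-character text bbl are all assumptions for illustration, and the semi-condo case is not handled separately here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the PTS rows; only the two columns needed here.
conn.execute("CREATE TABLE pts_rows (primebbl TEXT, bbl TEXT)")
conn.executemany(
    "INSERT INTO pts_rows VALUES (?, ?)",
    [
        # Made-up rows for illustration only.
        ("1000010001", "1000010001"),  # primary row on an ordinary lot
        ("1000027501", "1000027501"),  # primary row on a 75xx lot
        ("1000027501", "1000021001"),  # unit row under that condo
        ("1000027501", "1000021002"),  # unit row under that condo
    ],
)

# Assumes bbl is boro (1 char) + block (5 chars) + lot (4 chars).
classify_sql = """
SELECT CASE
         WHEN bbl <> primebbl THEN 'non-primary'
         WHEN substr(bbl, 7, 4) LIKE '75%' THEN 'primary condo'
         ELSE 'primary standard'
       END AS row_type,
       COUNT(*) AS n
FROM pts_rows
GROUP BY row_type
ORDER BY row_type
"""
counts = dict(conn.execute(classify_sql).fetchall())
total = conn.execute("SELECT COUNT(*) FROM pts_rows").fetchone()[0]

# Same sanity check as above: every row lands in exactly one bucket.
assert sum(counts.values()) == total
print(counts, total)
```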
Then, we can join primary condo rows to their subrows, and categorize what we see there a bit
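One possible shape for that join, as a sketch only: self-join primary condo rows to the other rows sharing their primebbl and compare condo_units between the two. The table name pts_rows and the category labels are hypothetical, and the toy rows are made up to cover the cases discussed earlier (unit rows that sum to the primary, and unit rows that each repeat the primary's total).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the PTS rows.
conn.execute("CREATE TABLE pts_rows (primebbl TEXT, bbl TEXT, condo_units INTEGER)")
conn.executemany(
    "INSERT INTO pts_rows VALUES (?, ?, ?)",
    [
        # Made-up rows: unit rows sum to the primary row's condo_units.
        ("1000027501", "1000027501", 3),
        ("1000027501", "1000021001", 1),
        ("1000027501", "1000021002", 1),
        ("1000027501", "1000021003", 1),
        # Every unit row repeats the primary row's total.
        ("1000037501", "1000037501", 217),
        ("1000037501", "1000031001", 217),
        ("1000037501", "1000031002", 217),
        ("1000037501", "1000031003", 217),
        # Primary condo row with no unit rows at all.
        ("1000047501", "1000047501", 10),
    ],
)

categorize_sql = """
SELECT p.primebbl,
       COUNT(u.bbl) AS n_unit_rows,
       CASE
         WHEN COUNT(u.bbl) = 0 THEN 'no unit rows'
         WHEN COUNT(u.bbl) > 1
              AND MIN(u.condo_units) = p.condo_units
              AND MAX(u.condo_units) = p.condo_units THEN 'units repeat primary'
         WHEN SUM(u.condo_units) = p.condo_units THEN 'units sum to primary'
         ELSE 'other'
       END AS category
FROM pts_rows p
LEFT JOIN pts_rows u
  ON u.primebbl = p.primebbl AND u.bbl <> p.bbl
WHERE p.bbl = p.primebbl
  AND substr(p.bbl, 7, 4) LIKE '75%'
GROUP BY p.primebbl, p.condo_units
ORDER BY p.primebbl
"""
categories = conn.execute(categorize_sql).fetchall()
for row in categories:
    print(row)
```

The "units repeat primary" bucket requires more than one unit row, so a single-unit condo whose one unit legitimately matches the primary row falls into "units sum to primary" instead.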