ccao-data / model-sales-val

Heuristics for detecting outlier and non-arms-length sales
MIT License

Make flagging script more flexible with respect to geography #98

Closed jeancochrane closed 7 months ago

jeancochrane commented 8 months ago

Two new requirements that both require a more flexible approach to geography:

  1. We would like to be able to run a flagging job against sales within an arbitrary geography or set of geographies (e.g. flag sales within a tri, municipality, or group of neighborhoods)
  2. We would like to be able to group sales by arbitrary sets of geographies (e.g. a municipality or a group of neighborhoods)

I am assuming that we don't need to support multiple types of geography per run (e.g. we do need to allow a user to choose between running a flagging job on a tri or a municipality, but we don't need to allow a user to run a flagging job on both a tri and a municipality in the same run).
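To make the two requirements concrete, here is a minimal sketch of what a single-geography-type run could look like. It is only an illustration under stated assumptions: `GeographyConfig`, `assign_groups`, the `geography_group` key, and the column and group names are all hypothetical and do not come from the existing flagging script. It also assumes each geography value belongs to at most one group.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class GeographyConfig:
    """Hypothetical run config: exactly one geography type per run."""
    column: str                        # geography column on each sale, e.g. "nbhd"
    groups: Dict[str, FrozenSet[str]]  # group label -> geography values in that group


def assign_groups(sales: List[dict], config: GeographyConfig) -> List[dict]:
    """Keep only sales inside the configured geographies and tag each with its group."""
    # Invert the config into a value -> group lookup.
    # Assumes each value appears in at most one group.
    value_to_group = {
        value: label
        for label, values in config.groups.items()
        for value in values
    }
    in_scope = []
    for sale in sales:
        label = value_to_group.get(sale[config.column])
        if label is not None:  # sale falls outside the requested geographies otherwise
            in_scope.append({**sale, "geography_group": label})
    return in_scope


# Made-up example data: three sales, two neighborhood groups.
sales = [
    {"sale_id": 1, "nbhd": "N1"},
    {"sale_id": 2, "nbhd": "N3"},
    {"sale_id": 3, "nbhd": "N9"},  # not in any configured group, so it is dropped
]
config = GeographyConfig(
    column="nbhd",
    groups={"group_a": frozenset({"N1", "N2"}), "group_b": frozenset({"N3"})},
)
in_scope_sales = assign_groups(sales, config)
```

Restricting the config to a single `column` encodes the assumption above: a user picks one geography type per run, and both the filtering and the grouping are driven by that same column.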

This will require changes to at least the following parts of the code:

Depending on the degree of changes required by the design for this feature, it may be worth sketching out a proposed solution in writing for approval before getting started on the implementation.

wagnerlmichael commented 8 months ago

(write-up in progress)

Assumption to confirm

While discussing potential data models with Jean, we identified an assumption which, if violated, would make the data model significantly more complicated for the user.

Assumption: The submarket geographies never intersect. Currently the submarkets don't violate this assumption. Each tri is confined to its own methodology - in the city tri the groupings are discrete neighborhood combinations, and in the north and south tris the groupings are discrete townships. Here is a theoretical scenario that would violate this assumption: when we develop new groupings for the north tri, we decide to use something like census tract, which overlaps with other groupings (from the city tri, for example).
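One way to check this assumption before a run is sketched below. It is a simplification: it assumes every grouping can be expressed as a set of values of one shared base unit (the neighborhood codes here are made up), so it detects shared membership rather than true spatial overlap between different geography types such as census tracts and townships. The function name `find_overlaps` and the group labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_overlaps(groups: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return each geography value that appears in more than one group.

    An empty result means the groups are mutually exclusive.
    """
    membership = defaultdict(set)
    for label, values in groups.items():
        for value in values:
            membership[value].add(label)
    return {
        value: sorted(labels)
        for value, labels in membership.items()
        if len(labels) > 1
    }


# Made-up groupings: the first set is mutually exclusive, the second is not.
exclusive = {"city_group_1": {"N1", "N2"}, "city_group_2": {"N3"}}
overlapping = {"city_group_1": {"N1", "N2"}, "north_group_1": {"N2", "N4"}}

no_conflicts = find_overlaps(exclusive)
conflicts = find_overlaps(overlapping)
```

A run could refuse to start when the returned mapping is non-empty, which would keep the simpler data model valid for as long as the assumption holds.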

Notes on complexity

Writing down some more thoughts to think through the complexity of the data model.

Scenario 1 - Mutually exclusive groupings

Scenario 2 - Mutually exclusive groupings in the recurring job. Non-mutually exclusive groupings for a manual update.

Scenario 3 - Non-mutually exclusive groupings in any scenario