Make flagging script more flexible with respect to geography

Two new requirements both call for a more flexible approach to geography:

- We would like to be able to run a flagging job against sales within an arbitrary geography or set of geographies (e.g. flag sales within a tri, municipality, or group of neighborhoods)
- We would like to be able to group sales by arbitrary sets of geographies (e.g. a municipality or a group of neighborhoods)

I am assuming that we don't need to support multiple types of geography per run. For example, we do need to allow a user to choose between running a flagging job on a tri or on a municipality, but we don't need to allow a user to run a flagging job on both a tri and a municipality in the same run.
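To make the two requirements and the one-type-per-run assumption concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `GeographyConfig` class, the `group_sales` helper, the allowed type names, and the shape of a sale record are illustrations, not the existing script or the current layout of input.yaml.

```python
from dataclasses import dataclass

# Hypothetical geography types; the real column names in the sales data may differ.
ALLOWED_GEOGRAPHY_TYPES = {"tri", "municipality", "neighborhood"}


@dataclass(frozen=True)
class GeographyConfig:
    """One geography type per run, plus named groups of values of that type."""

    geography_type: str
    groups: dict  # group name -> list of geography values

    def __post_init__(self):
        if self.geography_type not in ALLOWED_GEOGRAPHY_TYPES:
            raise ValueError(f"Unknown geography type: {self.geography_type!r}")
        if not self.groups:
            raise ValueError("At least one group is required")


def group_sales(sales, config):
    """Keep only sales inside the configured geographies and bucket them by group.

    Assumes groups are mutually exclusive: if a value appears in two groups,
    the group listed later silently wins.
    """
    value_to_group = {
        value: name for name, values in config.groups.items() for value in values
    }
    grouped = {name: [] for name in config.groups}
    for sale in sales:
        group = value_to_group.get(sale.get(config.geography_type))
        if group is not None:  # sales outside every group are not flagged
            grouped[group].append(sale)
    return grouped
```

Because the config carries a single `geography_type`, a run on a tri and a run on a municipality would be two separate configs rather than one combined run.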

This will require changes to at least the following parts of the code:

- The query in the flagging script(s)
- The config parameters in input.yaml
- The parameter output table

Depending on the degree of changes required by the design for this feature, it may be worth sketching out a proposed solution in writing for approval before getting started on the implementation.

(write-up in progress)

Assumption to confirm

While discussing potential data models with Jean, we identified an assumption which, if violated, would make the data model significantly more complicated for the user.

Assumption: The submarket geographies never intersect. Currently the submarkets don't violate this assumption. Each tri is confined to its own methodology - in the city tri the groupings are discrete neighborhood combinations, and in the north and south tris the groupings are discrete townships. Here is a theoretical scenario that would violate this assumption: when we develop new groupings for the north tri, we decide to use something like census tracts, which overlap into other groupings (from the city tri, for example).
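One way to confirm the assumption against real data would be to check whether any sale falls into more than one group. The sketch below is illustrative only: the `find_multi_group_sales` name, the `(column, values)` group shape, and the sale record fields are all assumptions, not part of the current code.

```python
def find_multi_group_sales(sales, groups):
    """Map sale id -> names of every group the sale falls in, for sales in 2+ groups.

    `groups` maps a group name to (column, values), where `column` is the
    hypothetical sale field holding that geography (e.g. "neighborhood" or
    "census_tract") and `values` is the set of geography values in the group.
    """
    conflicts = {}
    for sale in sales:
        matches = [
            name
            for name, (column, values) in groups.items()
            if sale.get(column) in values
        ]
        if len(matches) > 1:
            conflicts[sale["id"]] = matches
    return conflicts
```

An empty result means the groupings are mutually exclusive for the sales checked. Any entry is a sale that would be claimed by two methodologies.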

Notes on complexity

Writing down some more thoughts to think through the complexity of the data model.

Scenario 1 - Mutually exclusive groupings

This situation is outlined above.

Scenario 2 - Mutually exclusive groupings in the recurring job, non-mutually exclusive groupings for a manual update

Let's say city tri has mutually exclusive neighborhood groups, but north tri has census tract groups that overlap into city tri.
- We could define a configuration that adds new census tract groupings for the north tri. We could run this as a one-off to update the sales within these census tracts. However, do we also flag the sales that fall in the city tri but belong to north tri census tract geographies?

Scenario 3 - Non-mutually exclusive constraint in any scenario

In any situation where the groups aren't mutually exclusive, there will also need to be a workaround to get the recurring job to work. If a sale is present in two different groups (census tract and neighborhood, for example) under two different flagging methodologies, we will need to build in something to select which methodology the flagged sale takes.
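One possible form of that selection is an explicit priority order over groups. This is only a sketch of one option, not a decided design: the `assign_methodology` name, the `priority` list, and the `(column, values)` group shape are hypothetical.

```python
def assign_methodology(sale, groups, priority):
    """Return the highest-priority group containing the sale, or None if none do.

    `groups` maps a group name to (column, values); `priority` lists every
    group name once, most preferred first. Both shapes are assumptions.
    """
    if sorted(priority) != sorted(groups):
        raise ValueError("priority must list every group exactly once")
    for name in priority:
        column, values = groups[name]
        if sale.get(column) in values:
            return name
    return None
```

A priority list keeps the rule visible in the config, but it also means the user has to rank groups up front, which is part of the added complexity described above.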

ccao-data / model-sales-val

Make flagging script more flexible with respect to geography #98
