DemocracyClub / EveryElection

:ballot_box_with_check: For recording every election in the UK
https://elections.democracyclub.org.uk/
BSD 3-Clause "New" or "Revised" License

Remove `ORG_CURIE_TO_MAPIT_AREA_TYPE` mapping #2246

Open chris48s opened 2 weeks ago

chris48s commented 2 weeks ago

We have a script called import_divisionsets_from_csv.py, which imports divisionsets from a CSV:

https://github.com/DemocracyClub/EveryElection/blob/master/every_election/apps/organisations/management/commands/import_divisionsets_from_csv.py

Our CSV contains a 3-letter local authority code. One of the things this script does is look up our area in this big dict https://github.com/DemocracyClub/EveryElection/blob/ef22b5a58b66a9a1a96fe5ace4f831eb7de40bf0/every_election/apps/organisations/constants.py#L197 to work out what sort of organisation it is.

The reason we want to know this is so that we can then look that up in https://github.com/DemocracyClub/EveryElection/blob/ef22b5a58b66a9a1a96fe5ace4f831eb7de40bf0/every_election/apps/organisations/constants.py#L1-L11 to work out what sort of divisions we're importing.

There are 2 problems with this:

  1. Because our organisations are primarily defined in the DB but this lookup is defined in code, every time we add an organisation to our DB, we also have to edit the code. This is a bit silly.
  2. In the case where the organisation is a UTA, its sub-divisions can be either UTW or UTE. We have no way to know which, so we just pick UTW. That is the most common case, but it means we are sometimes wrong. There are probably a handful of areas where we have assigned divisions UTW when they should be UTE. This will probably surface when we look at back-porting GSS codes onto areas, and we'll need to fix it.

I suggest that we move to storing this in the DB. We need to define 2 fields: one to store the area code for the organisation itself, and another to store the area code that child divisions of this organisation have (let's call them boundaryline_area_code and boundaryline_children_area_code). I think you could make an argument for putting these either on the Organisation model itself or on OrganisationGeography. Both fields need to allow blank values because not every Organisation object will have one of these codes. Most will, but not all. To avoid duplication, I think we should only populate boundaryline_children_area_code when boundaryline_area_code is UTA (i.e. when there is not a one-to-one mapping) and look the others up based on a mapping:

PARENT_TO_CHILD_AREAS = {
    "DIS": "DIW",
    "MTD": "MTW",
    "CTY": "CED",
    "LBO": "LBW",
    "CED": "CPC",
    "NIA": "NIE",
    "COI": "COP",
    "LGD": "LGE",
}
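
To make the proposed lookup concrete, here is a minimal sketch of how the two fields and the mapping might work together. It is not the project's actual code: the `OrganisationAreaCodes` dataclass and the `get_child_area_code` function are hypothetical names standing in for whichever model ends up holding the fields, and the sketch assumes that an unresolvable code should yield `None` rather than a guessed default.

```python
from dataclasses import dataclass
from typing import Optional

# Mapping copied from the issue text: parent area code -> child division code.
PARENT_TO_CHILD_AREAS = {
    "DIS": "DIW",
    "MTD": "MTW",
    "CTY": "CED",
    "LBO": "LBW",
    "CED": "CPC",
    "NIA": "NIE",
    "COI": "COP",
    "LGD": "LGE",
}


@dataclass
class OrganisationAreaCodes:
    """Hypothetical stand-in for the two proposed DB fields (both may be blank)."""

    boundaryline_area_code: str = ""
    boundaryline_children_area_code: str = ""


def get_child_area_code(org: OrganisationAreaCodes) -> Optional[str]:
    """Return the area code for this organisation's child divisions, if known."""
    # An explicitly stored value wins; this covers UTA, where the children
    # can be either UTW or UTE and no one-to-one mapping exists.
    if org.boundaryline_children_area_code:
        return org.boundaryline_children_area_code
    # Otherwise fall back to the one-to-one mapping. A blank or unmapped
    # parent code returns None instead of guessing (an assumption here).
    return PARENT_TO_CHILD_AREAS.get(org.boundaryline_area_code)
```

With this shape, a UTA organisation whose children code has not been populated returns `None`, which would let an import fail loudly instead of silently assuming UTW.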

So then this job is going to break down into the following tasks: