TechnologyRediscovery / codenforce

municipal code enforcement database application
GNU General Public License v3.0
2 stars 3 forks source link

SNAPPERS RE-Onboarding #216

Open edarsow opened 2 years ago

edarsow commented 2 years ago

Welcome back snaps!

Step 1: Rebuild database. The dump file is in the google drive root/database/local_toSnappers_7-MAR-2022.sql. Suggested step: build an empty database as postgres superuser Then use pg_restore to slurp up the dump file into the new shell. Something like pg_restore -d newdb local_toSnappers_7-MAR-2022.sql

Step 2: Study the new human and address schema based on the google drive diagram and the new database tables. Notice that the address tables are all prefixed with mailing.

Step 3: Begin to design code that can do the following:

  1. Take an address string like '348 Chicora St, E. Mckeesport PA 15035' and query the database tables to see if this address exists. If it does, return the PK of the building number, which represents a structure on a street.
  2. If the address doesn't exist, the code needs to start by asking: does the street exist, but not the building? If so, use the existing street record and just write a new building number, keyed to that street. If neither the building nor street exist, write the street, link to appropriate zip, then write the building, keyed to that new street.

Step 4: Begin to consider street/building cleaning code that can detect duplicate streets, collapse those streets, re-keying buildings that were once keyed to a duplicate street to the new primary record

snapperVibes commented 2 years ago

217

I couldn't get the local copy working with the .sql file you gave me, but I used an older .tar file and got everything running.

This is a work-in-progress state of the code to make sure I am on the correct path. There was some things I was unclear on.

To start, parsing an address string is really, really difficult. Why, exactly, would we be wanting to parse an address string? Are these strings coming from internal requests or will they be from user input? On that note, where do I find data for whether a street exists but not the building. Do you mean plug the street's name into the mailingstreet table and hope that it has a unique name? Where does the parcel-address api fit into all of this?

TLDR: I don't understand what purpose my code is serving and believe that it will veer off-topic without further guidance.

edarsow commented 2 years ago

Here is a sample reply from GAZE api from a single PARCEL ID:

{
  "results": {
    "mailing": {
      "original": "745 PUNTA GORDA AVE \nEAST MC KEESPORT , PA 15035-1260",
      "cleaned": "745 PUNTA GORDA AVE EAST MC KEESPORT, PA 15035-1260",
      "parsed": {
        "number": "745",
        "street": "PUNTA GORDA",
        "type": "Ave",
        "suffix": "E",
        "city": "MC KEESPORT",
        "state": "PA",
        "zip": "15035",
        "plus4": "1260"
      }
    },
    "mortgage": {
      "original": "MEGAHAN EDWARD J JR\n745 PUNTA GORDA ST\nEAST MC KEESPORT PA\n15035-1260\n",
      "cleaned": "745 PUNTA GORDA ST EAST MC KEESPORT PA 15035-1260",
      "parsed": {
        "number": "745",
        "street": "PUNTA GORDA",
        "type": "St",
        "suffix": "E",
        "city": "MC KEESPORT",
        "state": "PA",
        "zip": "15035",
        "plus4": "1260"
      }
    }
  }
}
edarsow commented 2 years ago

Project 1: Your script is run when a user clicks a button called "synchronize owner and mailing with county records" which will be located on a property profile page inside codeNforce.

This button would engaged with your code which will do the following:

  1. You'll be given a parcel ID only from Javaland
  2. Call Zac's Owner info API endpoint and get back the JSON like we see above.
  3. Undertake an interrogation of the current state of the human and mailingaddress tables to determine if the addresses returned from the Gaze call exist in the database and if the person/humans in the call exist in the database already.
  4. Your logic branches here: If your script believes the address exists, but is NOT linked to the parcel, your script should write that link in the parcelmailingaddress table so when the property profile is refreshed in Javaland, the existing address is linked to the parcel under examination.
  5. If your script believes the address(s) do not exist, then your script will need to undertake the writing process, which should probably go like 1) look up zip in mailingcitystatezip table, get the ID of the ZIP, then look in mailingstreet for an existing street, if street doesn't exist, make one, grab its id, then link the street and zip. Finally, check for building number in mailingaddress (remembering that each mailingaddress record is keyed to one and only one street, and each street is keyed to one and only one zip).
  6. Once that new address is in place in our three address tables, then link that mailingaddress record's ID to the parcel ID for viewing by writing a full record to the linking table parcelmailingadress, using sylvia's userid 99 for the creator and lastupdator, and the appropriate linked object role PK

Humans

We want to digest the data from the Gaze API and extract the full owner name. We then check the human table for a human with exactly (or nearly) that name. If we think they are the same, then we just want to like the address to that human with the appropriate linked object role in the humanmailingaddress linking table.

If the person name from the county scrape is believed NOT to be in the human table, then we need to write that record, then key the addresses to the human.

Misc todos

edarsow commented 2 years ago

Next sub-project: dup collapsing

As you found, multiple records in mailingaddress are the same building number and same street FK. We'll want to find these, and collapse them.

Human collapsing

Many humans have the same name, since persons had the same name and humans came from old persons. This is trickier because humans have all sorts of objects linked to them (addresses, phones, emails, parcels, events, etc.). So when you're ready for slog, we'll want to collapse and rewire human dups too.

snapperVibes commented 2 years ago

​I have a couple scenarios I wanted to ask for your advice on.

Mismatched data between Gaze and the CogDB

Edit: See explanatory comment

# Absolutely different address
PARCEL ID: 0547J00131000000
GAZE:      304 STATION ST, WILMERDING PA
COG:       212 CONGRESS ST, EAST MC KEESPORT PA

# Wrong type (and wrong city, but that's not what I want to focus on in this example)
PARCEL ID: 0546M00229000000
GAZE:      569 HELENA AVE, MC KEESPORT PA
COG:       569 HELENA ST, EAST MC KEESPORT PA

# Alternate city
PARCEL ID: 0546M00221000000
GAZE:      561 HELENA ST, MC KEESPORT PA
COG:       561 HELENA ST, EAST MC KEESPORT PA

Cities that don't appear in mailingcitystatezip

I understand it is a third party database and I doubt I am supposed to write to it.

To reproduce:

SELECT * FROM mailingcitystatezip WHERE city='VERSAILLES' AND state_abbr='PA';
-- no results
edarsow commented 2 years ago

Here is the FK for parcelmailingaddress's addressid field

ALTER TABLE public.parcelmailingaddress ADD CONSTRAINT parcelmailingaddress_mailingaddressid_fk
    FOREIGN KEY (mailingparcel_mailingid) REFERENCES mailingaddress (addressid);

ALTER TABLE public.parcelmailingaddress RENAME COLUMN mailingparcel_parcelid TO parcel_parcelkey;
ALTER TABLE public.parcelmailingaddress RENAME COLUMN mailingparcel_mailingid TO mailingaddress_addressid;
snapperVibes commented 2 years ago

A quick note regarding mismatched data: The data was not mismatched. I was using the Cog's data with the linkedobjectrole of "ParcelMailingaddress", but should have been comparing it to "OwnerMailing" or "MortageMailing" (which I will add later). Resolving discrepancies between the db and gaze is still important, but the wild inconsistencies will no longer be there.