TechnologyRediscovery / codenforce

municipal code enforcement database application
GNU General Public License v3.0
2 stars 3 forks source link

PyParcel Updates: Jan 2020 #181

Open edarsow opened 3 years ago

edarsow commented 3 years ago

Goal

Use data from WPRDC and County Scraping AND OR Zac's API to create in our database schema an accurate reflection of the following for a given municipality:

High level process

Interpret the spirit of these steps

Phase Nil: Strategize an approach that recognizes three distinct possible pathways for external data integration:

  1. The original migration script from MS Access to Postgres, written by Daniel, then muddled up by Eric. Matt Whacked.
  2. Drew created an entire API for external data in Python before we knew Zac existed
  3. Zac came along and showed us an API that already does all this, but was designed without codeNforce database in mind

The tradeoff we're facing is: bring to full use the API Drew's pyparcel API designed with codeNforce in mind, although our first version of the database: centered on property and person tables or write new Python to pull from Zac's api and translate into our new human and parcel tables.

Phase A: Base parcel list in parcel table

Acquire and create a master list of all parcels in a given municipality (by municode -- the ONLY primary key in All of CodeNForce that is assigned by an external entity --the county of Allegheny) and write those as individual records in the parcel table

Phase B: Populate human and mailingaddress tables (and their linkers)

  1. Scrape or read from API the owner name associated with each parcel in the master list from Phase A. Write the owner name into human table
  2. Scrape or API fetch the one or more mailing addresses associated with your active parcel and write those to the mailingaddress table
  3. Create appropriate records in linking tables to connect humans and mailingaddresses

Requirements

  1. Parcel/MailingAddresses: Appropriately create records to reflect parcel edge cases in the county database whose official address is a sequence of building numbers on a given street or streets. Example: Punta gorda street (Chalfant probably or east mckeesport) has a parcel with address over a range 1200-1222, so we would need to make a single record in parcel and then generate with logic the internal addresses between the end points supplied by the county (1202, 1204, etc. to 1222).
  2. Humans: Digest parcel records from the county, extract owner name, and create a corresponding record in the human table. SEE HUMAN DETAILS BELOW
  3. Link humans, addresses, and parcels: Once we have a human record that reflects an owner of a parcel, we then need to make appropriate linkages connect the humans, their addresses, and the parcels they own.
  4. Cope with edge cases of owners whose mailing address AND/OR tax mailing address is different from the address of the parcel they own. (Remember there are three possible addresses in any given county record: the parcel's address, the owner mailing, and the tax address). In this case, we want to write the relationship between the human, the parcel, and the parcel's one or more mailing addresses in the linking tables with appropriate classifications of those linkages (i.e. the value of the field role_humanmailingroleid on the humanmailingaddress table). So it's quite possible that a human record would be linked to multiple addresses. If that human owns a parcel with multiple addresses (the range case from above), we would need appropriate records in the linking table to show the connection to each parcel's mailingaddress records.

Humanity discussed

A record in the human table represents one of two related concepts:

  1. An actual human lurking about in the world
  2. An owner of a property in the Allegheny County database, which is to say, a unique value in the field of "Owner Name" on a parcel's real estate portal profile page.

Notes on the two meanings of a record in the human table

One might actually argue that we would want two tables: actualhumans and property owning entities by the county. The design choice to contain both notions of a human into the human table was driven by the reality that our key constituent in a municipality are property owners, and we want to be able to interact with them as we would humans type 1, meaning send them a letter, or talk to it/them/him/her on a telephone. To facilitate connecting events in codeNforce with legal property owners, we have collapsed both entities into the notion of a record in the human table, with flags such as multihuman and 'businessentity` to help tease apart if a given record is human type 1 or human type 2.

the human flags

Note the boolean businessentity flag which we'd like to try to set if there are obvious indicators like llc in the owner name. The jury is still out on what to do with dual ownership of properties. For now, don't try to separate the individual humans inside such names. We might want to start a flag on human for suspected dual person owner names. There is also a field multihuman that we want to set to true if we have reason to believe the owner name in the county represents two actual humans, such as "MARIETTA JOHN AND JOSAPHINE". In this case, we will think of this as one human with one name for the name field in the human table, and we want to turn the multihuman flag to true.

Utility TODOs

There are two utility tables to categorize the links between parcels and humans and humans and mailingaddresses. Please make initial entries into these tables which will be keyed by the many:many tables humanmailingaddress and humanparcel so that we can classify a connection to be something like "owns" to describe why a human is linked to a parcel or "tenant of", etc. Entries in the humanmailingaddress table will have foreign keys to the humanmailingrole table to describe the nature of the connection between the human and the address.

Design Notes

The humanization table schema separates parcels, humans, mailingaddresses, and phone numbers in separate tables with a rather complex key structure. Importantly, the property and person tables are now DEPRECATED and exist for archival and migration purposes. The notion of a Property only exists in the Java side, since in reality, the county only thinks in parcels. The notion of a Person also now only exists in Java, and is composed of one or more records from the human table and their related 'mailingsaddresses' and 'contactphone' records.

mailingaddress notes

Also note that the mailingaddress table separates the street number from the street name. So for the address "2209 S. BRADDOCK AVE" we would write "2209" into the field addressnum and "S. BRADDOCK AVE" into the street field. This will make scraping trickier, but hopefully regexps will help. This separation will allow for much more accurate retrievals and inserts, since one can look up the street first and then add a proper number to a known street.

parcelmailingaddress linking table notes

We connect a parcel with an address in the database by writing a record to the parcelmailingaddress table. Links have sources, so the FK to the bobsource table called source_sourceid tracks where that address came from. The county portal's field labels should become their own sources in the bobsource table.

Example scrape

Given the property of shop1, county parcel ID of "0178-H-00033-0000-00"

image

We see that the property address is "2209 S BRADDOCK AVE PITTSBURGH, PA 15218". We already have the record in the parcel table (since that's how we got the profile) so we now need to get this address written to the appropriate fields in the mailingaddress table before we link.

Then we link between the record in the parcel table with a value in the field parcelidcnty of "0178-H-00033-0000-00" and the mailing address by creating a record in the parcelmailingaddress table.

That parcelmailingaddress table should have FK to a record in the bobsource table whose name field has the value something like "Property Address on ACREP" (Allegheny County Real Estate Portal). The FK field in the parcelmailingaddress table is source_sourceid.

To continue the example, we would find this info at the bottom of the profile image

We want the database to encode that the record in the human table whose name is "DARSOW ERIC" to be connected to the SAME record in the mailingaddress table as is the record in the parcel table with a parcelidcnty value of "0178-H-00033-0000-00"

So the script would need determine that the value on the county profile page for "Owner Mailing" is the SAME as the value on the county profile page for "Property Address" and make TWO records in linking tables to the SAME record in the mailingaddress table.

Finally, we check the tax tab and see this: image We use script logic to see that the address contains a name that's already in our human table, and an address that's already in our mailingaddress table. We're not done! We want to show that this is, in fact, the tax mailing address for a parcel whose parcelidcnty is "0178-H-00033-0000-00", and we'll do so by making a record in the parcelmailingaddress table whose source_sourceid is a record in the bobsource table with a name like "Tax Bill Mailing Address from ACREP".

STILL NOT DONE

We'd also like to know that "DARSOW ERIC" is associated with a tax mailing address of "2209 S BRADDOCK AVE PITTSBURGH PA 15218" so let's make a final record in the humanmailingaddress table that has a FK to a record in the humanmailingrole table of something like "Tax Mailing Address"

The frontier

Successfully accomplishing this mission would, I believe, create the most comprehensive and accurately liked property-human-mailing database in the county.

Process ideas

Work by hand (i.e. code the SQL directly) for the range of cases your script is trying to cope with. Use that SQL to create your automated SQL assembly functions in Python. This will familiarize yourself with the new schema and probably reveal changes that need to be made.

Keep and eye out for improvements and bugs and submit them as Pull Request for SQL patch inclusions.

edarsow commented 3 years ago

Discussed with SNAPPERs on zoom 29-DEC-2020! Py-ho!

snapperVibes commented 3 years ago

I don't think my mic is working. These were some useful updates. Thank you!

edarsow commented 3 years ago

Updated with shop1 example and human details with SNAPPER on 31 DEC 2020

edarsow commented 3 years ago

Called with SNAPPERS: updated the following:

ALTER TABLE public.parcel ADD COLUMN muni_municode INTEGER NOT NULL CONSTRAINT parcel_muni_fk REFERENCES municipality (municode);

And rename to the linking table

ALTER TABLE mailingaddressparcel RENAME TO parcelmailingaddress;