Open edarsow opened 3 years ago
Discussed with SNAPPERs on zoom 29-DEC-2020! Py-ho!
I don't think my mic is working. These were some useful updates. Thank you!
Updated with shop1 example and human details with SNAPPER on 31 DEC 2020
Called with SNAPPERS: updated the following:
ALTER TABLE public.parcel ADD COLUMN muni_municode INTEGER NOT NULL CONSTRAINT parcel_muni_fk REFERENCES municipality (municode);
And rename to the linking table
ALTER TABLE mailingaddressparcel RENAME TO parcelmailingaddress;
Goal
Use data from WPRDC and County Scraping AND OR Zac's API to create in our database schema an accurate reflection of the following for a given municipality:
High level process
Interpret the spirit of these steps
Phase Nil: Strategize an approach that recognizes three distinct possible pathways for external data integration:
The tradeoff we're facing is: bring to full use the API Drew's pyparcel API designed with codeNforce in mind, although our first version of the database: centered on
property
andperson
tables or write new Python to pull from Zac's api and translate into our newhuman
andparcel
tables.Phase A: Base parcel list in
parcel
tableAcquire and create a master list of all parcels in a given municipality (by municode -- the ONLY primary key in All of CodeNForce that is assigned by an external entity --the county of Allegheny) and write those as individual records in the
parcel
tablePhase B: Populate
human
andmailingaddress
tables (and their linkers)human
tablemailingaddress
tableRequirements
human
table. SEE HUMAN DETAILS BELOWrole_humanmailingroleid
on thehumanmailingaddress
table). So it's quite possible that a human record would be linked to multiple addresses. If that human owns a parcel with multiple addresses (the range case from above), we would need appropriate records in the linking table to show the connection to each parcel's mailingaddress records.Humanity discussed
A record in the human table represents one of two related concepts:
Notes on the two meanings of a record in the human table
One might actually argue that we would want two tables: actualhumans and property owning entities by the county. The design choice to contain both notions of a human into the
human
table was driven by the reality that our key constituent in a municipality are property owners, and we want to be able to interact with them as we would humans type 1, meaning send them a letter, or talk to it/them/him/her on a telephone. To facilitate connecting events in codeNforce with legal property owners, we have collapsed both entities into the notion of a record in thehuman
table, with flags such asmultihuman
and 'businessentity` to help tease apart if a given record is human type 1 or human type 2.the
human
flagsNote the boolean
businessentity
flag which we'd like to try to set if there are obvious indicators likellc
in the owner name. The jury is still out on what to do with dual ownership of properties. For now, don't try to separate the individual humans inside such names. We might want to start a flag on human for suspected dual person owner names. There is also a fieldmultihuman
that we want to set to true if we have reason to believe the owner name in the county represents two actual humans, such as "MARIETTA JOHN AND JOSAPHINE". In this case, we will think of this as one human with one name for thename
field in thehuman
table, and we want to turn themultihuman
flag to true.Utility TODOs
There are two utility tables to categorize the links between parcels and humans and humans and mailingaddresses. Please make initial entries into these tables which will be keyed by the many:many tables
humanmailingaddress
andhumanparcel
so that we can classify a connection to be something like "owns" to describe why a human is linked to a parcel or "tenant of", etc. Entries in thehumanmailingaddress
table will have foreign keys to thehumanmailingrole
table to describe the nature of the connection between the human and the address.Design Notes
The humanization table schema separates parcels, humans, mailingaddresses, and phone numbers in separate tables with a rather complex key structure. Importantly, the
property
andperson
tables are now DEPRECATED and exist for archival and migration purposes. The notion of aProperty
only exists in the Java side, since in reality, the county only thinks in parcels. The notion of aPerson
also now only exists in Java, and is composed of one or more records from thehuman
table and their related 'mailingsaddresses' and 'contactphone' records.mailingaddress
notesAlso note that the mailingaddress table separates the street number from the street name. So for the address "2209 S. BRADDOCK AVE" we would write "2209" into the field
addressnum
and "S. BRADDOCK AVE" into thestreet
field. This will make scraping trickier, but hopefully regexps will help. This separation will allow for much more accurate retrievals and inserts, since one can look up the street first and then add a proper number to a known street.parcelmailingaddress
linking table notesWe connect a parcel with an address in the database by writing a record to the
parcelmailingaddress
table. Links have sources, so the FK to thebobsource
table calledsource_sourceid
tracks where that address came from. The county portal's field labels should become their own sources in thebobsource
table.Example scrape
Given the property of shop1, county parcel ID of "0178-H-00033-0000-00"
We see that the property address is "2209 S BRADDOCK AVE PITTSBURGH, PA 15218". We already have the record in the
parcel
table (since that's how we got the profile) so we now need to get this address written to the appropriate fields in themailingaddress
table before we link.Then we link between the record in the
parcel
table with a value in the fieldparcelidcnty
of "0178-H-00033-0000-00" and the mailing address by creating a record in theparcelmailingaddress
table.That
parcelmailingaddress
table should have FK to a record in thebobsource
table whosename
field has the value something like "Property Address on ACREP" (Allegheny County Real Estate Portal). The FK field in theparcelmailingaddress
table issource_sourceid
.To continue the example, we would find this info at the bottom of the profile
We want the database to encode that the record in the
human
table whosename
is "DARSOW ERIC" to be connected to the SAME record in themailingaddress
table as is the record in theparcel
table with aparcelidcnty
value of "0178-H-00033-0000-00"So the script would need determine that the value on the county profile page for "Owner Mailing" is the SAME as the value on the county profile page for "Property Address" and make TWO records in linking tables to the SAME record in the
mailingaddress
table.Finally, we check the tax tab and see this: We use script logic to see that the address contains a name that's already in our
human
table, and an address that's already in ourmailingaddress
table. We're not done! We want to show that this is, in fact, the tax mailing address for a parcel whoseparcelidcnty
is "0178-H-00033-0000-00", and we'll do so by making a record in theparcelmailingaddress
table whosesource_sourceid
is a record in thebobsource
table with a name like "Tax Bill Mailing Address from ACREP".STILL NOT DONE
We'd also like to know that "DARSOW ERIC" is associated with a tax mailing address of "2209 S BRADDOCK AVE PITTSBURGH PA 15218" so let's make a final record in the
humanmailingaddress
table that has a FK to a record in thehumanmailingrole
table of something like "Tax Mailing Address"The frontier
Successfully accomplishing this mission would, I believe, create the most comprehensive and accurately liked property-human-mailing database in the county.
Process ideas
Work by hand (i.e. code the SQL directly) for the range of cases your script is trying to cope with. Use that SQL to create your automated SQL assembly functions in Python. This will familiarize yourself with the new schema and probably reveal changes that need to be made.
Keep and eye out for improvements and bugs and submit them as Pull Request for SQL patch inclusions.