Based on the principles laid out in the Humanization scraping issue, write SQL/PLSQL/Python to read data from the property and person tables into the appropriate fields inside human, parcel, and mailingaddress tables, and most importantly, their linking tables.
Design notes:
Mailing address separates out building number from street name. The current fields on property are NOT delimited like this. You'll probably want to run some test splices of these to check for appropriate slicing. Be very cautious of the 3 types of spaces that could be found between a building number and street. Using the regexp \W (non word characters) character class has worked in the past, but watch it like a hawk for odd addresses.
Sample regexp to start the spicing (note the use of named match groups and note that the \\ before character groups is a Java thing, and python doesn't need to escape the \ if you use raw strings, e.g. r"i am raw python string")
Pattern pat = Pattern.compile("(?<num>\\d+[a-zA-Z]*)\\W+(?<street>\\w.*)");
Pattern patStreet = Pattern.compile("\\s([a-zA-Z0-9][a-zA-Z_\\s.]*)");
Looking at the parcel table: Note that the field parcelkey is our internal DB primary key that can be any unique value, including the value used as the propertyid in the legacy table. The field parcelidcnty should hold the official parcel ID in the county database. Now lotandblock is officially derived from the parcel id from the county. Eric isn't sure that this should really be its own db field--since these should not disagree ever--but sometimes do in the legacy property table.
Option 1 is to try to write a script that will generate the lot and block from the parcel ID, and ignore whatever is in the legacy property table's lotandblock field.
Option 2 is to check the two legacy fields and only rebuild the lot and block if the legacy disagrees.
Option 3 is to recommend that the new parcel table not even have the lotandblock field and instead derive it each time we use it on the java side
Many tables will have a filed called source_sourceid this is keyed to the table bobsource and bobsource can have any arbitrary records to convey how a record entered the database. You might want to create your own schema in bobsource in case you have different data pathways in, such as a scrape versus Zac's API for example.
Many tables have a createdby_userid field. Sylvia, the cogbot, has an ID in test and production databases with a userID of 99. You're welcome to make your own robot with an ID less than 99. Just include an insert into login so your script works on the production system.
Based on the principles laid out in the Humanization scraping issue, write SQL/PLSQL/Python to read data from the property and person tables into the appropriate fields inside human, parcel, and mailingaddress tables, and most importantly, their linking tables.
Design notes:
Mailing address separates out building number from street name. The current fields on
property
are NOT delimited like this. You'll probably want to run some test splices of these to check for appropriate slicing. Be very cautious of the 3 types of spaces that could be found between a building number and street. Using the regexp\W
(non word characters) character class has worked in the past, but watch it like a hawk for odd addresses. Sample regexp to start the spicing (note the use of named match groups and note that the\\
before character groups is a Java thing, and python doesn't need to escape the \ if you use raw strings, e.g.r"i am raw python string"
)Looking at the
parcel
table: Note that the fieldparcelkey
is our internal DB primary key that can be any unique value, including the value used as the propertyid in the legacy table. The fieldparcelidcnty
should hold the official parcel ID in the county database. Nowlotandblock
is officially derived from the parcel id from the county. Eric isn't sure that this should really be its own db field--since these should not disagree ever--but sometimes do in the legacyproperty
table. Option 1 is to try to write a script that will generate the lot and block from the parcel ID, and ignore whatever is in the legacyproperty
table's lotandblock field. Option 2 is to check the two legacy fields and only rebuild the lot and block if the legacy disagrees. Option 3 is to recommend that the newparcel
table not even have thelotandblock
field and instead derive it each time we use it on the java sideMany tables will have a filed called
source_sourceid
this is keyed to the tablebobsource
andbobsource
can have any arbitrary records to convey how a record entered the database. You might want to create your own schema inbobsource
in case you have different data pathways in, such as a scrape versus Zac's API for example.Many tables have a
createdby_userid
field. Sylvia, the cogbot, has an ID in test and production databases with a userID of 99. You're welcome to make your own robot with an ID less than 99. Just include an insert into login so your script works on the production system.