TechnologyRediscovery / codenforce

municipal code enforcement database application
GNU General Public License v3.0
2 stars 3 forks source link

Migrate data in the property and person tables into parcel, human, mailingaddress tables #198

Closed edarsow closed 2 years ago

edarsow commented 3 years ago

Based on the principles laid out in the Humanization scraping issue, write SQL/PLSQL/Python to read data from the property and person tables into the appropriate fields inside human, parcel, and mailingaddress tables, and most importantly, their linking tables.

Design notes:

  1. Mailing address separates out building number from street name. The current fields on property are NOT delimited like this. You'll probably want to run some test splices of these to check for appropriate slicing. Be very cautious of the 3 types of spaces that could be found between a building number and street. Using the regexp \W (non word characters) character class has worked in the past, but watch it like a hawk for odd addresses. Sample regexp to start the spicing (note the use of named match groups and note that the \\ before character groups is a Java thing, and python doesn't need to escape the \ if you use raw strings, e.g. r"i am raw python string")

    Pattern pat = Pattern.compile("(?<num>\\d+[a-zA-Z]*)\\W+(?<street>\\w.*)");
    Pattern patStreet = Pattern.compile("\\s([a-zA-Z0-9][a-zA-Z_\\s.]*)");
  2. Looking at the parcel table: Note that the field parcelkey is our internal DB primary key that can be any unique value, including the value used as the propertyid in the legacy table. The field parcelidcnty should hold the official parcel ID in the county database. Now lotandblock is officially derived from the parcel id from the county. Eric isn't sure that this should really be its own db field--since these should not disagree ever--but sometimes do in the legacy property table. Option 1 is to try to write a script that will generate the lot and block from the parcel ID, and ignore whatever is in the legacy property table's lotandblock field. Option 2 is to check the two legacy fields and only rebuild the lot and block if the legacy disagrees. Option 3 is to recommend that the new parcel table not even have the lotandblock field and instead derive it each time we use it on the java side

  3. Many tables will have a filed called source_sourceid this is keyed to the table bobsource and bobsource can have any arbitrary records to convey how a record entered the database. You might want to create your own schema in bobsource in case you have different data pathways in, such as a scrape versus Zac's API for example.

  4. Many tables have a createdby_userid field. Sylvia, the cogbot, has an ID in test and production databases with a userID of 99. You're welcome to make your own robot with an ID less than 99. Just include an insert into login so your script works on the production system.

edarsow commented 3 years ago

WWALK and ECD conferenced on this via zoom on 22-JUN'21

edarsow commented 3 years ago

ECD undertaking task; ETC middle of this week.

edarsow commented 2 years ago

done!