GreenInfo-Network / caliparks.org

Mapping social media in parks and other open spaces
http://www.caliparks.org/
22 stars 3 forks source link

CPAD update #656

Closed danrademacher closed 6 years ago

danrademacher commented 7 years ago

We need to do it. Will work on figuring out how. Reaching out to @mojodna on Slack.

danrademacher commented 7 years ago

Major issue is migrating Super Unit IDs

danrademacher commented 7 years ago

OK, Seth's generally on board to help with this some time in next 6 weeks. Will work out timing after I'm back in town (off most of next week)

danrademacher commented 7 years ago

Notes from @mojodna based on work last night (pulled from Slack DMs — forgot we are all on this repo!)


Superunits Added should be split into: Superunits Added and Superunits Renumbered

why do superunits get renumbered when their name, manager, or access changed (but not split or merged)? that seems like it’s effectively the same as having the geometry modified.

Superunits added - update data’s foreign keys using a spatial join

Superunits Renumbered - explicit update per renumbered superunit

Superunits changed - update data’s foreign keys using a spatial join; optionally delete where su_id IS NULL

Superunits Merged - explicit update per merged superunit

Superunits Deleted - explicit update (to NULL) or delete per removed superunit

(i’m not convinced that the UPDATEs using spatial joins are correct; i don’t have enough free space on my laptop to load the caliparks DB to test today)

(see the filters for split criteria, though it’s pretty obvious)

template SQL is in column A in each sheet

i’d call this tractable, provided that spatial joins are possible (geometries available in joined data)

i think i’m realizing that the explicit updates aren’t actually necessary if/when everything is updated using a spatial join (since that’ll pick up changes)

it does mean that entire datasets need to be re-joined, which could be an expensive process for large datasets

so the Superunits Added SQL could be constrained to joining just against the geometries of the new superunits using multiple queries (1 per new geom or a WHERE IN clause)

and Superunits Changed constrained similarly (or if it doesn’t make sense to join against the whole dataset again, even against a smaller number of superunit geometries, set to delete if the geometry has contracted (and never to expand))

i.e. use foreign key (su_id) to constrain the list of features to double-check whether they’re still contained by the updated geometries

to properly support that, it would help if Superunits Changed broke “Geometry” into “expansion” (new geometry fully contains old geometry), “contraction” (new geometry is fully contained by old geometry), and “changed” (new geometry intersects old geometry)

guidelines / considerations for CPAD consumers:

1) how large is the dataset linked to CPAD? (is re-joining a possibility?)

2) is data not contained by geometries in previous versions of CPAD kept around (e.g., CaliParks: no)? (is it worth joining against added / expanded superunits?)

3) does linked dataset contain geometries originally used to join against CPAD? (are spatial joins possible, or are renumbering + merging the only feasible operations?)

danrademacher commented 7 years ago

@mojodna interesting to consider the possibility of spatial joins as obviating the need for IDs in many cases. Back to the future!

huh, this is interesting:

to properly support that, it would help if Superunits Changed broke “Geometry” into “expansion” (new geometry fully contains old geometry), “contraction” (new geometry is fully contained by old geometry), and “changed” (new geometry intersects old geometry)

Not sure how feasible that is within current overall structure.

gregallensworth commented 7 years ago

I have updated the CPAD changelog generator, to accommodate this. The Superunits Changed and Units Changed worksheets, will now report acres and geometry field changes separately (it was intended to report park, fieldname, changedetails for each change noted).

The geometry change-finder detects old-contains-new and new-contains-old conditions and reports them as such, and will report "changed" if the geometry change fits neither straightforward case.

gregallensworth commented 6 years ago

I merged in PR 661 a few days ago. Thank you very much, mojodna.

I came up with a clear and now documented, backup and restore procedure for the database, as well as testing it extensively on my development VM here.

I created a new DB server and loaded the "cpad 2017 version" of the database to it. After confirming that the upgraded database was indeed powering the application just fine, I changed the DATABASE_URL setting on Caliparks and the site is now using CPAD 2017.

mojodna commented 6 years ago

🎉