This PR is a follow-up to #40 that adds support for GeoJSON exports of the university summary. I also want to chat a bit about the approach used here to make sure we're all OK with it! The same approach is used for the tribe summaries.
Generating MultiPolygons for the Summaries
For both the university and tribe summaries, we want to generate GeoJSON files using MultiPolygons as the core geometry type. For the university summary, each MultiPolygon represents the collection of parcel Polygons belonging to a single university. Likewise, for the tribe summaries, each MultiPolygon represents the collection of parcel Polygons that share (1) the same present-day tribe for tribe-summary-condensed.geojson and (2) the same combination of values for 'gis_acres', 'present_day_tribe', 'rights_type', 'university', 'state', and 'cession_number' for tribe-summary.geojson.
To generate these MultiPolygons, we use dissolve (the spatial equivalent of groupby) on the WGS84 version of the dataset generated as the output of Step 3. We dissolve by the university column for the university summary, the present_day_tribe column for the condensed tribe summary, and the list of fields above for the full tribe summary.
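As a rough illustration of what dissolve does here, the sketch below runs it on a toy GeoDataFrame. The university names, coordinates, and variable names are made up for the example and are not taken from the pipeline; only the university column name comes from the description above.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Toy parcels (hypothetical data): two disjoint squares for one university,
# a single square for another.
parcels = gpd.GeoDataFrame(
    {"university": ["Univ A", "Univ A", "Univ B"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(2, 0), (3, 0), (3, 1), (2, 1)]),
        Polygon([(5, 5), (6, 5), (6, 6), (5, 6)]),
    ],
    crs="EPSG:4326",  # WGS84
)

# dissolve unions the geometries within each group, much like groupby + agg.
by_university = parcels.dissolve(by="university")

# Disjoint parcels in a group become one MultiPolygon; a group with a single
# parcel comes back as a plain Polygon.
print(by_university.geometry.geom_type)
```

One thing worth keeping in mind: a group that contains only one parcel (or parcels that merge into one contiguous shape) is returned as a Polygon rather than a MultiPolygon, so the exported GeoJSON may contain a mix of both geometry types.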
Preserving the In-Place Aggregations
Step 4 already had significant aggregations and reshaping in place by the time I started this PR. At first, I considered replicating these transformations on the GeoDataFrame alongside the original pandas DataFrame. This quickly got hairy, so I went with an alternate approach:
1. Keep the existing transformations in place throughout, operating on a freshly created pandas DataFrame. In essence, we're loading the same data from Step 3's output, but from the GeoJSON in lieu of the CSV. We then drop the geometry column from the resulting GeoDataFrame and hand it off to the existing aggregation functions.
2. Perform the dissolve on the GeoDataFrame, using the exact same set of columns we use for the groupby. This gives us access to the MultiPolygon geometries. From this transformation we keep only the geometry column and the fields necessary for the merge (see step 3); all other columns are dropped.
3. Merge the pandas DataFrame produced by the various transformations with the dissolved GeoDataFrame. This is essentially an attribute join in GIS land.
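The three steps above can be sketched roughly as follows. This is a minimal illustration on toy data, not the code in this PR: the square helper, the sample values, the total_acres aggregation, and the variable names are all assumptions standing in for the real Step 4 functions.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon


def square(x, y):
    """Hypothetical helper: a unit square with its lower-left corner at (x, y)."""
    return Polygon([(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)])


# Stand-in for the GeoJSON output of Step 3 (made-up values).
parcels = gpd.GeoDataFrame(
    {"university": ["Univ A", "Univ A", "Univ B"], "gis_acres": [10.0, 20.0, 5.0]},
    geometry=[square(0, 0), square(2, 0), square(5, 5)],
    crs="EPSG:4326",
)

group_cols = ["university"]

# Step 1: drop the geometry and run the pandas aggregations on a plain DataFrame.
# The sum below is a placeholder for the existing aggregation functions.
attributes = pd.DataFrame(parcels.drop(columns="geometry"))
summary = attributes.groupby(group_cols, as_index=False).agg(
    total_acres=("gis_acres", "sum")
)

# Step 2: dissolve on the same columns, keeping only the merge keys and geometry.
geometries = parcels[group_cols + ["geometry"]].dissolve(by=group_cols, as_index=False)

# Step 3: attribute join. With the GeoDataFrame on the left, the result stays
# a GeoDataFrame and can be exported as GeoJSON.
summary_geo = geometries.merge(summary, on=group_cols, how="left")
print(summary_geo)
```

Because the groupby and the dissolve use the same list of key columns, each summary row should line up with exactly one dissolved geometry in the merge.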
This approach requires minimal changes to the existing aggregations: we're just using dissolve to grab the MultiPolygon geometries and joining them to the existing summaries. Let me know how we feel about this approach! I think the upside is that it keeps the blast radius of this change small.
I'll call out the few places in the PR where I was receiving errors running on main and needed to introduce fixes.