Grist-Data-Desk / land-grab-2

Code and methodology to produce the dataset in Grist's Misplaced Trust investigation
https://grist.org/project/indigenous/land-grant-universities-indigenous-lands-fossil-fuels
Creative Commons Zero v1.0 Universal
14 stars 2 forks source link

Fix #35. Ensure parcel_count attribute is persisted to CSV output. #36

Closed parkerziegler closed 10 months ago

parkerziegler commented 10 months ago

This PR fixes the small issue outlined in the first comment of #35. Importantly, it does not remove the SKIP_DEDUP flag for MN (though that should be trivial once this is merged).

The Problem

This line was attempting to write an attribute "PARCEL_COUNT" (note the all caps) to each GeoDataFrame. Tracing program execution from this line, the dataframe (after some transformation) is handed off to _merge_dataframes, which in turn calls combine_dfs for cases where we need to merge more than one dataframe.

Inside of combine_dfs, we do some set computation to build up the set of attributes shared by all dataframes while also ensuring columns present in the constant list FINAL_DATAFRAME_COLUMNS are also present. Later, on this line, we subsequently drop all columns not in FINAL_DATAFRAME_COLUMNS. The attribute "PARCEL_COUNT" is not in that list; instead, the string literal "parcel_count" (note the lowercase) is. This literal is represented by the constant PARCEL_COUNT.

The Solution

Pretty small—just change the string literal "PARCEL_COUNT" to the constant binding PARCEL_COUNT, which represents the string literal "parcel_count".

Along the way, I did notice an opportunity to remove a tiny bit of dead code from overlap.py. We already return early in combine_dfs when we have a df_list of length 1, so there's no need for the conditional check on df_list length afterwards.

clayton-aldern commented 10 months ago

Bless you @parkerziegler; I was ready to jump off a bridge.

parkerziegler commented 10 months ago

It took a good while to see it, definitely had me questioning reality for a bit 😂