Grist-Data-Desk / STLoR

Code and methodology to produce the dataset in Grist and High Country News' investigation into state trust lands on reservations
Creative Commons Zero v1.0 Universal

feat: Integrate code to clip parcels to reservation boundaries, filter columns by acreage and additional criteria in METHODOLOGY.md, and concatenate activity_info and activity_info_2 columns. #11

Closed parkerziegler closed 1 month ago

parkerziegler commented 1 month ago

This PR integrates all of the manual data-cleaning steps that METHODOLOGY.md describes after the activity match. These include:

  1. Clipping the STLs layer to the boundaries of the BIA-AIAN reservations layer (with Tribal Statistical Areas included) as well as the BIA-AIAN supplemental layer added in #4.
    • In addition, we compute the clipped_acres column as described in the methodology.
  2. Filtering parcels to those with clipped_acres >= 10.0 and additional criteria mentioned in the methodology.

    • Specifically, I translated the following to code:

      Second, we took out any instances of improper overlap. For example, several parcels in Wyoming overlapped with the Crow reservation in Montana, which aligns right up against the border of Wyoming. We took these parcels out, since the Crow reservation is located solely within Montana.

    ⚠️ Should we consult Maria to see if there are additional cases like the above?

  3. Joining the activity_info and activity_info_2 columns into a single activity_info column.
    • This step wasn't documented, but I verified it manually in Jupyter by comparing the dataframes loaded from 05_AcreageGreaterThan10.geojson and 06_All-STLs-on-Reservations-Final.geojson. I concatenated the activity_info and activity_info_2 columns of 05_AcreageGreaterThan10.geojson using the concatenate_activity_info function in this PR, subset both dataframes to just the activity_info column, trimmed the stray whitespace present in the activity_info values of 06_All-STLs-on-Reservations-Final.geojson, and compared the two dataframes with pandas' compare method. They were identical.
  4. Subsetting the final dataframe to the set of columns present in 06_All-STLs-on-Reservations-Final.geojson.
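For reviewers who want a feel for steps 2 and 3 without opening the diff, here is a minimal pandas sketch. It is not the code in this PR. The clipped_acres, activity_info, and activity_info_2 column names come from the description above; the state and reservation_name columns, the "WY"/"Crow" values, the newline separator, and the function names filter_parcels and concat_activity_columns are all assumptions made for illustration, and the sample rows are made up.

```python
import pandas as pd

MIN_ACRES = 10.0

# Hypothetical encoding of the "improper overlap" rule: (state, reservation)
# pairs that should be dropped. Column names and values are assumptions.
IMPROPER_OVERLAPS = {("WY", "Crow")}


def filter_parcels(df: pd.DataFrame) -> pd.DataFrame:
    """Keep parcels with clipped_acres >= 10 that are not improper overlaps."""
    large_enough = df["clipped_acres"] >= MIN_ACRES
    improper = pd.Series(
        [pair in IMPROPER_OVERLAPS
         for pair in zip(df["state"], df["reservation_name"])],
        index=df.index,
        dtype=bool,
    )
    return df[large_enough & ~improper].reset_index(drop=True)


def concat_activity_columns(df: pd.DataFrame, sep: str = "\n") -> pd.DataFrame:
    """Merge activity_info and activity_info_2 into one activity_info column.

    Missing or blank values are skipped. The separator is an assumption.
    """
    combined = [
        sep.join(str(v).strip() for v in pair if pd.notna(v) and str(v).strip())
        for pair in zip(df["activity_info"], df["activity_info_2"])
    ]
    out = df.drop(columns=["activity_info_2"]).copy()
    out["activity_info"] = combined
    return out


# Made-up sample rows, for illustration only.
parcels = pd.DataFrame({
    "state": ["MT", "WY", "MT"],
    "reservation_name": ["Crow", "Crow", "Crow"],
    "clipped_acres": [40.0, 25.0, 3.5],
    "activity_info": ["Grazing", "Oil and gas", None],
    "activity_info_2": ["Timber", None, None],
})

result = concat_activity_columns(filter_parcels(parcels))
```

In this toy input only the first row survives: the Wyoming row is dropped as an improper overlap and the third row falls below the 10-acre threshold.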

To avoid bandwidth charges, I did not commit any of the generated files here. You can regenerate them locally by running: python stlor/main.py
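The manual check described in step 3 could be reproduced along these lines. This is a sketch under assumptions, not the notebook that was actually run: the helper name activity_info_matches is hypothetical, and the two small dataframes stand in for the frames that would be loaded from 05_AcreageGreaterThan10.geojson (after concatenation) and 06_All-STLs-on-Reservations-Final.geojson.

```python
import pandas as pd


def activity_info_matches(candidate: pd.DataFrame, reference: pd.DataFrame) -> bool:
    """Return True if the activity_info columns agree row for row.

    Only the reference values are trimmed, mirroring the stray whitespace
    reported in the final GeoJSON file.
    """
    left = candidate[["activity_info"]].reset_index(drop=True)
    right = reference[["activity_info"]].reset_index(drop=True)
    right["activity_info"] = right["activity_info"].str.strip()

    # DataFrame.compare raises on differently shaped frames, so check first.
    if left.shape != right.shape:
        return False

    # compare() returns only the differing cells; empty means identical.
    return left.compare(right).empty


# Stand-in data, for illustration only.
candidate = pd.DataFrame({"activity_info": ["Grazing\nTimber", "Oil and gas"]})
reference = pd.DataFrame({"activity_info": ["Grazing\nTimber ", " Oil and gas"]})
mismatch = pd.DataFrame({"activity_info": ["Grazing", "Oil and gas"]})

same = activity_info_matches(candidate, reference)
different = activity_info_matches(candidate, mismatch)
```

The comparison is positional, so it assumes both files list parcels in the same order.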