Code and methodology to produce the dataset in Grist and High Country News' investigation into state trust lands on reservations
Creative Commons Zero v1.0 Universal
feat: Integrate code to clip parcels to reservation boundaries, filter columns by acreage and additional criteria in METHODOLOGY.md, and concatenate activity_info and activity_info_2 columns. #11
This PR integrates all of the manual data cleaning steps that METHODOLOGY.md describes after the activity match. These include:
Clipping the STLs layer to the boundaries of the BIA-AIAN reservations layer (with Tribal Statistical Areas included) as well as the BIA-AIAN supplemental layer added in #4.
In addition, we compute the clipped_acres column as described in the methodology.
Filtering parcels to those with clipped_acres >= 10.0 and additional criteria mentioned in the methodology.
Specifically, I translated the following to code:
Second, we took out any instances of improper overlap. For example, several parcels in Wyoming overlapped with the Crow reservation in Montana, which aligns right up against the border of Wyoming. We took these parcels out, since the Crow reservation is located solely within Montana.
⚠️ Should we consult Maria to see if there are additional cases like the above?
Joining the activity_info and activity_info_2 columns into a single activity_info column.
This step wasn't documented, but I verified it manually in Jupyter by comparing the dataframes from 05_AcreageGreaterThan10.geojson and 06_All-STLs-on-Reservations-Final.geojson. Specifically, I concatenated the activity_info and activity_info_2 columns in 05_AcreageGreaterThan10.geojson using the concatenate_activity_info function in this PR. Then I subsetted both dataframes to just the activity_info column, trimmed the activity_info values to remove the erroneous whitespace present in 06_All-STLs-on-Reservations-Final.geojson, and used pandas compare to compare the two dataframes. Fortunately, they were identical!
I subsetted the final dataframe to the set of columns present in 06_All-STLs-on-Reservations-Final.geojson.
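For reviewers who want a feel for the filtering step, here is a minimal sketch of the acreage threshold plus the cross-state overlap rule quoted above. It is not the code in this PR: the `state` and `reservation_name` column names, the `IMPROPER_OVERLAPS` lookup, and the `filter_parcels` helper are all hypothetical; only `clipped_acres` and the 10.0 threshold come from the description.

```python
import pandas as pd

MIN_ACRES = 10.0  # threshold stated in the PR description

# Hypothetical lookup of (parcel state, reservation) pairs to drop.
# The single entry mirrors the Wyoming/Crow example from the methodology.
IMPROPER_OVERLAPS = {("WY", "Crow")}


def filter_parcels(df: pd.DataFrame) -> pd.DataFrame:
    """Keep parcels that meet the acreage threshold and are not improper overlaps.

    Assumes hypothetical columns `state` and `reservation_name`.
    """
    big_enough = df["clipped_acres"] >= MIN_ACRES
    pairs = zip(df["state"], df["reservation_name"])
    improper = pd.Series(
        [pair in IMPROPER_OVERLAPS for pair in pairs],
        index=df.index,
        dtype=bool,
    )
    return df[big_enough & ~improper].reset_index(drop=True)


# Toy data for illustration only; these are not real parcels.
parcels = pd.DataFrame(
    {
        "state": ["MT", "WY", "MT"],
        "reservation_name": ["Crow", "Crow", "Crow"],
        "clipped_acres": [120.5, 40.0, 3.2],
    }
)
filtered = filter_parcels(parcels)
```

Keeping the improper-overlap pairs in one explicit lookup would make it easy to add further cases if Maria identifies any.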
To avoid bandwidth charges, I did not commit any of the generated files here. However, you can generate them locally by running: python stlor/main.py
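The manual check of the activity_info concatenation described above can be approximated with a short pandas sketch. The `concat_activity_columns` helper is a hypothetical stand-in for the PR's concatenate_activity_info function, whose actual separator and null handling are not shown here; the newline separator and the toy values are assumptions.

```python
import pandas as pd


def concat_activity_columns(df: pd.DataFrame, sep: str = "\n") -> pd.DataFrame:
    """Merge activity_info and activity_info_2 into a single activity_info column.

    Hypothetical stand-in: the separator and the skipping of nulls are assumptions.
    """

    def join_row(row: pd.Series) -> str:
        # Keep only non-null, non-blank values, trimmed of stray whitespace.
        parts = [str(v).strip() for v in row if pd.notna(v) and str(v).strip()]
        return sep.join(parts)

    out = df.copy()
    out["activity_info"] = out[["activity_info", "activity_info_2"]].apply(
        join_row, axis=1
    )
    return out.drop(columns=["activity_info_2"])


# Toy stand-ins for the two GeoJSON-derived dataframes (illustrative values only).
before = pd.DataFrame(
    {
        "activity_info": ["Grazing", "Timber", None],
        "activity_info_2": ["Oil and gas", None, None],
    }
)
final = pd.DataFrame({"activity_info": ["Grazing\nOil and gas ", " Timber", ""]})

# Subset both frames to activity_info and trim whitespace in the reference.
candidate = concat_activity_columns(before)[["activity_info"]]
reference = final[["activity_info"]].apply(lambda col: col.str.strip())

# DataFrame.compare returns only the differing cells; empty means identical.
diff = candidate.compare(reference)
```

An empty `diff` corresponds to the "identical" outcome reported above. Note that `DataFrame.compare` requires both frames to share the same index and column labels.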