cenuno / pointdexter

Label longitudinal and latitudinal coordinates located inside a polygon.
https://cenuno.github.io/pointdexter/

`LabelPointsWithinPolygons()` fails when no points exist within a polygon #2

Closed cenuno closed 5 years ago

cenuno commented 5 years ago

Overview

This issue comes courtesy of my colleague Noah, who got an error message when using pointdexter with 2010 census tract boundaries for the City of Chicago.

Issue

LabelPointsWithinPolygons() fails when no points exist within a polygon. With smaller geographies (e.g. census tracts) rather than larger ones (e.g. community areas), some polygons contain no points at all, and pointdexter's current method of labeling does not account for a polygon that has no points lying inside of it.

Here is the error message:

Error in data.frame(index = splancs::inpip(pts = df, poly = i, bound = NULL), : arguments imply differing number of rows: 0, 1
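The same error can be reproduced in base R without pointdexter or splancs. This is a minimal sketch, assuming the failure comes from pairing a zero-length index vector with a length-one value inside data.frame(); the column name `polygon` and its value are hypothetical stand-ins, since the error message above truncates the second argument.

```r
# Stand-in for splancs::inpip() finding no points inside a polygon
index <- integer(0)

# Pairing a zero-length vector with a length-one value makes data.frame() fail.
# The 'polygon' column is a hypothetical placeholder, not pointdexter's own code.
msg <- tryCatch(
  {
    data.frame(index = index, polygon = "tract_with_no_points")
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

data.frame() can recycle a length-one value to match a longer column, but it cannot recycle it down to zero rows, which is why only the empty polygons trigger the error.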

Reproducible Example

#
# Author:   Cristian E. Nuno
# Purpose:  Error with LabelPointsWithinPolygons()
# Date:     March 14, 2019
#

# load necessary packages ----
library(pointdexter)
library(sf)

# load necessary data ----

# import built-in chicago school location data
data("cps_sy1819")

# import 2010 chicago census tracts
# note: this comes from the City of Chicago Data Portal
#       https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Census-Tracts-2010/5jrd-6zik
census_tracts_sf <-
  read_sf("https://data.cityofchicago.org/api/geospatial/5jrd-6zik?method=export&format=GeoJSON")

# get boundaries for each census tract ----
census_tract_boundaries <-
  GetPolygonBoundaries(census_tracts_sf, census_tracts_sf$tractce10)

# label each school with the census tract it lies in ----
cps_sy1819$census_tract <-
  LabelPointsWithinPolygons(lng = cps_sy1819$school_longitude
                            , lat = cps_sy1819$school_latitude
                            , polygon.boundaries = census_tract_boundaries)

# Error in data.frame(index = splancs::inpip(pts = df, poly = i, bound = NULL),  : 
# arguments imply differing number of rows: 0, 1

# end of script #

Session Info

sessioninfo::session_info()

─ Session info ──────────────────────────────────────────────────
 setting  value                       
 version  R version 3.5.2 (2018-12-20)
 os       macOS High Sierra 10.13.6   
 system   x86_64, darwin15.6.0        
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/Los_Angeles         
 date     2019-03-14                  

─ Packages ──────────────────────────────────────────────────────
 package     * version date       lib source        
 assertthat    0.2.0   2017-04-11 [1] CRAN (R 3.5.0)
 class         7.3-14  2015-08-30 [1] CRAN (R 3.5.2)
 classInt      0.3-1   2018-12-18 [1] CRAN (R 3.5.0)
 cli           1.0.1   2018-09-25 [1] CRAN (R 3.5.0)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.5.1)
 DBI           1.0.0   2018-05-02 [1] CRAN (R 3.5.0)
 e1071         1.7-0   2018-07-28 [1] CRAN (R 3.5.0)
 lattice       0.20-38 2018-11-04 [1] CRAN (R 3.5.2)
 magrittr      1.5     2014-11-22 [1] CRAN (R 3.5.0)
 pillar        1.3.1   2018-12-15 [1] CRAN (R 3.5.0)
 pkgconfig     2.0.2   2018-08-16 [1] CRAN (R 3.5.0)
 pointdexter * 0.1.0   2019-01-30 [1] CRAN (R 3.5.2) 
 Rcpp          1.0.0   2018-11-07 [1] CRAN (R 3.5.0)
 rlang         0.3.1   2019-01-08 [1] CRAN (R 3.5.2)
 rstudioapi    0.9.0   2019-01-09 [1] CRAN (R 3.5.2)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.5.0)
 sf          * 0.7-2   2018-12-20 [1] CRAN (R 3.5.0)
 sp            1.3-1   2018-06-05 [1] CRAN (R 3.5.0)
 splancs       2.01-40 2017-04-16 [1] CRAN (R 3.5.0)
 tibble        2.0.1   2019-01-12 [1] CRAN (R 3.5.2)
 units         0.6-2   2018-12-05 [1] CRAN (R 3.5.0)
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.5.0)
 yaml          2.2.0   2018-07-25 [1] CRAN (R 3.5.0)

[1] /Library/Frameworks/R.framework/Versions/3.5/Resources/library

cenuno commented 5 years ago

Overview

I revised LabelPointsWithinPolygons() so that it stores the result of splancs::inpip() for each polygon as a named integer vector in a list rather than in a data frame. To fix this issue, please install the latest version of pointdexter (0.1.1) from CRAN and re-run your script:

install.packages("pointdexter")
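To confirm that the installed copy includes the fix, the version can be compared against 0.1.1. This is a small sketch using base R's utils::packageVersion(); the variable name `has_fix` is only illustrative, and the requireNamespace() guard is there so the snippet does not error when pointdexter is not installed.

```r
# Check whether the installed pointdexter is at least version 0.1.1.
# 'has_fix' is an illustrative name; it is NA when the package is not installed.
has_fix <- if (requireNamespace("pointdexter", quietly = TRUE)) {
  utils::packageVersion("pointdexter") >= "0.1.1"
} else {
  NA
}

print(has_fix)
```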

Example of stack() dropping integer(0) records from the data frame

Storing the output of splancs::inpip() in a list instead of a data frame means that edge cases where a polygon has no points lying inside of it (where the result is integer(0)) can be tracked and subsequently dropped.

Here's a brief example:

stack(list("valid" = 1:3, "invalid" = integer(0)))

# note: the integer(0) result from 'invalid' does not appear in the data frame
#   values   ind
# 1      1 valid
# 2      2 valid
# 3      3 valid

By stacking the list of named integer vectors into a data frame, these edge cases are dropped from the data frame prior to LabelPointsWithinPolygons() finishing its execution.
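The list-then-stack idea can be sketched end to end in base R. This is an illustration under stated assumptions, not pointdexter's actual implementation: the list `hits` is a hand-written stand-in for the per-polygon output of splancs::inpip(), and the names `tract_a`, `tract_b`, `tract_c`, `n_points`, and `labels` are hypothetical.

```r
# Hand-written stand-in for per-polygon splancs::inpip() results:
# each element holds the row indices of the points inside that polygon.
hits <- list(
  tract_a = c(1L, 4L),
  tract_b = integer(0),  # no points fall inside this polygon
  tract_c = 2L
)

# stack() contributes no rows for the zero-length element
stacked <- stack(hits)

# Map each point index back to its polygon name;
# points that match no polygon keep NA.
n_points <- 4L
labels <- rep(NA_character_, n_points)
labels[stacked$values] <- as.character(stacked$ind)

print(labels)
# "tract_a" "tract_c" NA "tract_a"
```

Point 3 is inside none of the polygons, so it stays NA, while the empty `tract_b` simply never appears among the labels.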

End Result

After updating pointdexter from CRAN to version 0.1.1, the code ran without an error message. Here's a look into the first few records:

head(cps_sy1819[, c("school_id", "school_longitude", "school_latitude", "census_tract")])
school_id school_longitude school_latitude census_tract
   609760        -87.59062        41.65629       540102
   609780        -87.72174        41.91604       222900
   610304        -87.68696        41.87912       280800
   610513        -87.63276        41.82814       340600
   610390        -87.66579        41.98902       030500
   609754        -87.61922        41.83055       839600