geocompx / geocompr

Geocomputation with R: an open source book
https://r.geocompx.org/
Other
1.58k stars 585 forks source link

Which points inside chosen small polygon #448

Closed iDemirsoy closed 5 years ago

iDemirsoy commented 5 years ago

Hi Dr Lovelace,

I have a SpatialPolygonsDataFrame, SpatialPointsDataFrame and SpatialLinesDataFrame for a county in the USA. I uploaded 4 plots, one for each of them and one for plot() lines() points() as we can see CRS these are from same area. I have 2 questions.

1- Is there a way to combine them ? maybe not 3 of them. I would be okay to combine combine pointsdataframe with any of them to work on.

2- as we can see from allT plot, we can plot them, is there a way to count and see which points inside any randomly chosen small polygon?

Thank you

allT Line points poly

class : SpatialPolygonsDataFrame features : 68 extent : -84.71536, -83.97717, 30.27344, 30.68533 (xmin, xmax, ymin, ymax) crs : +proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0 variables : 10 names : OBJECTID, COUNTYFP, TRACTCE, GEOID, NAME, ALAND, INTPTLAT, INTPTLON, Shape_Leng, Shape_Area min values : 1, NA, 000200, 12073000200, 10.01, 989846, +30.3256584, -084.0783700, 0.0396819168344, 9.29654822789e-05 max values : 9, NA, 002702, 12073002702, 9.05, 367819241, +30.6436073, -084.5489241, 1.08844513024, 0.0354811077168

class : SpatialPointsDataFrame features : 24020 extent : -84.7061, -83.98723, 30.27224, 30.67932 (xmin, xmax, ymin, ymax) crs : +proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0 variables : 130 names : KEYFIELD1, CALYEAR, CRASHNUM, CASENUMBER, AGENCY, DISTRICT, DOTCOUNTY, CRASHDATE, CRASHTIME, DAYOFWEEK, DHSCNTYCTY, TOWNNAME, INTOWNFLAG, ONROADNAME, INROADNAME, ... min values : 2008030464420, 2008, 17741700, 0000007632, F, 3, 55, 2008/01/01, 0, 1, 300, 1330, N, 100 BLK COLLEGE AVE, |||||||||||||||||||||||||, ... max values : 2012917695700, 2012, 917695700, TRCC12OFF001401, S, 3, 55, 2012/12/31, 8888, 7, 6700, WOODVILLE, Y, ZONKER CT, ZONKERS CT, ...

class : SpatialLinesDataFrame features : 7162 extent : -84.70612, -83.98041, 30.27399, 30.68428 (xmin, xmax, ymin, ymax) crs : +proj=longlat +datum=WGS84 +nodefs +ellps=WGS84 +towgs84=0,0,0 variables : 52 names : OBJECTID, FNODE, TNODE, LPOLY, RPOLY, LENGTH, LCSTSEG, LCSTSEG_ID, SOURCE, R_F_ADD, R_T_ADD, L_F_ADD, L_T_ADD, PREFIX, NAME, ... min values : 100, 0, 0, 0, 0, 4.42966247, 0, 1, 2012MERRICK, 0, 0, 0, 0, E, 4TH, ... max values : 9999, 0, 9992, 0, 0, 30399.81457977, 9983, 9978, TSP110037, 9981, 9999, 9980, 9999, WB, ZILLAH, ...

Robinlovelace commented 5 years ago

Hi @iDemirsoy good question. The book uses the newer sf package but you can go from your data to sf objects with the function st::st_as_sf(). As a starting point I've created a reproducible example (reprex) that I think represents your data:

# setup -------------------------------------------------------------------

install.packages("remotes")
#> Installing package into '/home/robin/R/x86_64-pc-linux-gnu-library/3.6'
#> (as 'lib' is unspecified)
remotes::install_cran("pct") # for zones data
#> Skipping install of 'pct' from a cran remote, the SHA1 (0.2.7) has not changed since last install.
#>   Use `force = TRUE` to force installation
remotes::install_cran("stats19") # for crash data
#> Skipping install of 'stats19' from a cran remote, the SHA1 (1.0.0) has not changed since last install.
#>   Use `force = TRUE` to force installation
remotes::install_github("itsleeds/geofabric") # for roads data
#> Skipping install of 'geofabric' from a github remote, the SHA1 (7acec3a1) has not changed since last install.
#>   Use `force = TRUE` to force installation

library(geofabric)
library(stats19)
#> Data provided under OGL v3.0. Cite the source and link to:
#> www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

# get test data -----------------------------------------------------------

osm_data = get_geofabric("isle of wight")
#> No exact matching geofabric zone. Best match is Isle of Wight (7.2 MB)
#> Downloading http://download.geofabrik.de/europe/great-britain/england/isle-of-wight-latest.osm.pbf to 
#> /tmp/RtmpX5DqZl/isle of wight.osm.pbf
#> Old attributes: attributes=name,highway,waterway,aerialway,barrier,man_made
#> New attributes: attributes=name,highway,waterway,aerialway,barrier,man_made,maxspeed,oneway,building,surface,landuse,natural,start_date,wall,service,lanes,layer,tracktype,bridge,foot,bicycle,lit,railway,footway
#> Using ini file that can can be edited with file.edit(/tmp/RtmpX5DqZl/ini_new.ini)
roads_sf = osm_data[!is.na(osm_data$highway) & osm_data$highway == "primary", ]
roads = as(roads_sf, "Spatial")

zones_sf = pct::get_pct_zones("isle-of-wight")
zones = as(zones_sf, "Spatial")

crashes_df = get_stats19(year = 2017, type = "ac")
#> Files identified: dftRoadSafetyData_Accidents_2017.zip
#>    http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/dftRoadSafetyData_Accidents_2017.zip
#> Attempt downloading from:
#> Data saved at /tmp/RtmpX5DqZl/dftRoadSafetyData_Accidents_2017/Acc.csv
#> Reading in:
#> /tmp/RtmpX5DqZl/dftRoadSafetyData_Accidents_2017/Acc.csv
crashes_sf = format_sf(crashes_df, lonlat = TRUE)
#> 29 rows removed with no coordinates
crashes_in_zone = crashes_sf[zones_sf, ]
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
crashes = as(crashes_in_zone, "Spatial")

cols = colorspace::sequential_hcl(n = 4)
zones$cols = cut(zones$all, quantile(zones$all), labels = cols)
sp::plot(zones, col = cols)
sp::plot(roads, add = TRUE, lwd = 3)
sp::plot(crashes, col = "red", add = TRUE)

Created on 2019-09-30 by the reprex package (v0.3.0)

Robinlovelace commented 5 years ago

roads-test-data.Rda.zip

Challenge: starting with the code below convert the objects into sf objects, take a read of Chapters 2, 3 and 4, and see if you can work out the answer, this section should be especially useful: https://geocompr.robinlovelace.net/spatial-operations.html#spatial-aggr

This code should get you started:

download.file("https://github.com/Robinlovelace/geocompr/files/3669086/roads-test-data.Rda.zip", "test-data.zip")
    unzip("test-data.zip")
    load("roads-test-data.Rda")

    # convert to sf
    library(sf)
#> Linking to GEOS 3.7.1, GDAL 2.4.2, PROJ 5.2.0
    roads = st_as_sf(roads)
#> Loading required package: sp
    zones = st_as_sf(zones)
    crashes = st_as_sf(crashes)

    plot(zones$geometry)
    plot(roads, add = TRUE, lwd = 3)
#> Warning in plot.sf(roads, add = TRUE, lwd = 3): ignoring all but the first
#> attribute
    plot(crashes, add = TRUE, col = "red")
#> Warning in plot.sf(crashes, add = TRUE, col = "red"): ignoring all but the
#> first attribute

Created on 2019-09-30 by the reprex package (v0.3.0)

Heads-up @agila5 who may also have some tips.

agila5 commented 5 years ago

Hi! I'm not really sure I understand your questions, @iDemirsoy can you share some data and code?

Thanks.

iDemirsoy commented 5 years ago

Hi Dr @Robinlovelace,

I actually figured after I looked at your book. The page is fascinating. I figured which points in which tract. tract7 = leonTract1 %>% filter(tract == "7") L7 = linesL[tract7, ] pt7= crshs[tract7,]

my new problem, when I tried to find which accident happened in which road, It doesn't work. I looked at points of accidents on ArcGIS and Points are so close to roads(lines), they look over them but lines lon and lat doesn't subset with points. So I cannot find how many accidents that specific road. How would you approach to solve this? I am thinking of moving points to closes road but I don't know how I can do that.

@agila5 I cannot share the data more than what I already shared but I wouldn't mind trying to make it clear if you can reach me out.

Thank you all for your time

head(L7,1) Simple feature collection with 1 feature and 40 fields geometry type: LINESTRING dimension: XY bbox: xmin: -84.28314 ymin: 30.45364 xmax: -84.28308 ymax: 30.45364 epsg (SRID): 4269 proj4string: +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +nodefs OBJECTID FNODE TNODE LPOLY RPOLY LENGTH LCSTSEG LCSTSEG_ID SOURCE R_F_ADD R_T_ADD L_F_ADD L_T_ADD PREFIX NAME 583 584 0 0 0 0 18.78769 0 4376 ADR-1996 190 198 191 199 W 4TH TYPE SUFFIX MILES FULLNAME CLASS FUNC_CLASS TRAVEL_FLA OWNER OLD_NAME EDIT_DATE QC_CODE EDIT_BY 583 AVE 0.00355824 W 4Th Ave 70 2 2 CITY 2014-02-24T00:00:00.000Z COXJ BUILT_STAT ONEWAY GEONAME_L GEONAME_R COUNTY SPEED FT_COST TF_COST F_ELEV T_ELEV STATUS Shape_Leng THEOR_RANG 583 BUILT TALLAHASSEE TALLAHASSEE LEON 25 0.00853978 0.00853978 0 0 O 6.637509 Y geometry 583 LINESTRING (-84.28308 30.45...

head(pt7,1) Simple feature collection with 1 feature and 115 fields geometry type: POINT dimension: XY bbox: xmin: -84.27804 ymin: 30.45134 xmax: -84.27804 ymax: 30.45134 epsg (SRID): 4269 proj4string: +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs KEYFIELD1 CALYEAR CRASHDATE CRASHTIME DAYOFWEEK DHSCNTYCTY TOWNNAME INTOWNFLAG ONROADNAME INROADNAME REFDIST 95 2.008719e+12 2008 2008/01/04 1531 5 1350 TALLAHASSEE Y GADSDEN ST N JOHNSTON ST 0 REFMEAS REFDIRECT ROADWAYID LOCMILEPT LOCNODE SRROUTEID USROUTEID SIDEOFROAD CRASHLANE ROADCATGRY SKIDNUMBER SKIDTESTDT 95 FT 55000000 0 NA U U NA 0 2001/01/01 RCIFEDHWY RCIFUNCLAS RCIACC RCIPLACECD RCILANDUSE RCISURFWTH RCISLDTYP1 RCISLDTYP2 RCISLDTYP3 RCISLDWTH1 RCISLDWTH2 95 NA NA NA NA NA 0 NA NA NA 0 0 RCISLDWTH3 RCIMEDWDTH RCIAADT RCIAVGTFCT RCIHZDGCRV RCIMAXSPD RCITYPPARK HIGHESTINJ CRSHALCDRG SITELOCA LIGHTCOND WEATHCOND 95 0 0 0 0 30 NA 3 0 2 1 1 RDSURFCOND DIV_UNDIV TRAFCNTL1 TRAFCNTL2 CNTOFLANES RDSURFTYPE ROADCOND1 ROADCOND2 VISIBLTY1 VISIBLTY2 CRSHEVENT1 95 1 2 1 0 0 2 1 0 1 0 6 CRSHCAUSE1 CNTOFINJ CNTOFFATL CNTOFSVINJ CNTOFPEDES CNTOFDRVRS CNTOFCYCLS CNTOFVEH CNTOFPERS CNTNONTFTL TOTCRSHDMG 95 5 1 0 0 0 2 0 2 3 0 2000 TOTVEHDMG TOTOTHRDMG WORKZONE FLAG_CMV FLAG_INT FL_LANEDEP FL_AGGRSV FL_VRU_PED FL_VRU_BIK FLELDERPED FLPED5064 FLPED6569 95 2000 0 N N Y Y N N N N N N FLPED7074 FLPED7579 FLPED80UP FLMINORPED FLELDERBIK FLBIK5064 FLBIK6569 FLBIK7074 FLBIK7579 FLBIK80UP FLMINORBIK FL_VRU_MOT 95 N N N N N N N N N N N N FLMOT5064 FLMOT6569 FLMOT7074 FLMOT7579 FLMOT80UP FL_AR_TEEN FL_AR_AG FL_AG5064 FL_AG6569 FL_AG7074 FL_AG7579 FL_AG80UP 95 N N N N N N N Y N N N N FLAG_IMP FL_IMP_PED FLAG_DIST FLAG_MIN FLNOBTKID FLNOBTMIN FLNOBTTEEN FL_NOBELT XCOORDINAT YCOORDINAT 95 N N N N N N N N 185197.2 3373367 geometry
95 POINT (-84.27804 30.45134)

Screen Shot 2019-10-01 at 12 01 50 AM
Robinlovelace commented 5 years ago

Hi @iDemirsoy glad you found the book useful. I think creating buffers around the roads will help. The example below creates a 200 m buffer around the road and then aggregates the crashes per road segment. Watch out for double counting. Let us know if this works, hope so!

download.file("https://github.com/Robinlovelace/geocompr/files/3669086/roads-test-data.Rda.zip", "test-data.zip")
unzip("test-data.zip")
load("roads-test-data.Rda")

# convert to sf
library(sf)
#> Linking to GEOS 3.7.1, GDAL 2.4.2, PROJ 5.2.0
roads = st_as_sf(roads)
#> Loading required package: sp
zones = st_as_sf(zones)
crashes = st_as_sf(crashes)

# install.packages("stplanr") # for stplanr package
library(stplanr)
#> Registered S3 method overwritten by 'R.oo':
#>   method        from       
#>   throw.default R.methodsS3
#> 
#> Attaching package: 'stplanr'
#> The following object is masked _by_ '.GlobalEnv':
#> 
#>     zones
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
roads_buffer = stplanr::geo_projected(roads, sf::st_buffer, dist = 200, endCapStyle = "FLAT")
roads_aggregated = aggregate(crashes["accident_severity"], roads_buffer, FUN = length)
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
plot(roads_aggregated)


# get road number for each point
crashes_buffer_joined = st_join(crashes, roads_buffer)
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar
sum(!is.na(crashes_buffer_joined$accident_index))
#> [1] 374
# select only one crash per ro
crashes_aggregated = crashes_buffer_joined %>% 
  group_by(accident_index) %>% 
  slice(1) %>% 
  ungroup() %>% 
  group_by(osm_id) %>% 
  summarise(n = n())

crashes_aggregated
#> Simple feature collection with 119 features and 2 fields
#> geometry type:  GEOMETRY
#> dimension:      XY
#> bbox:           xmin: -1.537697 ymin: 50.59047 xmax: -1.090763 ymax: 50.75885
#> epsg (SRID):    4326
#> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#> # A tibble: 119 x 3
#>    osm_id        n                                                 geometry
#>  * <chr>     <int>                                           <GEOMETRY [°]>
#>  1 1024219       1                               POINT (-1.289251 50.69979)
#>  2 1028123       3 MULTIPOINT (-1.168595 50.64761, -1.168399 50.64751, -1.…
#>  3 1029449       2      MULTIPOINT (-1.197231 50.59899, -1.196444 50.59957)
#>  4 1104558       1                               POINT (-1.302355 50.70885)
#>  5 110695638     1                                  POINT (-1.2296 50.6606)
#>  6 111590294     1                               POINT (-1.285766 50.69614)
#>  7 114516120     1                               POINT (-1.165097 50.65029)
#>  8 1207030       1                               POINT (-1.161055 50.65168)
#>  9 1307866       1                                POINT (-1.145843 50.6746)
#> 10 1393180       2        MULTIPOINT (-1.169001 50.65105, -1.16765 50.6499)
#> # … with 109 more rows
plot(crashes_aggregated)

roads_aggregated = left_join(roads["osm_id"], st_drop_geometry(crashes_aggregated))
#> Joining, by = "osm_id"
plot(roads_aggregated)

Created on 2019-10-01 by the reprex package (v0.3.0)