dgrtwo / fuzzyjoin

Join tables together on inexact matching

geo_join is returning data that doesn't look right #8

Open kanaugust opened 8 years ago

kanaugust commented 8 years ago

I followed the exact steps from the geo_join section of the reference documentation.

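library(dplyr)
library(fuzzyjoin)

# state.name and state.center come from base R's datasets package
data("state")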
state.name
state.center
states <- data_frame(state = state.name,
                     longitude = state.center$x,
                     latitude = state.center$y)

s1 <- rename(states, state1 = state)
s2 <- rename(states, state2 = state)

pairs <- s1 %>%
  geo_inner_join(s2, max_dist = 200) %>%
  filter(state1 != state2)

library(ggplot2)
ggplot(pairs, aes(x = longitude.x, y = latitude.x,
                  xend = longitude.y, yend = latitude.y)) +
  geom_segment(color = "red") +
  borders("state") +
  theme_void()

But the result looks odd. Instead of pairing only the states whose centers are within 200 miles of each other, the join returns states that are far apart.

Source: local data frame [74 x 6]

        state1 longitude.x latitude.x        state2 longitude.y latitude.y
         (chr)       (dbl)      (dbl)         (chr)       (dbl)      (dbl)
1       Alaska   -127.2500    49.2500     Louisiana    -92.2724    30.6181
2       Alaska   -127.2500    49.2500       Montana   -109.3200    46.8230
3      Arizona   -111.6250    34.2192   Connecticut    -72.3573    41.5928
4      Arizona   -111.6250    34.2192       Indiana    -86.0808    40.0495
5      Arizona   -111.6250    34.2192     Minnesota    -94.6043    46.3943
6     Colorado   -105.5130    38.6777  North Dakota   -100.0990    47.2517
7  Connecticut    -72.3573    41.5928       Arizona   -111.6250    34.2192
8  Connecticut    -72.3573    41.5928       Georgia    -83.3736    32.3329
9  Connecticut    -72.3573    41.5928     Minnesota    -94.6043    46.3943
10 Connecticut    -72.3573    41.5928 New Hampshire    -71.3924    43.3934
..         ...         ...        ...           ...         ...        ...

It could be that I misunderstand this function or that I'm using it incorrectly. But because I'm also getting strange results from the 'regex_join' function, I wanted to check whether there is a known issue with the latest versions from CRAN and GitHub.
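As a quick sanity check on the output above, the distances for two of the printed pairs can be computed by hand. This is only a sketch: the helper `haversine_miles` is a hypothetical name, the Earth radius of 3958.8 miles is an assumed mean value, and the haversine formula here is not necessarily the exact calculation geo_join performs internally. The coordinates are copied from rows 1 and 10 of the result.

```r
# Great-circle distance in miles between two (longitude, latitude) points,
# using the haversine formula with an assumed mean Earth radius.
haversine_miles <- function(lon1, lat1, lon2, lat2, radius = 3958.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Row 1: Alaska and Louisiana
alaska_louisiana <- haversine_miles(-127.2500, 49.2500, -92.2724, 30.6181)

# Row 10: Connecticut and New Hampshire
connecticut_new_hampshire <- haversine_miles(-72.3573, 41.5928, -71.3924, 43.3934)

print(alaska_louisiana <= 200)           # FALSE: should not match at max_dist = 200
print(connecticut_new_hampshire <= 200)  # TRUE: a plausible match
```

Under these assumptions, some rows (such as Connecticut and New Hampshire) are legitimate matches, while others (such as Alaska and Louisiana) are well outside the 200-mile limit, which suggests the join is returning extra pairs rather than failing entirely.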

kanaugust commented 8 years ago

Here is my session info.

> sessionInfo()
R version 3.2.4 (2016-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] jsonlite_0.9.19        httr_1.1.0             maps_3.1.0             ggplot2_2.1.0         
 [5] stringr_1.0.0          fuzzyjoin_0.1.9000     devtools_1.10.0        readr_0.2.2           
 [9] qdapDictionaries_1.0.6 dplyr_0.4.3           

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.4        magrittr_1.5       munsell_0.4.3      colorspace_1.2-6   lattice_0.20-33   
 [6] geosphere_1.5-1    R6_2.1.2           plyr_1.8.3         tools_3.2.4        parallel_3.2.4    
[11] grid_3.2.4         gtable_0.2.0       DBI_0.3.1          git2r_0.14.0       withr_1.0.1       
[16] openssl_0.9.2      lazyeval_0.1.10    assertthat_0.1     digest_0.6.9       purrr_0.2.1       
[21] tidyr_0.4.1        curl_0.9.6         stringdist_0.9.4.1 memoise_1.0.0      labeling_0.3      
[26] sp_1.2-2           stringi_1.0-1      scales_0.4.0       httpuv_1.3.3