SymbolixAU / googlePolylines

R package for encoding objects using Google's Polyline Encoding Algorithm

Handling NA input values #34

Closed ChrisMuir closed 5 years ago

ChrisMuir commented 5 years ago

I ran into a few instances of encode() and decode() handling NA inputs in interesting ways. Here are some examples:

decode()

googlePolylines::decode(NA_character_)
#> [[1]]
#>      lat   lon
#> 1 -8e-05 1e-05

Should this return NA_real_, or a data frame containing two NA values? Either could be done within the Rcpp function rcpp_decode_polyline() without much trouble or additional code. I'd be happy to submit a PR for this if you'd like.
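For illustration, here's a minimal R-level sketch of the second option (the actual fix would live in `rcpp_decode_polyline()`; `decode_safe` is a hypothetical name, not part of the package):

```r
# Hypothetical sketch: for an NA input, return a one-row data frame of NAs,
# mirroring the structure decode() returns for a valid polyline.
decode_safe <- function(polylines) {
  lapply(polylines, function(p) {
    if (is.na(p)) {
      return(data.frame(lat = NA_real_, lon = NA_real_))
    }
    googlePolylines::decode(p)[[1]]
  })
}
```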

encode()

Example 1: NA inputs round-tripped through encode/decode return numeric lat/lon coordinates.

df <- data.frame(lat = c(NA_real_), lon = c(NA_real_))

(res <- googlePolylines::encode(df))
#> [1] ">>"

googlePolylines::decode(res)
#> [[1]]
#>        lat      lon
#> 1 -0.00016 -0.00016

Example 2: Data frame with four observations round-tripped through encode/decode returns a data frame with only three observations.

df <- data.frame(lat = c(38.5, 40.7, NA_real_, 43.252), 
                 lon = c(-120.2, -120.95, NA_real_, -126.453))

(res <- googlePolylines::encode(df))
#> [1] "_p~iF~ps|U_ulLnnqC_\016\x9e\xd7"

googlePolylines::decode(res)
#> [[1]]
#>       lat      lon
#> 1 38.5000 -120.200
#> 2 40.7000 -120.950
#> 3 40.7024 -120.954
SymbolixAU commented 5 years ago

Thanks for the examples!

decode

Should this return `NA_real_`, or a data frame containing two NA values?

My instinct tells me a data.frame with two NA values, to keep the structure consistent. Which do you think is the better option?

I'd be happy to submit a PR for this if you'd like.

yes please :)

encode

Yes these cases need to be handled. My first thoughts are to either

  1. Issue an Error if any NAs are found, to force the issue back to the user
  2. Issue a warning if any NAs are found, but exclude them from the encoding

What do you think of these options?
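Roughly, either behaviour could be sketched as an R-level pre-check along these lines (`check_na` and its `action` argument are hypothetical names for illustration; the real check would presumably sit inside `encode()`):

```r
# Sketch of the two proposed behaviours for NA coordinates:
#   "error" -> stop and force the issue back to the user
#   "warn"  -> warn and drop the incomplete rows before encoding
check_na <- function(df, action = c("error", "warn")) {
  action <- match.arg(action)
  complete <- stats::complete.cases(df[, c("lat", "lon")])
  if (all(complete)) return(df)
  if (action == "error") {
    stop("NA coordinates found; remove them before encoding")
  }
  warning("NA coordinates found and excluded from the encoding")
  df[complete, , drop = FALSE]
}
```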

ChrisMuir commented 5 years ago

So, full disclosure, I've actually never used googlePolylines, so I don't have a strong take on how best to handle these cases. I was just poking around this repo after seeing the cool stuff being done with mapdeck on rstats Twitter, and noticing that you were planning to migrate some of the code to Rcpp.

Having said that, for the decode example, I agree that returning a data frame is probably better. Looking at the list output, I could see wanting to rbind all of the results together via do.call, so a data frame with the same column headers seems wise. I'll submit a PR for that fix tomorrow.
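For context, the `do.call` pattern I had in mind looks like this (assuming the fix makes NA inputs decode to a same-shaped one-row data frame, so the list binds without special-casing):

```r
# Each element of decode()'s output is a data frame with lat/lon columns,
# so results for multiple polylines stack into one data frame:
decoded <- googlePolylines::decode(c("_p~iF~ps|U", "_p~iF~ps|U"))
do.call(rbind, decoded)
```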

Also, I was playing around with the decode Rcpp functions, and modified them so they don't have to create a new Rcpp::DataFrame for every input string. I currently have a ~15x speed-up for decode() with character vector input, and all the tests are passing. I still want to work with it some more, but are you interested in a performance/refactor PR as well?

SymbolixAU commented 5 years ago

but are you interested in a performance/refactor PR as well?

absolutely! Always keen to see & learn from others.

SymbolixAU commented 5 years ago

@ChrisMuir did you look at the encode examples while you were writing your PR?

If not I might move them into a separate issue and make them part of the v0.7.3 milestone

ChrisMuir commented 5 years ago

Nope, I addressed the decode() example, and made some minor speed refactors, but did not address the encode() examples.

SymbolixAU commented 5 years ago

Moved the encode() examples to a separate issue - https://github.com/SymbolixAU/googlePolylines/issues/39