ateucher / lutz

Look up timezones of point coordinates
http://andyteucher.ca/lutz/
Other
58 stars 5 forks source link

Bye, bye {V8}; Hello, {Rcpp}!! #4

Closed hrbrmstr closed 5 years ago

hrbrmstr commented 5 years ago

I think I dotted all the i's and crossed all the t's.

hrbrmstr commented 5 years ago

now we'll see how Travis like this mod

hrbrmstr commented 5 years ago

Lemme port https://github.com/darkskyapp/tz-lookup/blob/master/test.js over to here and run the full checks. (later this week)

hrbrmstr commented 5 years ago

OK, so the mocha tests all pass for the darksky JS lib but some of them fail (i did a way too quick port of them, so it cld be my js array data wrangling) for the Rcpp port so i def have to take more time on this but will also run them against the accurate functions as well. shld be a robust test suite!

ateucher commented 5 years ago

@hrbrmstr this is absolutely brilliant, thank you so much! I definitely will take it. Thinking I'll take this now and when/if you get the time for another PR with the mocha tests that would be great. Do you have a start somewhere I can look at?

hrbrmstr commented 5 years ago

I am liking that the fam sleeps late on holiday :-)

Running:

$ mocha test.js

in the tz-lookup repo runs 1,028 tests designed to catch what look like edge cases.

I made most of them into a CSV file: tz-cases.csv.gz and ran both 'fast' and 'accurate' tests. 'fast' has 193 test errors and 'accurate' has 126:

library(lutz)
library(tidyverse)

Read in the cases

cases <- read_csv("~/Data/tz-cases.csv.gz", col_types="ddc")

Do tests with ‘fast’

fast <- tz_lookup_coords(cases$lat, cases$lng, "fast", FALSE)
fast_err <- which(!(cases$val == fast))

Do tests with ‘accurate’

accurate <- tz_lookup_coords(cases$lat, cases$lng, "accurate", FALSE)
accurate_err <- which(!(cases$val == accurate))

Gather all the errors

bind_rows(
  cases[fast_err,] %>%
    mutate(case_num = fast_err, method = "fast") %>%
    select(case_num, method, everything()),
  cases[accurate_err,] %>%
    mutate(case_num = accurate_err, method = "accurate") %>%
    select(case_num, method, everything())
) -> tz_errs

See how many for each test

count(tz_errs, method)
## # A tibble: 2 x 2
##   method       n
##   <chr>    <int>
## 1 accurate   126
## 2 fast       193

See if there are common errors

count(tz_errs, case_num, sort = TRUE)
## # A tibble: 315 x 2
##    case_num     n
##       <int> <int>
##  1      573     2
##  2      596     2
##  3      846     2
##  4      940     2
##  5        4     1
##  6        7     1
##  7        9     1
##  8       12     1
##  9       14     1
## 10       15     1
## # … with 305 more rows

I'll poke at the C++ code a bit to see what I'm not translating from javascript properly.

hrbrmstr commented 5 years ago

One reason 'accurate' results differ is some return GMT offsets vs what the test functions expect.

However, some are errors like: Antarctica/Davis which shld be +7 unless I'm reading the local timezone db wrong.

# A tibble: 126 x 6
   case_num method     lat     lng val                err_val   
      <int> <chr>    <dbl>   <dbl> <chr>              <chr>     
 1       77 accurate  90    -90    Etc/GMT            Etc/GMT+6 
 2       79 accurate  90     90    Etc/GMT            Etc/GMT-6 
 3      525 accurate -79.0   81.4  Antarctica/Davis   Etc/GMT-5 
 4      526 accurate -73.1  107.   Antarctica/Vostok  Etc/GMT-7 
 5      528 accurate -73.4    5.44 Antarctica/Troll   Etc/GMT   
 6      529 accurate -66.5   64.1  Antarctica/Mawson  Etc/GMT-4 
 7      534 accurate -81.8   20.3  Antarctica/Troll   Etc/GMT-1 
 8      535 accurate -89.7  -12.8  Antarctica/McMurdo Etc/GMT+1 
 9      539 accurate -66.3  -77.8  Antarctica/Rothera Etc/GMT+5 
10      549 accurate -80.9 -167.   Antarctica/McMurdo Etc/GMT+11
# … with 116 more rows

This does bring up another question: do both methods need to return the exact same string for the exact same input vals? GMT offsets are fine timezone specifiers IMO but it'll make testing and comparing a tad more grungy. We may want to note the diff vals in docs somewhere.

hrbrmstr commented 5 years ago

Looks like (re: 'fast' errs) might have something to do with the double math ops. I'll take a look at this again on the morrow.

hrbrmstr commented 5 years ago

Yep. truncated a double prematurely.

✔ | 1020       | tz-lookup js edge cases [0.7 s]

Will update the PR with these then we can discuss the other bits abt tz strings being returned by each method.

hrbrmstr commented 5 years ago

I rly cld not stand the line lengths for the generated data so i added a call to fold (linux/macOS) and also a {stringi} wrapper to make the generated file much less icky.

I need to add in a test for invalid input or a codecov noop comment to sget the codecov % back to passing.

hrbrmstr commented 5 years ago

7408a20 shld say "fold" vs "find". I felt icky having something that cld not run on all platforms so rewrote fold in inline Rcpp in the script that generates the header.

ateucher commented 5 years ago

@hrbrmstr thanks again and sorry for the radio silence. I should be able to look at this more in depth next week, but I left a couple of inline comments - certainly not expecting you to do the work, but wanted to capture some thoughts, and at least contribute a little to your awesome work! I love the script to auto-generate src/generated-vars.h, especially the fold function 😄 .

Regarding perfect alignment between the "fast" and "accurate" methods, I'm also not too fussed about some being named, and some being GMT offsets... I think a lot will be resolved if we get them both using the latest version of the timezone boundaries, as I mentioned in my comment. I have started using the latest boundary file for the "accurate" method in the update-2019a branch, but ran into an issue with overlapping timezones which I haven't yet decided how to deal with (I don't think these are an issue (yet) in the darksky boundaries - see https://github.com/darkskyapp/tz-lookup/issues/34).