Closed hrbrmstr closed 5 years ago
now we'll see how Travis like this mod
Lemme port https://github.com/darkskyapp/tz-lookup/blob/master/test.js over to here and run the full checks. (later this week)
OK, so the mocha
tests all pass for the darksky JS lib but some of them fail (i did a way too quick port of them, so it cld be my js array data wrangling) for the Rcpp port so i def have to take more time on this but will also run them against the accurate functions as well. shld be a robust test suite!
@hrbrmstr this is absolutely brilliant, thank you so much! I definitely will take it. Thinking I'll take this now and when/if you get the time for another PR with the mocha tests that would be great. Do you have a start somewhere I can look at?
I am liking that the fam sleeps late on holiday :-)
Running:
$ mocha test.js
in the tz-lookup
repo runs 1,028 tests designed to catch what look like edge cases.
I made most of them into a CSV file: tz-cases.csv.gz and ran both 'fast' and 'accurate' tests. 'fast' has 193 test errors and 'accurate' has 126:
library(lutz)
library(tidyverse)
Read in the cases
cases <- read_csv("~/Data/tz-cases.csv.gz", col_types="ddc")
Do tests with ‘fast’
fast <- tz_lookup_coords(cases$lat, cases$lng, "fast", FALSE)
fast_err <- which(!(cases$val == fast))
Do tests with ‘accurate’
accurate <- tz_lookup_coords(cases$lat, cases$lng, "accurate", FALSE)
accurate_err <- which(!(cases$val == accurate))
Gather all the errors
bind_rows(
cases[fast_err,] %>%
mutate(case_num = fast_err, method = "fast") %>%
select(case_num, method, everything()),
cases[accurate_err,] %>%
mutate(case_num = accurate_err, method = "accurate") %>%
select(case_num, method, everything())
) -> tz_errs
See how many for each test
count(tz_errs, method)
## # A tibble: 2 x 2
## method n
## <chr> <int>
## 1 accurate 126
## 2 fast 193
See if there are common errors
count(tz_errs, case_num, sort = TRUE)
## # A tibble: 315 x 2
## case_num n
## <int> <int>
## 1 573 2
## 2 596 2
## 3 846 2
## 4 940 2
## 5 4 1
## 6 7 1
## 7 9 1
## 8 12 1
## 9 14 1
## 10 15 1
## # … with 305 more rows
I'll poke at the C++ code a bit to see what I'm not translating from javascript properly.
One reason 'accurate' results differ is some return GMT offsets vs what the test functions expect.
However, some are errors like: Antarctica/Davis
which shld be +7 unless I'm reading the local timezone db wrong.
# A tibble: 126 x 6
case_num method lat lng val err_val
<int> <chr> <dbl> <dbl> <chr> <chr>
1 77 accurate 90 -90 Etc/GMT Etc/GMT+6
2 79 accurate 90 90 Etc/GMT Etc/GMT-6
3 525 accurate -79.0 81.4 Antarctica/Davis Etc/GMT-5
4 526 accurate -73.1 107. Antarctica/Vostok Etc/GMT-7
5 528 accurate -73.4 5.44 Antarctica/Troll Etc/GMT
6 529 accurate -66.5 64.1 Antarctica/Mawson Etc/GMT-4
7 534 accurate -81.8 20.3 Antarctica/Troll Etc/GMT-1
8 535 accurate -89.7 -12.8 Antarctica/McMurdo Etc/GMT+1
9 539 accurate -66.3 -77.8 Antarctica/Rothera Etc/GMT+5
10 549 accurate -80.9 -167. Antarctica/McMurdo Etc/GMT+11
# … with 116 more rows
This does bring up another question: do both methods need to return the exact same string for the exact same input vals? GMT offsets are fine timezone specifiers IMO but it'll make testing and comparing a tad more grungy. We may want to note the diff vals in docs somewhere.
Looks like (re: 'fast' errs) might have something to do with the double math ops. I'll take a look at this again on the morrow.
Yep. truncated a double prematurely.
✔ | 1020 | tz-lookup js edge cases [0.7 s]
Will update the PR with these then we can discuss the other bits abt tz strings being returned by each method.
I rly cld not stand the line lengths for the generated data so i added a call to fold
(linux/macOS) and also a {stringi} wrapper to make the generated file much less icky.
I need to add in a test for invalid input or a codecov noop comment to sget the codecov % back to passing.
7408a20 shld say "fold" vs "find". I felt icky having something that cld not run on all platforms so rewrote fold
in inline Rcpp in the script that generates the header.
@hrbrmstr thanks again and sorry for the radio silence. I should be able to look at this more in depth next week, but I left a couple of inline comments - certainly not expecting you to do the work, but wanted to capture some thoughts, and at least contribute a little to your awesome work! I love the script to auto-generate src/generated-vars.h
, especially the fold
function 😄 .
Regarding perfect alignment between the "fast"
and "accurate"
methods, I'm also not too fussed about some being named, and some being GMT offsets... I think a lot will be resolved if we get them both using the latest version of the timezone boundaries, as I mentioned in my comment. I have started using the latest boundary file for the "accurate" method in the update-2019a branch, but ran into an issue with overlapping timezones which I haven't yet decided how to deal with (I don't think these are an issue (yet) in the darksky boundaries - see https://github.com/darkskyapp/tz-lookup/issues/34).
I think I dotted all the i's and crossed all the t's.
data-raw
which will fetch the latest darksky stringified tree and munge it into something that can be used in Rcpp and auto-generatessrc/generated-vars.h
.NA_character_
instead ofNA