mhpob closed this 2 years ago
The issue in #37 stems from R's default of printing 0 decimal places for seconds. The timestamp is internally converted to a character string before the strsplit/gsub calls, so any millisecond information is dropped while attempting to create the "frac" column. Temporarily changing the R options inside the function seems to fix this, and so should close that issue.
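A minimal sketch of the failure mode, with made-up values (not the package's code): under the default `digits.secs = 0`, formatting a `POSIXct` drops the fractional seconds, so splitting the resulting string on `"."` finds nothing.

```r
ts <- as.POSIXct("2019-09-09 16:04:11.193", tz = "UTC")

format(ts)                            # "2019-09-09 16:04:11" -- milliseconds silently dropped
strsplit(format(ts), "\\.")[[1]][2]   # NA, so "frac" ends up NA

old <- options(digits.secs = 3)       # temporarily raise the print precision...
format(ts)                            # ...and the fractional seconds reappear
options(old)                          # restore the user's settings afterwards
```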
Hi @mhpob, Thanks for this - much appreciated 👍
I have looked into it, and it seems that the issue in #37 can be solved by changing this line in `prepDetections()`:
```r
detections[, frac:= (as.numeric(sapply(raw_dat$'Date and Time (UTC)', function(x) strsplit(x, "\\.")[[1]][2]))) / 1000]
```
to
```r
detections[, frac:= as.numeric(ts) - floor(as.numeric(ts))]
```
This should also cater for the cases when fractional seconds are absent, and it should be adequately fast. Would you mind taking it for a spin on your data to confirm?
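A quick illustration of both cases (illustrative values; the floating-point noise in the first result matches what shows up in the output further down):

```r
ts <- as.POSIXct(c("2019-09-09 16:04:11.193",   # with milliseconds
                   "2019-09-09 16:04:12"),      # without milliseconds
                 tz = "UTC")

as.numeric(ts) - floor(as.numeric(ts))   # ~0.193 for the first, exactly 0 for the second
```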
I like the `gsub()` from #38 :-)
Thanks, \hb
Great catch @baktoft -- that halved the time again and solved the warning on my data.
```r
> fn <- system.file("extdata", "VUE_Export_ssu1.csv", package="yaps")
> vue <- data.table::fread(fn, fill=TRUE)
> microbenchmark::microbenchmark(prepDetections(vue, 'vemco_vue'))
Unit: milliseconds
                             expr    min     lq      mean  median      uq     max neval
 prepDetections(vue, "vemco_vue") 26.929 27.397 28.292839 27.5914 28.2504 34.4613   100
```
Columns `ts` and `epo` are now carrying the millisecond information with them. This can be seen by changing the global R options:
```r
> options(digits = 15, digits.secs = 3)
> detections
                            ts   tag            epo              frac serial
    1: 2019-09-09 16:04:11.193 59335 1568045051.193 0.193000078201294 128355
    2: 2019-09-09 16:04:12.573 59336 1568045052.574 0.573999881744385 128371
    3: 2019-09-09 16:04:43.953 59335 1568045083.953 0.953000068664551 128959
    4: 2019-09-09 16:05:14.888 59335 1568045114.888 0.888000011444092 128344
    5: 2019-09-09 16:05:26.450 59335 1568045126.451 0.450999975204468 128370
   ---
15369: 2019-09-10 12:59:12.707 59336 1568120352.708 0.707999944686890 128369
15370: 2019-09-10 12:59:12.789 59336 1568120352.790 0.789999961853027 128973
15371: 2019-09-10 12:59:32.420 59336 1568120372.420 0.420000076293945 128371
15372: 2019-09-10 13:02:55.806 59335 1568120575.807 0.806999921798706 135178
15373: 2019-09-10 13:02:56.724 59335 1568120576.725 0.724999904632568 128369
```
Is that something you would want to keep, or should I change things to strip that information? I had stripped it in https://github.com/baktoft/yaps/pull/39/commits/464e3f2d3bf3846e04dcd3b0468b49fed277d5bd -- it does take a little longer there since that method has to convert to list time, then back to calendar time.
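For reference, a sketch of what that round-trip looks like -- my reading of the linked commit, not necessarily its exact code ("list time" being `POSIXlt`, "calendar time" being `POSIXct`):

```r
lt <- as.POSIXlt(detections$ts)      # calendar time -> list time
lt$sec <- floor(lt$sec)              # strip the fractional seconds
detections[, ts := as.POSIXct(lt)]   # list time -> calendar time
```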
Hmmm - it should be ok to truncate the fractional seconds from those columns. Something like this should be ok'ish in terms of CPU time:
```r
detections[, ts := as.POSIXct(floor(as.numeric(ts)), origin="1970-01-01", tz="UTC")]
detections[, epo := floor(epo)]
```
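A quick check of what that truncation does, using row 1's epoch value from the output above (sketch only):

```r
x <- 1568045051.193
as.POSIXct(floor(x), origin = "1970-01-01", tz = "UTC")   # 2019-09-09 16:04:11, whole seconds
floor(x)                                                  # 1568045051
x - floor(x)                                              # the sub-second part stays in frac
```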
Thanks,
\hb
Original result and time taken:
```r
> fn <- system.file("extdata", "VUE_Export_ssu1.csv", package="yaps")
> vue <- data.table::fread(fn, fill=TRUE, tz = '')
> prepDetections_original <- function(raw_dat, type){
+   detections <- data.table::data.table()
+   if (type == "vemco_vue"){
+     detections[, ts:=as.POSIXct(raw_dat$'Date and Time (UTC)', tz="UTC")]
+     detections[, tag:=as.numeric(sapply(raw_dat$Transmitter, function(x) strsplit(x, "-")[[1]][3]))]
+     detections[, epo:=as.numeric(ts)]
+     detections[, frac:= (as.numeric(sapply(raw_dat$'Date and Time (UTC)', function(x) strsplit(x, "\\.")[[1]][2]))) / 1000]
+     detections[, serial:=as.numeric(sapply(raw_dat$Receiver, function(x) strsplit(x, "-")[[1]][2]))]
+   }
+   detections[]
+   return(detections)
+ }
> options(digits = 15, digits.secs = 3)
> prepDetections_original(vue, 'vemco_vue')
                            ts   tag            epo  frac serial
    1: 2019-09-09 16:04:11.193 59335 1568045051.193 0.193 128355
    2: 2019-09-09 16:04:12.573 59336 1568045052.574 0.574 128371
    3: 2019-09-09 16:04:43.953 59335 1568045083.953 0.953 128959
    4: 2019-09-09 16:05:14.888 59335 1568045114.888 0.888 128344
    5: 2019-09-09 16:05:26.450 59335 1568045126.451 0.451 128370
   ---
15369: 2019-09-10 12:59:12.707 59336 1568120352.708 0.708 128369
15370: 2019-09-10 12:59:12.789 59336 1568120352.790 0.790 128973
15371: 2019-09-10 12:59:32.420 59336 1568120372.420 0.420 128371
15372: 2019-09-10 13:02:55.806 59335 1568120575.807 0.807 135178
15373: 2019-09-10 13:02:56.724 59335 1568120576.725 0.725 128369
> microbenchmark::microbenchmark(prepDetections_original(vue, 'vemco_vue'))
Unit: milliseconds
                                      expr      min        lq       mean   median       uq      max neval
 prepDetections_original(vue, "vemco_vue") 287.4089 308.50835 319.939158 318.6149 329.1724 368.4272   100
```
Current version using `tz = ''` (the default in `data.table::fread` < v1.14.0):
```r
> vue <- data.table::fread(fn, fill=TRUE, tz = '')
> options(digits = 15, digits.secs = 3)
> prepDetections(vue, 'vemco_vue')
                        ts   tag        epo  frac serial
    1: 2019-09-09 16:04:11 59335 1568045051 0.193 128355
    2: 2019-09-09 16:04:12 59336 1568045052 0.574 128371
    3: 2019-09-09 16:04:43 59335 1568045083 0.953 128959
    4: 2019-09-09 16:05:14 59335 1568045114 0.888 128344
    5: 2019-09-09 16:05:26 59335 1568045126 0.451 128370
   ---
15369: 2019-09-10 12:59:12 59336 1568120352 0.708 128369
15370: 2019-09-10 12:59:12 59336 1568120352 0.790 128973
15371: 2019-09-10 12:59:32 59336 1568120372 0.420 128371
15372: 2019-09-10 13:02:55 59335 1568120575 0.807 135178
15373: 2019-09-10 13:02:56 59335 1568120576 0.725 128369
> microbenchmark::microbenchmark(prepDetections(vue, 'vemco_vue'))
Unit: milliseconds
                             expr     min      lq      mean   median       uq      max neval
 prepDetections(vue, "vemco_vue") 43.2651 43.8835 46.724863 44.50235 47.03805 165.2624   100
```
Current version when `tz` is given as `'UTC'` in `data.table::fread` (the default in data.table >= 1.14.0):
```r
> fn <- system.file("extdata", "VUE_Export_ssu1.csv", package="yaps")
> vue <- data.table::fread(fn, fill=TRUE, tz = 'UTC')
> options(digits = 15, digits.secs = 3)
> prepDetections(vue, 'vemco_vue')
                        ts   tag        epo  frac serial
    1: 2019-09-09 16:04:11 59335 1568045051 0.193 128355
    2: 2019-09-09 16:04:12 59336 1568045052 0.574 128371
    3: 2019-09-09 16:04:43 59335 1568045083 0.953 128959
    4: 2019-09-09 16:05:14 59335 1568045114 0.888 128344
    5: 2019-09-09 16:05:26 59335 1568045126 0.451 128370
   ---
15369: 2019-09-10 12:59:12 59336 1568120352 0.708 128369
15370: 2019-09-10 12:59:12 59336 1568120352 0.790 128973
15371: 2019-09-10 12:59:32 59336 1568120372 0.420 128371
15372: 2019-09-10 13:02:55 59335 1568120575 0.807 135178
15373: 2019-09-10 13:02:56 59335 1568120576 0.725 128369
> microbenchmark::microbenchmark(prepDetections(vue, 'vemco_vue'))
Unit: milliseconds
                             expr    min      lq      mean  median      uq     max neval
 prepDetections(vue, "vemco_vue") 28.567 29.0313 30.529318 29.6492 31.4714 38.9802   100
```
I'm now noticing that there are some floating-point rounding issues (see, e.g., rows 2 and 5 in the original result, where `epo` prints as `.574` and `.451` against time stamps of `.573` and `.450`). This was also the behavior in the original version of `prepDetections`; only now the code explicitly covers them up, since the input time stamp is not carried through (`ts` is truncated to whole seconds).
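That last-digit mismatch is just double-precision representation; a minimal sketch using row 2's value (assuming IEEE doubles):

```r
options(digits = 15)
x <- 1568045052.573   # row 2's epoch time, as typed
x                     # prints 1568045052.574: the nearest representable double
x - floor(x)          # 0.573999881744385, the frac value shown above
```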
In your experience, does a thousandth of a second matter... especially since the clocks can drift on a much larger scale than that?
Sorry - got side-tracked, but am now doing a bit of end-of-year cleaning. 1/1000 of a second can definitely matter in estimating positions (sound travels roughly 1500 m/s in water, so 1/1000 s ≈ 1.5 meters), but the temporal resolution of the systems yielding data to use with this function is only 1/1000 s, so the rounding will not be an issue here.
Allows speed increase outlined in #38.
Note: will give an error when a match doesn't occur. This happened for me on a few lines where fractional seconds weren't reported for some reason.
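A hypothetical guard for that case -- not the PR's code, just one way to avoid the failure (`raw_dat` and `detections` as inside `prepDetections()`):

```r
dt_chr   <- raw_dat$`Date and Time (UTC)`
frac_chr <- data.table::fifelse(grepl(".", dt_chr, fixed = TRUE),
                                sub(".*\\.", "", dt_chr),   # millisecond digits after the "."
                                "0")                        # rows with no reported fraction
detections[, frac := as.numeric(frac_chr) / 1000]
```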