baktoft / yaps

YAPS - Yet Another Positioning Solver
GNU General Public License v3.0

Switch prepDetections to gsub. #39

Closed mhpob closed 2 years ago

mhpob commented 3 years ago

Allows speed increase outlined in #38.

Note: this will give the warning

Warning message: In eval(jsub, SDenv, parent.frame()) : NAs introduced by coercion

when a match doesn't occur. This happened for me on a few lines where fractional seconds weren't reported for some reason.
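For reference, a minimal sketch of how the coercion warning arises (the exact gsub() pattern from #38 isn't reproduced here, so the pattern below is an assumption):

x <- c("2019-09-09 16:04:11.193", "2019-09-09 16:04:12")  # second entry lacks fractional seconds
as.numeric(gsub(".*\\.", "", x)) / 1000                   # gsub() leaves non-matching strings unchanged,
                                                          # so as.numeric() coerces them to NA with a warning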

mhpob commented 3 years ago

The issue in #37 stems from R's default of printing zero decimal places on seconds (options(digits.secs) is NULL by default). The timestamp is internally converted to a character before the strsplit/gsub step, so any millisecond information is dropped while attempting to create the "frac" column. Temporarily changing the R options inside the function seems to fix this, so this should close that issue.
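A minimal sketch of that printing default (standalone toy code, not the package internals):

ts <- as.POSIXct("2019-09-09 16:04:11.193", tz = "UTC")
format(ts)                      # "2019-09-09 16:04:11" -- milliseconds dropped by default
op <- options(digits.secs = 3)  # temporarily print fractional seconds
format(ts)                      # "2019-09-09 16:04:11.193"
options(op)                     # restore the previous options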

baktoft commented 3 years ago

Hi @mhpob, Thanks for this - much appreciated 👍

I have looked into it and it seems that the issue in #37 can be solved by changing this line in prepDetections()

detections[, frac:= (as.numeric(sapply(raw_dat$'Date and Time (UTC)', function(x) strsplit(x, "\\.")[[1]][2]))) / 1000]

to

detections[, frac:= as.numeric(ts) - floor(as.numeric(ts))]

This should also cater for the cases when fractional seconds are absent, and it should be adequately fast. Would you mind taking it for a spin on your data to confirm?
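A quick check of the proposed expression on toy timestamps (an illustration, not a run against your data):

ts <- as.POSIXct(c("2019-09-09 16:04:11.193", "2019-09-09 16:04:12"), tz = "UTC")
as.numeric(ts) - floor(as.numeric(ts))  # ~0.193 for the first element, 0 where fractional seconds are absent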

I like the gsub() from #38 :-)

Thanks, \hb

mhpob commented 3 years ago

Great catch @baktoft -- that halved the time again and solved the warning on my data.

> fn <- system.file("extdata", "VUE_Export_ssu1.csv", package="yaps")
> vue <- data.table::fread(fn, fill=TRUE)
> microbenchmark::microbenchmark(prepDetections(vue, 'vemco_vue'))
Unit: milliseconds
                             expr    min     lq      mean  median      uq     max neval
 prepDetections(vue, "vemco_vue") 26.929 27.397 28.292839 27.5914 28.2504 34.4613   100

Columns ts and epo are now carrying the millisecond information with them. This can be seen by changing the global R options:

> options(digits = 15, digits.secs = 3)
> detections
                            ts   tag            epo              frac serial
    1: 2019-09-09 16:04:11.193 59335 1568045051.193 0.193000078201294 128355
    2: 2019-09-09 16:04:12.573 59336 1568045052.574 0.573999881744385 128371
    3: 2019-09-09 16:04:43.953 59335 1568045083.953 0.953000068664551 128959
    4: 2019-09-09 16:05:14.888 59335 1568045114.888 0.888000011444092 128344
    5: 2019-09-09 16:05:26.450 59335 1568045126.451 0.450999975204468 128370
   ---                                                                      
15369: 2019-09-10 12:59:12.707 59336 1568120352.708 0.707999944686890 128369
15370: 2019-09-10 12:59:12.789 59336 1568120352.790 0.789999961853027 128973
15371: 2019-09-10 12:59:32.420 59336 1568120372.420 0.420000076293945 128371
15372: 2019-09-10 13:02:55.806 59335 1568120575.807 0.806999921798706 135178
15373: 2019-09-10 13:02:56.724 59335 1568120576.725 0.724999904632568 128369

Is that something you would want to keep, or should I change things to strip that information? I had stripped it in https://github.com/baktoft/yaps/pull/39/commits/464e3f2d3bf3846e04dcd3b0468b49fed277d5bd -- it does take a little longer there, since that method has to convert to list time (POSIXlt), then back to calendar time (POSIXct).
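A sketch of that round-trip truncation (assumed from the description above; the linked commit isn't reproduced here):

lt <- as.POSIXlt(detections$ts)  # calendar time -> list time
lt$sec <- floor(lt$sec)          # drop the fractional seconds
detections[, ts := as.POSIXct(lt)]  # list time -> calendar time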

baktoft commented 3 years ago

Hmmm - it should be ok to truncate the fractional seconds from those columns. Something like this should be ok'ish in terms of cpu-time.

detections[, ts := as.POSIXct(floor(as.numeric(ts)), origin="1970-01-01", tz="UTC")]
detections[, epo := floor(epo)]

Thanks, \hb
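Base R's trunc() method for date-times should be an equivalent, if slightly less direct, alternative (untested here):

detections[, ts := as.POSIXct(trunc(ts, units = "secs"))]  # trunc() returns POSIXlt, hence the as.POSIXct()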

mhpob commented 3 years ago

Original result and time taken:

> fn <- system.file("extdata", "VUE_Export_ssu1.csv", package="yaps")
> vue <- data.table::fread(fn, fill=TRUE, tz = '')
> prepDetections_original <- function(raw_dat, type){
+   detections <- data.table::data.table()
+   if (type == "vemco_vue"){
+     detections[, ts:=as.POSIXct(raw_dat$'Date and Time (UTC)', tz="UTC")]
+     detections[, tag:=as.numeric(sapply(raw_dat$Transmitter, function(x) strsplit(x, "-")[[1]][3]))]
+     detections[, epo:=as.numeric(ts)]
+     detections[, frac:= (as.numeric(sapply(raw_dat$'Date and Time (UTC)', function(x) strsplit(x, "\\.")[[1]][2]))) / 1000]
+     detections[, serial:=as.numeric(sapply(raw_dat$Receiver, function(x) strsplit(x, "-")[[1]][2]))]
+   }
+   detections[]
+   return(detections)
+ }
> options(digits = 15, digits.secs = 3)
> prepDetections_original(vue, 'vemco_vue')
                            ts   tag            epo  frac serial
    1: 2019-09-09 16:04:11.193 59335 1568045051.193 0.193 128355
    2: 2019-09-09 16:04:12.573 59336 1568045052.574 0.574 128371
    3: 2019-09-09 16:04:43.953 59335 1568045083.953 0.953 128959
    4: 2019-09-09 16:05:14.888 59335 1568045114.888 0.888 128344
    5: 2019-09-09 16:05:26.450 59335 1568045126.451 0.451 128370
   ---                                                          
15369: 2019-09-10 12:59:12.707 59336 1568120352.708 0.708 128369
15370: 2019-09-10 12:59:12.789 59336 1568120352.790 0.790 128973
15371: 2019-09-10 12:59:32.420 59336 1568120372.420 0.420 128371
15372: 2019-09-10 13:02:55.806 59335 1568120575.807 0.807 135178
15373: 2019-09-10 13:02:56.724 59335 1568120576.725 0.725 128369
> microbenchmark::microbenchmark(prepDetections_original(vue, 'vemco_vue'))
Unit: milliseconds
                                      expr      min        lq       mean   median       uq      max neval
 prepDetections_original(vue, "vemco_vue") 287.4089 308.50835 319.939158 318.6149 329.1724 368.4272   100

Current version using tz = '', the default in data.table::fread < v1.14.0:

> vue <- data.table::fread(fn, fill=TRUE, tz = '')
> options(digits = 15, digits.secs = 3)
> prepDetections(vue, 'vemco_vue')
                        ts   tag        epo  frac serial
    1: 2019-09-09 16:04:11 59335 1568045051 0.193 128355
    2: 2019-09-09 16:04:12 59336 1568045052 0.574 128371
    3: 2019-09-09 16:04:43 59335 1568045083 0.953 128959
    4: 2019-09-09 16:05:14 59335 1568045114 0.888 128344
    5: 2019-09-09 16:05:26 59335 1568045126 0.451 128370
   ---                                                  
15369: 2019-09-10 12:59:12 59336 1568120352 0.708 128369
15370: 2019-09-10 12:59:12 59336 1568120352 0.790 128973
15371: 2019-09-10 12:59:32 59336 1568120372 0.420 128371
15372: 2019-09-10 13:02:55 59335 1568120575 0.807 135178
15373: 2019-09-10 13:02:56 59335 1568120576 0.725 128369
> microbenchmark::microbenchmark(prepDetections(vue, 'vemco_vue'))
Unit: milliseconds
                             expr     min      lq      mean   median       uq      max neval
 prepDetections(vue, "vemco_vue") 43.2651 43.8835 46.724863 44.50235 47.03805 165.2624   100

Current version when tz is given as UTC in data.table::fread (default in data.table version >=1.14.0):

> fn <- system.file("extdata", "VUE_Export_ssu1.csv", package="yaps")
> vue <- data.table::fread(fn, fill=TRUE, tz = 'UTC')
> options(digits = 15, digits.secs = 3)
> prepDetections(vue, 'vemco_vue')
                        ts   tag        epo  frac serial
    1: 2019-09-09 16:04:11 59335 1568045051 0.193 128355
    2: 2019-09-09 16:04:12 59336 1568045052 0.574 128371
    3: 2019-09-09 16:04:43 59335 1568045083 0.953 128959
    4: 2019-09-09 16:05:14 59335 1568045114 0.888 128344
    5: 2019-09-09 16:05:26 59335 1568045126 0.451 128370
   ---                                                  
15369: 2019-09-10 12:59:12 59336 1568120352 0.708 128369
15370: 2019-09-10 12:59:12 59336 1568120352 0.790 128973
15371: 2019-09-10 12:59:32 59336 1568120372 0.420 128371
15372: 2019-09-10 13:02:55 59335 1568120575 0.807 135178
15373: 2019-09-10 13:02:56 59335 1568120576 0.725 128369
> microbenchmark::microbenchmark(prepDetections(vue, 'vemco_vue'))
Unit: milliseconds
                             expr    min      lq      mean  median      uq     max neval
 prepDetections(vue, "vemco_vue") 28.567 29.0313 30.529318 29.6492 31.4714 38.9802   100

mhpob commented 3 years ago

I'm now noticing that there are some floating-point rounding issues (see, e.g., rows 2 and 5 in the original result, where epo disagrees with the printed ts by a millisecond). This was also the behavior in the original version of prepDetections; the new code just covers it up explicitly, since the input time stamp is no longer carried through (ts is rounded to the whole second).

In your experience, does a thousandth of a second matter... especially since the clocks can drift on a much larger scale than that?
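For illustration, a minimal reproduction of the mismatch (toy value; note that %OSn truncates rather than rounds the stored double):

options(digits = 15)
ts <- as.POSIXct("2019-09-09 16:04:12.574", tz = "UTC")
format(ts, "%Y-%m-%d %H:%M:%OS3")  # "2019-09-09 16:04:12.573" -- nearest double sits just below .574
as.numeric(ts)                     # prints as 1568045052.574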

baktoft commented 2 years ago

Sorry - got side-tracked, but am now doing a bit of end-of-year cleaning. A thousandth of a second can definitely matter in estimating positions (1/1000 s ≈ 1.5 m at the ~1500 m/s speed of sound in water), but the temporal resolution of the systems yielding data for use with this function is only 1/1000 s, so the rounding will not be an issue here.