Choi algorithm failing due to timestamps, but they are correct #12

Open muschellij2 opened 3 months ago

muschellij2 commented 3 months ago

Here is a reprex of an issue that occurs because of specific storage types and seems to be a "bug":

library(curl)
#> Using libcurl 8.4.0 with LibreSSL/3.3.6
library(dplyr)
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>     filter, lag
#> The following objects are masked from 'package:base':
#>     intersect, setdiff, setequal, union

Downloading and reading in the data

gt3x_file = tempfile(fileext = ".gt3x")
# https://github.com/muschellij2/Wrist-Worn-Accelerometry-Processing-Pipeline
url = "https://figshare.com/ndownloader/files/47702005"
curl::curl_download(url = url, destfile = gt3x_file)

df = read.gt3x::read.gt3x(path = gt3x_file, 
                          asDataFrame = TRUE, 
                          imputeZeroes = TRUE)
head(df)
#> Sampling Rate: 30Hz
#> Firmware Version: 1.9.2
#> Serial Number Prefix: MOS
#>                  time     X     Y      Z
#> 1 2017-10-30 15:00:00 0.188 0.145 -0.984
#> 2 2017-10-30 15:00:00 0.180 0.125 -0.988
#> 3 2017-10-30 15:00:00 0.184 0.121 -0.984
#> 4 2017-10-30 15:00:00 0.184 0.121 -0.992
#> 5 2017-10-30 15:00:00 0.184 0.117 -0.988
#> 6 2017-10-30 15:00:00 0.184 0.125 -0.988

Here we use last observation carried forward (LOCF) to fill in the zeroes imputed for idle sleep mode:

# impute idle sleep mode
sample_rate = attr(df, "sample_rate")
acceleration_max = as.numeric(attr(df, "acceleration_max"))
df = dplyr::as_tibble(df)
df = df %>% 
  # find where all zeroes/imputed zeroes
  mutate(all_zero = X == 0 & Y == 0 & Z == 0) %>% 
  # replace all 0 with NA so it can be filled  
  mutate(
    X = ifelse(all_zero, NA_real_, X),
    Y = ifelse(all_zero, NA_real_, Y),
    Z = ifelse(all_zero, NA_real_, Z)
  )
any(df$all_zero)
#> [1] TRUE

Filling in the data

df = df %>% 
  select(-all_zero) %>% 
  tidyr::fill(X, Y, Z, .direction = "down")
head(df)
#> # A tibble: 6 × 4
#>   time                    X     Y      Z
#>   <dttm>              <dbl> <dbl>  <dbl>
#> 1 2017-10-30 15:00:00 0.188 0.145 -0.984
#> 2 2017-10-30 15:00:00 0.18  0.125 -0.988
#> 3 2017-10-30 15:00:00 0.184 0.121 -0.984
#> 4 2017-10-30 15:00:00 0.184 0.121 -0.992
#> 5 2017-10-30 15:00:00 0.184 0.117 -0.988
#> 6 2017-10-30 15:00:00 0.184 0.125 -0.988
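To make the last-observation-carried-forward step easier to follow, here is a minimal sketch on a toy tibble. The `toy` data and the `toy_filled` name are made up for illustration; only the `mutate()` / `tidyr::fill()` pattern mirrors the reprex above.

```r
library(dplyr)

# Hypothetical toy data: rows 2 and 3 stand in for imputed idle-sleep zeroes
toy <- tibble::tibble(
  X = c(0.18, 0, 0, 0.20),
  Y = c(0.12, 0, 0, 0.10),
  Z = c(-0.98, 0, 0, -0.99)
)

toy_filled <- toy %>%
  # flag rows where all three axes are exactly zero
  mutate(all_zero = X == 0 & Y == 0 & Z == 0) %>%
  # turn those rows into NA so that fill() can replace them
  mutate(
    X = ifelse(all_zero, NA_real_, X),
    Y = ifelse(all_zero, NA_real_, Y),
    Z = ifelse(all_zero, NA_real_, Z)
  ) %>%
  select(-all_zero) %>%
  # carry the last non-missing value downward
  tidyr::fill(X, Y, Z, .direction = "down")

toy_filled
```

Rows 2 and 3 should end up with the values from row 1, while row 4 keeps its own values.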

Getting activity counts

ac60 = df %>% 
  agcounts::calculate_counts(epoch = 60L)


# needed for `actigraph.sleepr`
ac60 = ac60 %>%
  rename(timestamp = time)
choi_nonwear = actigraph.sleepr::apply_choi(ac60)
#> Error: Missing timestamps. Epochs should be evenly spaced from first(timestamp) to last(timestamp).

Digging a bit, we see that has_missing_epochs comes up TRUE, but this seems to be a bug:

# error happens at actigraph.sleepr:::check_no_missing_timestamps
actigraph.sleepr:::has_missing_epochs(ac60)
#> [1] TRUE

Taking the code from has_missing_epochs_ and running it shows that it fails on identical:

# fails at actigraph.sleepr:::has_missing_epochs_
epoch_len <- get_epoch_length(ac60)
epochs <- seq(first(ac60$timestamp), last(ac60$timestamp), 
              by = epoch_len)
identical(epochs, ac60$timestamp)
#> [1] FALSE

But if we use all.equal we see that this returns TRUE

all.equal(epochs, ac60$timestamp)
#> [1] TRUE

And if we test element-wise equality with ==, we also get TRUE:

all(epochs == ac60$timestamp)
#> [1] TRUE
attributes(epochs)
#> $class
#> [1] "POSIXct" "POSIXt" 
#> $tzone
#> [1] "UTC"
attributes(ac60$timestamp)
#> $class
#> [1] "POSIXct" "POSIXt" 
#> $tzone
#> [1] "UTC"

The attributes do not differ. Converting both to numeric and comparing:

identical(as.numeric(epochs), as.numeric(ac60$timestamp))

We see the issue is with their storage type, which seems relatively minor and a byproduct: one vector is stored as integer and the other as double.

#> [1] "integer"
#> [1] "double"
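The same behaviour can be reproduced in base R without any of the packages above. This is a minimal sketch with made-up vectors (`ts_int` and `ts_dbl` are hypothetical names, not objects from the reprex): two POSIXct vectors hold the same instants, one stored as integer and one as double.

```r
# 2017-10-30 15:00:00 UTC as integer seconds since the epoch
start <- 1509375600L

# Same five 60-second epochs, stored as integer ...
ts_int <- structure(start + 60L * 0:4,
                    class = c("POSIXct", "POSIXt"), tzone = "UTC")
# ... and as double
ts_dbl <- structure(as.double(start) + 60 * 0:4,
                    class = c("POSIXct", "POSIXt"), tzone = "UTC")

typeof(ts_int)
typeof(ts_dbl)

# identical() also compares the storage type, so it reports a difference
identical(ts_int, ts_dbl)

# all.equal() and == compare the values only
all.equal(ts_int, ts_dbl)
all(ts_int == ts_dbl)
```

Since `identical()` is strict about type, a check built on it can flag "missing" epochs even when every timestamp matches.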

This discrepancy potentially comes from epoch_len, but definitely comes from dplyr::first and dplyr::last:

#> [1] "integer"
#> [1] "double"

muschellij2 commented 3 months ago

I've tracked the issue down to vctrs::vec_slice: https://github.com/r-lib/vctrs/issues/1781. Workaround: make sure typeof(timestamp_column) is "double", then rerun.
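
One possible way to apply that workaround, sketched on a made-up vector rather than on the real `ac60` data: `storage.mode<-` changes the storage type while keeping the class and `tzone` attributes. The names `ts_int` and `ts_fixed` are hypothetical, and the `seq()` call only imitates the check in has_missing_epochs_ with an assumed 60-second epoch.

```r
# Hypothetical integer-backed POSIXct vector with 60-second epochs
ts_int <- structure(1509375600L + 60L * 0:4,
                    class = c("POSIXct", "POSIXt"), tzone = "UTC")

# Convert the storage type to double without dropping attributes
ts_fixed <- ts_int
storage.mode(ts_fixed) <- "double"

typeof(ts_fixed)
attributes(ts_fixed)

# Imitation of the even-spacing check, assuming a 60-second epoch length
epochs <- seq(ts_fixed[1], ts_fixed[length(ts_fixed)], by = 60)
identical(epochs, ts_fixed)
```

On the data from the reprex, the equivalent step would presumably be `storage.mode(ac60$timestamp) <- "double"` before calling `apply_choi()`; that has not been verified here.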