16EAGLE / moveVis

An R package providing tools to visualize movement data (e.g. from GPS tracking) and temporal changes of environmental data (e.g. from remote sensing) by creating video animations.
http://www.movevis.org
GNU General Public License v3.0

add_timestamps error - timestamps length different from frames length #47

Open t-stratmann opened 5 years ago

t-stratmann commented 5 years ago

Hi Jakob,

Great package and improvements! I have a quick question.

I am creating an animation that looks at multiple individuals that have different start and stop times.

When I run the add_timestamps() function I get the error "Error: Unique timestamps of 'm' must be of same length as 'frames'. Do only use the same move or moveStack object that you have used to create 'frames'."

Yet I have given the add_timestamps() function my move object created by align_move() (called 's.align') which I use to create frames.

The problem is:

length(frames)
[1] 10702
length(unique(s.align@timestamps))
[1] 10703

I have tried this with a different species, also with multiple individuals, and get the same error. I am not sure how to troubleshoot this because I cannot look into what data is in the frames and thus which timestamp is not being plotted. Any ideas on what might be wrong here?

Thanks!

Theresa

16EAGLE commented 5 years ago

Hi Theresa, thanks for reporting this problem. Could you provide me with a reproducible example (e.g. just a small subset of your data with which this occurs)? That would make it much easier for me to track down why a timestamp seems to be left out.

t-stratmann commented 5 years ago

Hi Jakob, I tried reducing the data set to reproduce the problem, but this does not work, so I attached the entire data set. jackal_data.txt

j <- read.table("C:/.../jackal_data.txt", header = TRUE, sep="\t", stringsAsFactors=FALSE)

j$timestamp <- as.POSIXct(j$timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "Africa/Windhoek")

j.move <- df2move(df = j, proj = CRS("+init=epsg:4326"), x = "lon", y = "lat", time = "timestamp", track_id = "ID")

j.align <- align_move(j.move, res = 1, unit = "hours")

#Get unique colors -----------
n <- 15
color = grDevices::colors()[grep('gr(a|e)y', grDevices::colors(), invert = T)]
unique.colors <- sample(color, n)
pie(rep(1,n), col=unique.colors)

frames <- frames_spatial(j.align, path_colours = unique.colors, map_service = "osm", map_type = "watercolor", alpha = 0.5)

frames <- add_timestamps(frames, j.align, type = "label") # add timestamps

I am also more than happy to help troubleshooting if you have an idea where the problem might be.

Thanks!

Theresa

16EAGLE commented 5 years ago

Hi Theresa,

this one took me a while to figure out. In short: the cause is a strange R behavior, which I have not fully understood yet, related to the definition of time zones in POSIXct objects. It appears that your selected time zone "Africa/Windhoek" results in the time zone designation "CAT" for each timestamp, except for two timestamps, where the designation "WAT" is assigned. I could not figure out why this happens.

The effect: You have two duplicated timestamps, but they are not recognized as such since they differ in their time zone designation. As a result, these positions are assigned to the same frames (moveVis does not check the time zone; it assumes you provide a uniform time zone), so there are two fewer frames than unique timestamps.

The simplest solution for now: If you provide your timestamps as UTC (j$timestamp <- as.POSIXct(j$timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")) or any other time zone that does not cause differing designations, the problem disappears. I will update moveVis so that this difference in time zone designation no longer results in "merged" frames...
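The collision described above can be reproduced in base R without the jackal data. The following is only a minimal sketch: the two instants are picked by hand for illustration, and comparing formatted clock strings is an assumption about how such timestamps could end up merged, not a description of what moveVis does internally.

```r
# Two distinct instants, one hour apart, on either side of the 2009 clock change
t_utc <- as.POSIXct(c("2009-04-04 23:00:00", "2009-04-05 00:00:00"), tz = "UTC")

# The same instants, displayed in local Namibian time
t_local <- t_utc
attr(t_local, "tzone") <- "Africa/Windhoek"

# Clock readings without the time zone designation
clock_local <- format(t_local, "%Y-%m-%d %H:%M:%S")
clock_utc   <- format(t_utc,   "%Y-%m-%d %H:%M:%S")

length(unique(t_local))      # the instants themselves stay distinct
length(unique(clock_local))  # the local clock readings collide
length(unique(clock_utc))    # the UTC clock readings do not
```

In local time both instants read 01:00:00 on 2009-04-05 and differ only in their designation, which is why keeping the data in UTC avoids the ambiguity.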

I'll go through how I tracked down the issue, in case you want to check it yourself. First, your code loading the data, assigning the time zone and creating the frames:

library(move)
library(moveVis)
library(lubridate)

j <- read.table("/home/jas24ny/Downloads//jackal_data.txt", header = TRUE, sep="\t", stringsAsFactors=FALSE)
j$timestamp <- as.POSIXct(j$timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "Africa/Windhoek")

j.move <- df2move(df = j, proj = CRS("+init=epsg:4326"), x = "lon", y = "lat", time = "timestamp", track_id = "ID")
j.align <- align_move(j.move, res = 1, unit = "hours")

#Get unique colors -----------
n <- 15
color = grDevices::colors()[grep('gr(a|e)y', grDevices::colors(), invert = T)]
unique.colors <- sample(color, n)
pie(rep(1,n), col=unique.colors)

frames <- frames_spatial(j.align, path_colours = unique.colors, map_service = "osm", map_type = "watercolor", alpha = 0.5)
frames <- add_timestamps(frames, j.align, type = "label")
# Error: Unique timestamps of 'm' must be of same length as 'frames'. Do only use the same move or moveStack object that you have used to create 'frames'.

As you described, the number of frames is too short (strangely):

length(frames)
[1] 14589
length(unique(timestamps(j.align)))
[1] 14591

moveVis has an internal function that can convert your move object into a data.frame containing all positions of each track, their times and their assigned frames, which we can use for diagnostics:

m.df <- moveVis:::.m2df(j.align)
head(m.df) # this is what it looks like
                  x         y id                time            time_chr     name frame colour
jackal.1.1 15.77920 -19.05179  1 2009-02-07 02:00:00 2009-02-07 02:00:00 jackal.1     1    red
jackal.1.2 15.77918 -19.05181  1 2009-02-07 03:00:00 2009-02-07 03:00:00 jackal.1     2    red
jackal.1.3 15.78088 -19.04665  1 2009-02-07 04:00:00 2009-02-07 04:00:00 jackal.1     3    red
jackal.1.4 15.78772 -19.04502  1 2009-02-07 05:00:00 2009-02-07 05:00:00 jackal.1     4    red
jackal.1.5 15.78716 -19.04514  1 2009-02-07 06:00:00 2009-02-07 06:00:00 jackal.1     5    red
jackal.1.6 15.78101 -19.03583  1 2009-02-07 07:00:00 2009-02-07 07:00:00 jackal.1     6    red

m.df contains all positions, but we just want the rows of unique times:

frames.df <- m.df[!duplicated(m.df$time),]
# the number of rows in this data.frame is now equal to the number of timestamps in j.align
nrow(frames.df)
[1] 14591
length(unique(timestamps(j.align)))
[1] 14591
# ... but the maximum number of frames is not:
max(frames.df$frame)
[1] 14589

# let's check for duplicates in the frame number
d.ftime <- which(duplicated(frames.df$frame) == T)
print(d.ftime) # we have two cases in which one frame represents two times....
[1]  1369 10105
frames.df[d.ftime,] # namely here
                     x         y id                time            time_chr     name frame colour
jackal.1.1229 15.91653 -19.17140  1 2009-04-05 01:00:00 2009-04-05 01:00:00 jackal.1  1368    red
jackal.2.7464 15.85601 -19.08551  2 2010-04-04 01:00:00 2010-04-04 01:00:00 jackal.2 10103  green

# the frame of each previous row is identical:
identical(frames.df[d.ftime,]$frame, frames.df[d.ftime-1,]$frame)
[1] TRUE

# when checking the time of the duplicates, it becomes clear why these duplicates were "hidden":
rows.dupl <- lapply(d.ftime, function(y) frames.df[frames.df$frame == y,])
lapply(rows.dupl, function(x)  x$time)
[[1]]
[1] "2009-04-05 02:00:00 WAT"

[[2]]
[1] "2010-04-04 03:00:00 WAT"
# they are designated with WAT instead of CAT like the rest of the data
# For now, I could not find out why these two timestamps do not receive the same designation as the other timestamps

# you can find this in the input data:
sort(unique(timestamps(j.align)))[d.ftime] # both WAT
[1] "2009-04-05 01:00:00 WAT" "2010-04-04 01:00:00 WAT"
sort(unique(timestamps(j.align)))[d.ftime-1] # those before: CAT
[1] "2009-04-05 01:00:00 CAT" "2010-04-04 01:00:00 CAT"
tz(sort(unique(timestamps(j.align)))[d.ftime])
[1] "Africa/Windhoek"
tz(sort(unique(timestamps(j.align)))[d.ftime-1]) # but time zone is equal
[1] "Africa/Windhoek"

t-stratmann commented 5 years ago

Ahhhhh... thank you so much for digging into this! I wasn't quite sure how the frames and add_timestamps functions go through the data, but this code helps. Before I sent you the data, I specifically went through to remove duplicate timestamps... but if the time zone changes, then my code won't catch that. We have also noticed that depending on the computer you use you get the "WAT", "WAST", or "CAT" designation... but this is not your problem. Usually converting to a move object is a good check for duplicate timestamps, but this case is weird. Thank you so much and thanks for the great package!

t-stratmann commented 5 years ago

Maybe the align_move() function is doing something weird? ...

The move package has the getDuplicatedTimestamps() function. And I noticed:

When you look at the raw data:

j <- read.table("C:/Users/tstratmann/Documents/Teaching/Masters Modul 2019/Movement Ecology/animations/jackal_data.txt", header = TRUE, sep="\t", stringsAsFactors=FALSE)

j$timestamp <- as.POSIXct(j$timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "Africa/Windhoek")

getDuplicatedTimestamps(x=as.factor(j$ID), timestamps=j$timestamp, sensorType=rep("GPS", length(j$timestamp)))

It finds no duplicates.

So we use the moveVis package to convert to a move object:

j.moveVis <- df2move(df = j, proj = CRS("+init=epsg:4326"), x = "lon", y = "lat", time = "timestamp", track_id = "ID")
m.df.b <- moveVis:::.m2df(j.moveVis)
getDuplicatedTimestamps(x=as.factor(m.df.b$id), timestamps=m.df.b$time, sensorType=rep("GPS", length(m.df.b$time)))

Again, it finds no duplicates. But then when we do:

j.align <- align_move(j.moveVis, res = 1, unit = "hours")
m.df <- moveVis:::.m2df(j.align)
getDuplicatedTimestamps(x=as.factor(m.df$id), timestamps=m.df$time, sensorType=rep("GPS", length(m.df$time)))

Now we get duplicate time stamps... it is always when daylight saving time occurs. These are actually different locations because in UTC they were different times.

j[c((which(j$timestamp == as.POSIXct("2009-04-05 01:00:00", format = "%Y-%m-%d %H:%M:%S", tz = "Africa/Windhoek"))-2): (which(j$timestamp == as.POSIXct("2009-04-05 01:00:00", format = "%Y-%m-%d %H:%M:%S", tz = "Africa/Windhoek"))+2)),]
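A check for this kind of "apparent" duplicate can be sketched in base R. The helper below, find_clock_collisions(), is hypothetical (it is not part of move or moveVis) and the toy data are made up: it flags rows that share a track and a clock reading in a given time zone although the underlying instants differ.

```r
# Hypothetical helper: row indices where the same track shows the same clock
# reading in time zone `tz` for two different instants.
find_clock_collisions <- function(id, time, tz) {
  clock <- format(time, "%Y-%m-%d %H:%M:%S", tz = tz)
  key_clock   <- paste(id, clock)
  key_instant <- paste(id, as.numeric(time))
  same_clock   <- duplicated(key_clock)   | duplicated(key_clock, fromLast = TRUE)
  same_instant <- duplicated(key_instant) | duplicated(key_instant, fromLast = TRUE)
  which(same_clock & !same_instant)
}

# Toy data: one track with hourly fixes across the 2009 clock change
time <- seq(as.POSIXct("2009-04-04 22:00:00", tz = "UTC"), by = "1 hour", length.out = 4)
id <- rep("jackal.1", 4)

find_clock_collisions(id, time, tz = "Africa/Windhoek")  # rows 2 and 3 collide
find_clock_collisions(id, time, tz = "UTC")              # no collisions
```

Rows flagged this way are real, distinct positions, so they should not be removed as duplicates; they only look identical when the designation is ignored.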

16EAGLE commented 5 years ago

The good thing is that I never stop learning new things in R... I just checked this again and these are my thoughts so far about what I found out:

When you encode your timestamps with j$timestamp <- as.POSIXct(j$timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "Africa/Windhoek"), all timestamps in the winter time period are encoded as "CAT" (Central African Time, UTC+2) and those in the summer time period as "WAT" (Western African Time, UTC+1):

library(move)
library(moveVis)
library(lubridate)

j <- read.table("/home/UNI-WUERZBURG.EU/jas24nx/Documents/wd_work/dev/moveVis/user_data/jackal_data.txt", header = TRUE, sep="\t", stringsAsFactors=FALSE)
x <- sort(j$timestamp)[9675:9695] # period in question
x.POSIX <- as.POSIXct(x, tz = "Africa/Windhoek")
 [1] "2009-04-05 00:20:00 CAT" "2009-04-05 00:27:00 CAT" "2009-04-05 00:50:00 CAT" "2009-04-05 00:57:00 CAT" "2009-04-05 00:59:00 CAT"
 [6] "2009-04-05 01:00:00 CAT" "2009-04-05 01:03:00 CAT" "2009-04-05 01:19:00 CAT" "2009-04-05 01:20:00 CAT" "2009-04-05 01:27:00 CAT"
[11] "2009-04-05 01:50:00 CAT" "2009-04-05 01:57:00 CAT" "2009-04-05 01:59:00 CAT" "2009-04-05 02:00:00 WAT" "2009-04-05 02:03:00 WAT"
[16] "2009-04-05 02:19:00 WAT" "2009-04-05 02:20:00 WAT" "2009-04-05 02:27:00 WAT" "2009-04-05 02:50:00 WAT" "2009-04-05 02:57:00 WAT"
[21] "2009-04-05 02:59:00 WAT"

The converted timestamps imply that you have a one-hour gap between the last winter timestamp [13] and the first summer timestamp [14], since [14] would need to be "2009-04-05 01:00:00 WAT" due to the time change.

This means: If your timestamps were recorded continuously without taking the winter/summer time change into account (which the original timestamps suggest, but I don't know for sure), then the clock values of this conversion are correct but the designations are not, since they imply a one-hour gap. Alternatively, if the timestamps were recorded taking the time change into account and there actually was no gap, then the conversion is simply incorrect; but the data themselves contain no indication of summer/winter time.

When you create a POSIX sequence from the first element of x.POSIX to the last, you get this:

y.POSIX <- seq.POSIXt(min(x.POSIX), max(x.POSIX), by = 60*5) #by 5 minutes
 [1] "2009-04-05 00:20:00 CAT" "2009-04-05 00:25:00 CAT" "2009-04-05 00:30:00 CAT" "2009-04-05 00:35:00 CAT" "2009-04-05 00:40:00 CAT"
 [6] "2009-04-05 00:45:00 CAT" "2009-04-05 00:50:00 CAT" "2009-04-05 00:55:00 CAT" "2009-04-05 01:00:00 CAT" "2009-04-05 01:05:00 CAT"
[11] "2009-04-05 01:10:00 CAT" "2009-04-05 01:15:00 CAT" "2009-04-05 01:20:00 CAT" "2009-04-05 01:25:00 CAT" "2009-04-05 01:30:00 CAT"
[16] "2009-04-05 01:35:00 CAT" "2009-04-05 01:40:00 CAT" "2009-04-05 01:45:00 CAT" "2009-04-05 01:50:00 CAT" "2009-04-05 01:55:00 CAT"
[21] "2009-04-05 01:00:00 WAT" "2009-04-05 01:05:00 WAT" "2009-04-05 01:10:00 WAT" "2009-04-05 01:15:00 WAT" "2009-04-05 01:20:00 WAT"
[26] "2009-04-05 01:25:00 WAT" "2009-04-05 01:30:00 WAT" "2009-04-05 01:35:00 WAT" "2009-04-05 01:40:00 WAT" "2009-04-05 01:45:00 WAT"
[31] "2009-04-05 01:50:00 WAT" "2009-04-05 01:55:00 WAT" "2009-04-05 02:00:00 WAT" "2009-04-05 02:05:00 WAT" "2009-04-05 02:10:00 WAT"
[36] "2009-04-05 02:15:00 WAT" "2009-04-05 02:20:00 WAT" "2009-04-05 02:25:00 WAT" "2009-04-05 02:30:00 WAT" "2009-04-05 02:35:00 WAT"
[41] "2009-04-05 02:40:00 WAT" "2009-04-05 02:45:00 WAT" "2009-04-05 02:50:00 WAT" "2009-04-05 02:55:00 WAT"

After CAT changes to WAT at 02:00:00, time continues at 01:00:00 due to the change (which by the definition of CAT and WAT is the correct sequence).

align_move() is using seq.POSIXt to create the uniform time sequence, thus the "missing" hour in your converted timestamps is included in the newly created sequence. Encoding remains CAT for winter and WAT for summer time. move::getDuplicatedTimestamps seems not to check for the differing timezones used by R to represent summer and winter time here and returns the overlapping timestamps as duplicates (and moveVis also does not account for the case that timestamps contain two time zones, which caused the merged frames)...
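The repeated hour produced by a regular sequence can be reproduced on its own. This is a minimal sketch using base R's seq() on POSIXct values; the start and end instants are chosen by hand to match the period shown above, and nothing here calls align_move() itself.

```r
# Start and end given as UTC instants, then displayed in local time
start <- as.POSIXct("2009-04-04 22:20:00", tz = "UTC")  # 00:20 local time
end   <- as.POSIXct("2009-04-05 01:55:00", tz = "UTC")  # 02:55 local time
attr(start, "tzone") <- "Africa/Windhoek"
attr(end,   "tzone") <- "Africa/Windhoek"

# Regular 5-minute steps in elapsed time
y <- seq(start, end, by = 60 * 5)

# Local clock readings without the designation
clock <- format(y, "%H:%M", tz = "Africa/Windhoek")

length(y)                      # number of steps in the sequence
anyDuplicated(as.numeric(y))   # 0: no instant occurs twice
sum(duplicated(clock))         # clock readings that occur a second time
unique(diff(as.numeric(y)))    # step width in seconds is constant
```

The sequence is evenly spaced in elapsed time, yet the clock readings from 01:00 to 01:55 appear twice, once per designation. Any comparison that ignores the designation will therefore treat them as duplicates.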

t-stratmann commented 5 years ago

Thanks for continuing to look at this! I guess what confuses me is that getDuplicatedTimestamps() is on to something. Duplicates are not introduced until the align_move() function happens.

The above example confuses me, because you go from pre-midnight to post-midnight and that date does not change. I think in what you show above, and what I showed previously, data from a lot of different individuals is mixed up.

The data is recorded at hourly intervals. Of course for each individual the hour is not perfectly 01:00:00.

So here perhaps a better example:

I have added the raw data which is in UTC. jackal_data_raw.txt

jackals <- read.table("C:/Users/... /jackal_data_raw.txt", header = TRUE, sep="\t", stringsAsFactors=FALSE)

jackals$timestamp <- as.POSIXct(jackals$timestamp , format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

dst <- jackals[which(jackals$date == "2009-04-05" | jackals$date == "2009-04-06"),]
dst.ordered <- dst[order(dst$ID, dst$timestamp),]

dst.ordered[1:100,c(1,6)]
dst.ordered[1:100,c(1,6)]$timestamp

dst.africa.windhoek <- dst.ordered
timestamps <- as.POSIXct(dst.ordered$timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
head(timestamps)
attributes(timestamps)$tzone <- "Africa/Windhoek"
head(timestamps)
dst.africa.windhoek$timestamp <- timestamps

dst.africa.windhoek[1:100,c(1,6)]
dst.africa.windhoek[1:100,c(1,6)]$timestamp

Here we can see there are no gaps in the times, even after we convert to local time.

If we convert to a move object we also get no gaps and getDuplicatedTimestamps() therefore finds nothing.

# via move

library(move)

j <- read.table("C:/Users/tstratmann/Documents/Teaching/Masters Modul 2019/Movement Ecology/animations/jackal_data.txt", header = TRUE, sep="\t", stringsAsFactors=FALSE)

j$timestamp <- as.POSIXct(j$timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "Africa/Windhoek")

j.move <- move(x = j$lon, 
               y = j$lat,
               time = j$timestamp,
               proj = CRS("+init=epsg:4326"),
               animal = j$ID)

j.move.df <- as.data.frame(j.move)

getDuplicatedTimestamps(x=as.factor(j.move.df$trackId), 
                        timestamps=j.move.df$timestamps,
                        sensorType=j.move.df$sensor)

#None
j.move.df$timestamps[1:10]
j.move.df$time[1:10]

j.move <- j.move.df[which(as.Date(j.move.df$timestamp, format = "%Y-%m-%d") == as.Date("2009-04-05", format = "%Y-%m-%d") | 
                   as.Date(j.move.df$timestamp, format = "%Y-%m-%d") == as.Date("2009-04-06", format = "%Y-%m-%d")),]

j.move.ordered <- j.move[order(j.move$trackId, j.move$timestamps),]

j.move.ordered[1:100,c(8,9)]
j.move.ordered[1:100,c(8,9)]$timestamp

dim(j.move.ordered)

j.move.ordered[316:416,c(8,9)]
j.move.ordered[316:416,c(8,9)]$timestamp

Yet if we run align_move() then we get duplicate timestamps. I assume daylight savings time is hurting us, but also potentially the fact that not all individuals record at exactly the same time? I would care less about the timestamps if day/night activity wasn't of interest for us. Then we could just use UTC.
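One possible way to keep day/night information while working in UTC is to derive the local hour as a separate column and leave the timestamps themselves in UTC. The sketch below assumes this is acceptable for the analysis; the example instants and the 06:00/18:00 night threshold are made up for illustration.

```r
# Fixes stored in UTC (hypothetical example instants)
fix_utc <- as.POSIXct(c("2009-01-15 12:00:00", "2009-07-15 12:00:00",
                        "2009-04-04 23:00:00", "2009-04-05 00:00:00"), tz = "UTC")

# Local hour of day in Namibia, derived without changing the stored instants
local_hour <- as.integer(format(fix_utc, "%H", tz = "Africa/Windhoek"))

# Hypothetical day/night split by local clock hour
is_night <- local_hour < 6 | local_hour >= 18

data.frame(fix_utc, local_hour, is_night)
```

The UTC column stays unambiguous for building frames, while local_hour carries the local time of day, including the two fixes that share the 01:00 clock hour on 2009-04-05.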