ctmm-initiative / ctmmweb

Web app for analyzing animal tracking data, built upon ctmm R package
http://biology.umd.edu/movement.html
GNU General Public License v3.0
30 stars 21 forks source link

Support for telemetry errors #49

Closed xhdong-umd closed 5 years ago

xhdong-umd commented 6 years ago

See discussion in #5 #28

chfleming commented 6 years ago

This stuff is not immediately pressing, because I am in the middle of working on it, but off the top of my head:

  1. Designating some imported data as calibration data and feeding that through uere().
  2. Applying a UERE estimate to telemetry data via uere()<-, thereby "calibrating" it.
  3. Fixing the default UERE in the outlier page for uncalibrated data.
  4. An easier way to turn on telemetry errors in the models than with the variogram slider.
  5. Having the option to refit the models and make sure they don't change (I need to expose some functionality for this). optim doesn't always find the best solution the first time with the error model.
xhdong-umd commented 6 years ago

Sure, I'm just putting notes that I need to change my default usage of UERE sometime. I'll wait until you are finished then we can discuss some unified strategy for errors.

chfleming commented 6 years ago

I also have full support now for plotting error circles & ellipses. Here is an example, using some data from the move package:

DATA <- as.telemetry(system.file("extdata","leroy.csv.gz",package="move"))

default plot with error rings (95%)

plot(DATA)

alternative plot with error discs

plot(DATA,error=2)

xhdong-umd commented 6 years ago

For the user input error value, I think

chfleming commented 6 years ago

The only pre-calibrated data that I've seen is ARGOS & EOBS. Users that have collected callibration data can go through an analysis in ctmm to apply the UERE to their data. It looks like this: uere(movement_data) <- uere(calibration_data)

Most functions in ctmm will only apply errors if as.logical(error_argument), including the outlie function. So if you have calibrated error, then outlie(data,UERE=TRUE) and outlie(data,UERE=10) will function the same.

xhdong-umd commented 6 years ago

assign_speeds now return a list. I assume I should take v.t as the speed value I needed since this is how outlie used, right?

I guess so, the v.dt value have one row less than the original data since it's the in-between value.

chfleming commented 6 years ago

Yes, this is how its been working since I can remember though.

v.dt are the in-between estimates and then an algorithm tailored for outlier detection is used to assign them to the individual times in v.t.

xhdong-umd commented 6 years ago

I finally got assign_speeds working now. It's a little bit slower, about 0.6 second for buffalo data (I didn't measure previous version though).

Should I also setting method = "max" or just leave it as default?

chfleming commented 6 years ago

The only difference in speed should be that the old default UERE=0 is faster than the now default UERE>0. The underlying code is pretty much the same, otherwise. Just some stuff moved around so that I could pass an error array from the top level (to prevent calculating it twice) and so that I could call the distance part of the calculation separately. The code for UERE>0 is fairly well optimized, I think. Its a vectorized solution to a transcendental equation with Bessel functions.

The default method should be selecting "max" with default match.arg as its first.

xhdong-umd commented 6 years ago

I closed this issue by mistake thought this was about the outlier calculation with error.

chfleming commented 6 years ago

@jmcalabrese @NoonanM @xhdong-umd The vignette on errors is up: https://ctmm-initiative.github.io/ctmm/articles/error.html

As far as the fitting suggestions at the end of the vignette (beyond specifying the error argument correctly), this is temporary material to overcome optimizer mistakes. I'm going to put in an extra layer in the optimization code that partitions out and fits separately/iteratively the circular parameters like ellipse orientation, because they cause massive multimodality. I'm also going to put in a layer that at automates refitting checks.

For the time being, for the webapp, I would just add a "re-fit" button that takes the previous fits and uses them as initial guesses to fit again. If you want to add feedback to that (to note if there is any change), you can calculate how much FIT$loglike increases upon refitting.

xhdong-umd commented 6 years ago

We also need to consider the task of merging multiple datasets with same individual to avoid naming conflict.

xhdong-umd commented 6 years ago

@chfleming @jmcalabrese Here is a summary of my understandings based on the error vignette:

For now I think we can add a tab(or pop up window) in data import page for error data setup, which should

chfleming commented 6 years ago
chfleming commented 6 years ago
xhdong-umd commented 6 years ago

This is my plan

For model fitting, the workflow is a little complicated

chfleming commented 6 years ago
xhdong-umd commented 6 years ago

I was working on other stuffs before checking uere()==TRUE, then I realized the mixed device problem could be quite complicated.

uere check

I want to check if data is pre-calibrated with this. uere() returned a vector with pre-calibrated data, a numeric for data with only HDOP

> uere(fisher)
horizontal      speed 
      TRUE       TRUE 
> uere(turtle)
horizontal 
  88.11638 

all(uere()) will also report TRUE on turtle data with warning, because numeric was coerced to logical. isTRUE doesn't work on vector. I can just check the first item in vector with isTRUE, but not sure if this will be correct. Maybe checking type instead of TRUE is simpler.

> is.numeric(uere(fisher))
[1] FALSE
> is.numeric(uere(turtle))
[1] TRUE

mixed device in same file

It's possible to have multiple devices in same file, though can we assume each telemetry object(each animal in same file) only have one device?

Do we have some columns for device name/id?

Is it meaningful to have uere value on a list of telemetry objects?

> uere(turtle)
horizontal 
  88.11638 
> uere(turtle[1])
horizontal 
  10.45555 
> uere(turtle[2])
horizontal 
  10.75146 
> uere(turtle[3])
horizontal 
  109.5303 

If there are both pre-calibrated device and un-calibrated device in same file, it's difficult to get a single uere value for file(or telemetry list). Should we just stick to individual uere value for each telemetry object?

I think we need to consider error per individual telemetry object from the beginning to end.

It will be quite complicated for different error of different devices in one file, or mixed devices with or without error in one file.

chfleming commented 6 years ago

For the UERE arrays, at least for the time being, you are only concerned with UERE$horizontal, which corresponds to x,y errors. Separate UERE components are used for speed and the z-axis, which aren't needed for anything in ctmmweb currently.

chfleming commented 6 years ago

For mixed devices, you just need to be able to select which devices are calibration devices (of a certain type) and which devices to apply their UERE estimate to. After calibration, everything can be thrown together.

xhdong-umd commented 6 years ago

Do we have some columns for device name/id? Or we have to rely on users to know which animal is using which device? I'm thinking if there is some information available we can show them in the data summary table so it's easier to check/apply.

I'm thinking to put "load calibration data" button inside the error plot tab. Then user can apply calibration data to selected individuals. We can add some columns of the device error information in the data summary table. What kind of information do we need for these columns? Like calibrated, uere value etc?

chfleming commented 6 years ago

Potentially, but I don't think we can bank on that. Tthe device model is more important than the specific device and I don't recall seeing device model columns in datasets, but unique devices (and rarely at that)... which is not particularly useful.

xhdong-umd commented 6 years ago

@chfleming How can I get the turtle data in .csv (or other calibration data) so I can test it with app? I knew it's in ctmm package, but I need some csv so I can test the calibration process.

xhdong-umd commented 6 years ago

I found turtle data don't have timestamp column, and my code was relying on that column to get the proper date/time of data. Should the app handle this kind of data?

chfleming commented 6 years ago

The turtle data is anonymized. You don't need to code for data without timestamp, longitude or latitude columns.

I can send you some data privately.

xhdong-umd commented 6 years ago

Can I also have the regular turtle data so that I can calibrate it with the calibration data? I think the file I received is just the two calibration tracks.

xhdong-umd commented 6 years ago

Interestingly I'm getting different results on every run of this code (the csv file is the turtle calibration data):

> library(ctmm)
> uere(as.telemetry("/Users/xhdong/Projects/ctmm-Shiny/data/misc tele/turtle_cali.csv"))
Minimum sampling interval of 9.98 minutes in 60s
Minimum sampling interval of 8.87 minutes in 90s
horizontal 
  24.96594 
Warning message:
In as.telemetry.data.frame(data, timeformat = timeformat, timezone = timezone,  :
  HDOP values found, but UERE not specified and will have to be fit. See help("uere").
> uere(as.telemetry("/Users/xhdong/Projects/ctmm-Shiny/data/misc tele/turtle_cali.csv"))
Minimum sampling interval of 9.98 minutes in 60s
Minimum sampling interval of 8.87 minutes in 90s
horizontal 
  28.95763 
Warning message:
In as.telemetry.data.frame(data, timeformat = timeformat, timezone = timezone,  :
  HDOP values found, but UERE not specified and will have to be fit. See help("uere").
> uere(as.telemetry("/Users/xhdong/Projects/ctmm-Shiny/data/misc tele/turtle_cali.csv"))
Minimum sampling interval of 9.98 minutes in 60s
Minimum sampling interval of 8.87 minutes in 90s
horizontal 
  19.12804 
Warning message:
In as.telemetry.data.frame(data, timeformat = timeformat, timezone = timezone,  :
  HDOP values found, but UERE not specified and will have to be fit. See help("uere").
xhdong-umd commented 6 years ago

I'm using a column in data summary table to show the status of device error. Right now I'm checking isTRUE(ctmm::uere(tele_obj)["horizontal"]), which will return TRUE for pre-calibrated data. I was thinking this will also return TRUE after a regular un-calibrated data get calibrated, but that was not true.

I'm hoping user can see changes in the data summary table after calibration, so that it's obvious some devices are calibrated. Right now uere will return a number on un-calibrated data, and it will also return a number after data was calibrated, just with a much smaller value. Is it practical to have something to show the data status to be "calibrated"?

Another issue is that using uere<- on a list of telemetry object, which will return NULL after the update, even each item is updated.


> uere_cali <- uere(as.telemetry("/Users/xhdong/Projects/ctmm-Shiny/data/misc tele/turtle_cali.csv"))
Minimum sampling interval of 9.98 minutes in 60s
Minimum sampling interval of 8.87 minutes in 90s
Warning message:
In as.telemetry.data.frame(data, timeformat = timeformat, timezone = timezone,  :
  HDOP values found, but UERE not specified and will have to be fit. See help("uere").
> turtle <- as.telemetry("/Users/xhdong/Projects/ctmm-Shiny/data/misc tele/turtles.csv")
Minimum sampling interval of 5 hours in F13
Minimum sampling interval of 2.18 hours in F231
Minimum sampling interval of 5 hours in F235
Minimum sampling interval of 56.7 minutes in F403
Minimum sampling interval of 5 hours in F47
Minimum sampling interval of 5 hours in M270
Minimum sampling interval of 5 hours in M271
Minimum sampling interval of 4 hours in M424
Minimum sampling interval of 4 hours in M60
Minimum sampling interval of 5 hours in M85
Minimum sampling interval of 55.3 minutes in UM236
Warning message:
In as.telemetry.data.frame(data, timeformat = timeformat, timezone = timezone,  :
  HDOP values found, but UERE not specified and will have to be fit. See help("uere").
> uere_cali
horizontal 
  27.01341 
> uere(turtle)
horizontal 
  2793.333 
> uere(turtle[[1]])
horizontal 
  1306.224 
> uere(turtle) <- uere_cali 
# after updating a list uere, running uere on list get NULL
> uere(turtle)
NULL
# each item in list have uere updated, just uere on list is not returning right result
> uere(turtle[[1]])
horizontal 
  27.01341 
chfleming commented 6 years ago

Sorry, I was still fiddling with the uere() code. These 2 bugs seemed to both have been fixed now.

I can add an is.calibrated method.

xhdong-umd commented 6 years ago

I'm thinking to have something to show data was calibrated. Can we add a property like 'calibrated', and it should be True in 2 cases:

The difference between a property and a 'isCalibrated' function is that I'm not sure the function can handle the 2nd case.

xhdong-umd commented 6 years ago

I saw the first bug is fixed, but I'm getting this

> library(ctmm)
> uere_cali <- uere(as.telemetry("/Users/xhdong/Projects/ctmm-Shiny/data/misc tele/turtle_cali.csv"))
Minimum sampling interval of 9.98 minutes in 60s
Minimum sampling interval of 8.87 minutes in 90s
Warning message:
In as.telemetry.data.frame(data, timeformat = timeformat, timezone = timezone,  :
  HDOP values found, but UERE not specified and will have to be fit. See help("uere").
> uere_cali
horizontal 
  10.62875 
> turtle <- as.telemetry("/Users/xhdong/Projects/ctmm-Shiny/data/misc tele/turtles.csv")
Minimum sampling interval of 5 hours in F13
Minimum sampling interval of 2.18 hours in F231
Minimum sampling interval of 5 hours in F235
Minimum sampling interval of 56.7 minutes in F403
Minimum sampling interval of 5 hours in F47
Minimum sampling interval of 5 hours in M270
Minimum sampling interval of 5 hours in M271
Minimum sampling interval of 4 hours in M424
Minimum sampling interval of 4 hours in M60
Minimum sampling interval of 5 hours in M85
Minimum sampling interval of 55.3 minutes in UM236
Warning message:
In as.telemetry.data.frame(data, timeformat = timeformat, timezone = timezone,  :
  HDOP values found, but UERE not specified and will have to be fit. See help("uere").
> uere(turtle) <- uere_cali 
> uere(turtle)
Error in apply(UERE, 1, unique) : dim(X) must have a positive length
> uere(turtle[[1]])
Error in apply(UERE, 1, unique) : dim(X) must have a positive length
chfleming commented 6 years ago

2 more bugfixes uploaded with more features to come soon.

xhdong-umd commented 6 years ago

Currently implemented features:

load pre-calibrated data

The error tab will plot the data with error disc (and other options). There is a uere column in data summary table to show uere value. It's TRUE for pre-calibrated data and was converted to 1 for now. This probably is not the best solution but I'll wait for more comments before polishing it.

screen shot 2018-06-12 at 4 07 05 pm

process calibration data

Load data as regular data, and user can take subset, filter outlier etc.

Note the uere column in data summary table.

screen shot 2018-06-12 at 4 09 51 pm

After the processing, user can select tracks in data and export as csv with the Export Current button. This way user can generate proper calibration data from regular data set.

calibrate regular data with a calibration data set

First load the regular data (note the uere column value)

screen shot 2018-06-12 at 4 12 28 pm

Then load the calibration data with this button.

screen shot 2018-06-12 at 4 13 02 pm

Once data is loaded, the UERE field will update. Pressing Apply button will apply the UERE value to current selected regular data set.

screen shot 2018-06-12 at 4 14 25 pm

And the data summary table uere value will update to show the changes.

screen shot 2018-06-12 at 4 15 29 pm

Alternatively, user can just input a value in the UERE text field and apply it. This can be used for manufacturer provided UERE value or an estimate value.

xhdong-umd commented 6 years ago

Previously we have a UERE input box in outlier page

screen shot 2018-06-12 at 4 18 27 pm

This should be removed now since we have calibration and manual input in the error tab of visualization page. The outlier page should just use the uere provided in earlier step.

I noticed that ctmm::outlie is using UERE parameter as a parameter. Given the data is pre-calibrated or calibrated with a calibration data set, what should I provide for this UERE parameter in outlier detection?

The example here didn't set UERE parameter, so it's using the default 10 even the turtle data is calibrated. I assume outlie is getting the error information from other columns, not relying on UERE parameter?

uere(turtle) <- UERE
plot(turtle[[3]],error=2) # turtle plot with 95% error discs

## ------------------------------------------------------------------------
outlie(turtle[[3]]) -> OUT

The problem here is that I'm not using outlie directly, because my distance calculation is segmented by groups, so I'm using the lower level functions, i.e get.error and distanceMLE for distance, assign_speeds for speed.

If I just use the following line with a UERE = 10 in my code, I think this will get the proper error info from pre-calibrated data and calibrated data(i.e. the UERE=10 will not override the error info present in data with a fixed value), right?

  error <- get.error(data,ctmm(error=UERE,axes=c("x","y")),circle=TRUE)

  Vs <- assign_speeds(data,UERE=error)
chfleming commented 6 years ago

Yes, if the data is precalibrated the error slot is interpreted as a logical value.

xhdong-umd commented 6 years ago

For the data summary table, we can add a column calibrated to use beside uere column. I'm not sure if single column of calibrated will be enough.

In summary, I'm hoping some values that to be device specific so user can see if some tracks are using same device, and it should change after calibration.

xhdong-umd commented 6 years ago

@chfleming For outlier detection, providing a uere=10 or uere=TRUE will be OK for pre-calibrated data. I think you said it's also OK to use uere=10 for un-calibrated data?

xhdong-umd commented 6 years ago

I have removed the UERE input box in outlier page, and make outlier code to use the proper error information. Next step will be the modeling page.

chfleming commented 6 years ago

uere=10 is a reasonable default for uncalibrated GPS data, for plots and such.

xhdong-umd commented 6 years ago

But there are also data from VHF etc, right? Should we don't turn on error for uncalibrated data by default? User can always input an estimated value in error tab.

If so, what uere value should I provide to the get.error function?

xhdong-umd commented 6 years ago

@chfleming Mike mentioned that using control=list(method='pNewton',cores=1) sometimes can fail and we need to fall back to default method.

How do we detect the fail? What will be the error and return value look like? I need to be able to identify the failure then fall back. Do you have some code example, or have some condition to identify the failure?

NoonanM commented 6 years ago

This is the tryCatch I usually use.

mods <- tryCatch(
  {

    #First try fitting the models with the pNewton optimizer
    mods <- ctmm.select(data,
                             CTMM = GUESS,
                             control=list(method="pNewton", cores = 1))

  }, error=function(err) {
  #If it fails, revert to nelder-mead
    message(paste("ctmm.select() failed with pNewton, switching to Nelder-Mead"))

   mods <- ctmm.select(cilla,
                                        CTMM = GUESS)

    return(mods)

  }
)
xhdong-umd commented 6 years ago

For vaiogram plot, it seemed that we can only plot one model/guess now. Do you think it'll be useful to overlay mutiple models? That way we can show the difference between original model parameters and the changed model more clearly.

xhdong-umd commented 6 years ago

Interestingly, the tryCatch code worked in script but failed in app. I changed to try and that worked. I think when error happened in tryCatch, there could be multiple error handlers, maybe other error handler processed the error first and failed the code. The parallel code also make the error handling more difficult.

xhdong-umd commented 6 years ago

I guess control=list(method="pNewton", cores = -1) will use all the cores except 1 to fit the model? I added code to use most cores if there is only one animal to fit, though I didn't see obvious time difference between two methods.

running parallel with mclapply in cluster of 1 
* Fitting models OUF anisotropic
Maximizing likelihood.
Calculating Hessian.
   user  system elapsed 
 38.169   0.206  38.412 

trying models on single animal with multiple cores
* Fitting models OUF anisotropic
Maximizing likelihood.
Calculating Hessian.
   user  system elapsed 
 96.948  37.218  33.056 
xhdong-umd commented 6 years ago

@chfleming @NoonanM For the workflow of model fitting

  1. when the 3. model tab is activated, the app will try models on current subset with ctmm.select.

    If data is calibrated, should we use CTMM = ctmm(error = FALSE) by default? I think the help document suggested to fit without error first to provide a 1st pass result as new initial values.

  2. after the 1st pass, user can use drop down list to select an animal, use sliders to adjust. The error slider will become [0, 1] with step of 1 (so it can only be either 0 or 1). After the adjustment, the slider values will be used to construct a CTMM object used for 2nd pass of ctmm.select.

    User can either click a button to fit it directly, or save the changes (and the changes should be shown in model summary table), work on other animals, and in the end click a button to fit all changed animals.

chfleming commented 6 years ago

Sorry, I am just catching up:

chfleming commented 6 years ago

ctmm() by default is ctmm(error=FALSE) and I think its the appropriate default given the extra work necessary for fitting error and the fact that it isn't necessary in many cases.

I would give the options to both fit first without error (for cases where the errors are small and the time differences are not too small) and alternatively to first fit with isotropic=TRUE.

xhdong-umd commented 6 years ago

So we will fit without error by default even with data is calibrated. And add a checkbox to fit with isotropic=TRUE.

I'll look at the options to draw multiple fits in variogram plot. I think we probably need some checkbox to turn them on and off in case the curves are too similar and hard to recognize. There are multiple approaches to implement this, and the internal workflow can become quite complicated.

xhdong-umd commented 6 years ago

If I have multiple ctmm objects (the guesstimate, the model result, the adjusted values from sliders), I plan to show them in a table so that the difference is obvious. Is there any method to get a summary from ctmm objects?

For example, we have some slider values initialized from the guesstimate ctmm object. Or maybe we just take them from result of variogram.fit.backend?

I saw there are many values in the object, but that's different from the slider values based on the same object.

I believe this kind of table is needed so that the manual adjustment can be recorded accurately in the work report, otherwise the adjusted init values cannot be reproduced from the work report.

UPDATE It might be easier that I just print the before/after slider values in console, which will also be included in work report. This can be used for reproducing the fit from app.