Closed xhdong-umd closed 5 years ago
This stuff is not immediately pressing, because I am in the middle of working on it, but off the top of my head:

- `uere()`
- `uere()<-`, thereby "calibrating" it.
- `optim` doesn't always find the best solution the first time with the error model.

Sure, I'm just putting notes that I need to change my default usage of UERE sometime. I'll wait until you are finished, and then we can discuss a unified strategy for errors.
I also have full support now for plotting error circles & ellipses. Here is an example, using some data from the move package:
```r
DATA <- as.telemetry(system.file("extdata", "leroy.csv.gz", package = "move"))
# default plot with error rings (95%)
plot(DATA)
# alternative plot with error discs
plot(DATA, error = 2)
```
For the user-input error value, I think:

The only pre-calibrated data that I've seen is ARGOS & EOBS. Users that have collected calibration data can go through an analysis in ctmm to apply the UERE to their data. It looks like this: `uere(movement_data) <- uere(calibration_data)`

Most functions in ctmm will only apply errors if `as.logical(error_argument)`, including the `outlie` function. So if you have calibrated error, then `outlie(data,UERE=TRUE)` and `outlie(data,UERE=10)` will function the same.
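The calibration-and-apply workflow described above can be sketched end-to-end with the turtle data shipped in ctmm. This is only an illustration of the usage discussed in this thread (the list indices for the calibration devices are assumptions, and newer ctmm versions use `uere.fit()` for the estimation step):

```r
library(ctmm)
data(turtle)  # list of telemetry objects; early entries are calibration devices

# Estimate the UERE from the calibration tracks (assumed to be turtle[1:2])
UERE <- uere(turtle[1:2])

# Apply it to the animal tracks, thereby "calibrating" them
uere(turtle[3:4]) <- UERE

# With calibrated error, these two calls should behave the same
OUT1 <- outlie(turtle[[3]], UERE = TRUE)
OUT2 <- outlie(turtle[[3]], UERE = 10)
```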
`assign_speeds` now returns a list. I assume I should take `v.t` as the speed value I need, since this is how `outlie` uses it, right?

I guess so; the `v.dt` values have one row fewer than the original data since they are the in-between values.

Yes, this is how it's been working for as long as I can remember. `v.dt` are the in-between estimates, and then an algorithm tailored for outlier detection is used to assign them to the individual times in `v.t`.
I finally got `assign_speeds` working now. It's a little slower, about 0.6 seconds for the buffalo data (I didn't measure the previous version, though). Should I also set `method = "max"`, or just leave it as the default?
The only difference in speed should be that the old default `UERE=0` is faster than the now-default `UERE>0`. The underlying code is pretty much the same, otherwise. Just some stuff moved around so that I could pass an error array from the top level (to prevent calculating it twice) and so that I could call the distance part of the calculation separately. The code for `UERE>0` is fairly well optimized, I think. It's a vectorized solution to a transcendental equation with Bessel functions.

The default method should be selecting `"max"`, since `match.arg` defaults to the first choice.
I closed this issue by mistake; I thought this was about the outlier calculation with error.
@jmcalabrese @NoonanM @xhdong-umd The vignette on errors is up: https://ctmm-initiative.github.io/ctmm/articles/error.html
As far as the fitting suggestions at the end of the vignette (beyond specifying the error argument correctly), this is temporary material to overcome optimizer mistakes. I'm going to put in an extra layer in the optimization code that partitions out and fits separately/iteratively the circular parameters like ellipse orientation, because they cause massive multimodality. I'm also going to put in a layer that automates refitting checks.
For the time being, for the webapp, I would just add a "re-fit" button that takes the previous fits and uses them as initial guesses to fit again. If you want to add feedback to that (to note if there is any change), you can calculate how much `FIT$loglike` increases upon refitting.
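A minimal sketch of that re-fit button's logic, assuming `FIT` is a previous `ctmm.fit` result and `data` is the corresponding telemetry object (the reporting threshold is an arbitrary illustration, not part of ctmm):

```r
# Re-fit using the previous fit as the initial guess
REFIT <- ctmm.fit(data, CTMM = FIT)

# Feedback: how much did the log-likelihood improve?
dloglike <- REFIT$loglike - FIT$loglike
if (dloglike > 0.1) {  # arbitrary cutoff for reporting "changed"
  message(sprintf("Refit improved log-likelihood by %.3f", dloglike))
}
```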
We also need to consider the task of merging multiple datasets with the same individual, to avoid naming conflicts.
@chfleming @jmcalabrese Here is a summary of my understanding based on the error vignette:

- After data is imported as a telemetry object, if there is a `var.xy` column, error information is available and we don't need `uere` (I'm not totally sure about this). We only need to turn on error when needed.
- If there is only an `HDOP` column, we need `uere`. It can come from calibration data via `uere`, or be estimated with a movement model, which needs an initial value, like 10 for GPS.
- Outlier detection can use error information, which should work once error data is set up.
- Two-pass model fitting with error. There may be changes on the ctmm end.
For now I think we can add a tab (or pop-up window) in the data import page for error-data setup, which should show the `var.xy` column and `HDOP` column if available.

- Pre-calibrated data will have a `var.xy` column. Calibrated ARGOS data will have (elliptical) `cov.x.x`, `cov.x.y`, `cov.y.y` columns.
- With only an `HDOP` column, a constant HDOP=1 will be assumed by `uere()<-`. It's not as good, but better than nothing.
- `control=list(method='pNewton',cores=1)` in `ctmm.fit` and `ctmm.select`. Now the only problem I am seeing is the occasional convergence on a local maximum that is not the global maximum. This was obvious from the variogram, though. So in the webapp, I would do things slightly differently. I would just let the users readjust the sliders again after fitting (tweaking parameters of the fitted model and not the original guess) and refit the adjusted cases if they want. You would want to keep track of `FIT$loglike` to see if there was any change and note that.
- `as.telemetry()` now flags that precalibrated data have a `uere()` value of `TRUE`, so you won't need to look at internal data structures for that.
- `plot.telemetry` can plot error circles/ellipses, with both rings and discs, where the opacity scales with 1/circumference and 1/area respectively. Something like that needs to be supported in the webapp so that users can visualize their errors if they are substantial. Tell me if there's anything I can do to make what I have more portable/general/flexible, e.g., like we did with the variogram slider functionality.

This is my plan:

- Check if `uere()` is `TRUE` in ctmm. Otherwise we may need to assign numerical values later.
- Show the `uere` status for the current data in the app.
- Should the user assign a `uere` value even in cases 1 and 2?
- Add an `error` tab in the visualization page to include both the `uere` status and the plot with error.

For model fitting, the workflow is a little complicated:

- First fit without error (`error=FALSE`).
- Then `ctmm.guess` with the additional argument `CTMM=ctmm(error=10)`. This would cover all cases but VHF data. Additionally, there could be a numerical input if we want.
- Compare `AICc` to show if it decreased. Previously I said slot `loglike`, but I realized that a bad fit could select the wrong model.

I was working on other stuff before checking `uere()==TRUE`; then I realized the mixed-device problem could be quite complicated.
I want to check if data is pre-calibrated. `uere()` returns a logical vector for pre-calibrated data, and a numeric for data with only `HDOP`:

```
> uere(fisher)
horizontal      speed
      TRUE       TRUE
> uere(turtle)
horizontal
  88.11638
```

`all(uere())` will also report TRUE on the turtle data, with a warning, because the numeric is coerced to logical. `isTRUE` doesn't work on a vector. I could just check the first item in the vector with `isTRUE`, but I'm not sure that would be correct. Maybe checking the type instead of TRUE is simpler:

```
> is.numeric(uere(fisher))
[1] FALSE
> is.numeric(uere(turtle))
[1] TRUE
```
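The type-based check could be wrapped in a small helper. This is just base-R logic around the behavior shown above; the function name and the logical-vs-numeric convention are assumptions taken from this thread, not a ctmm API:

```r
# Classify a uere() result: pre-calibrated data return a logical vector,
# HDOP-only data return a numeric estimate (per the examples above).
uere_status <- function(u) {
  if (is.logical(u)) return("pre-calibrated")
  if (is.numeric(u)) return("uncalibrated")
  "unknown"
}

uere_status(c(horizontal = TRUE, speed = TRUE))  # fisher-like -> "pre-calibrated"
uere_status(c(horizontal = 88.11638))            # turtle-like -> "uncalibrated"
```

Note the order matters: `is.numeric(TRUE)` is `FALSE` in R, but logicals coerce to numeric in arithmetic, so checking `is.logical` first keeps the two cases distinct.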
It's possible to have multiple devices in the same file, though can we assume each telemetry object (each animal in the same file) only has one device?
Do we have columns for device name/id?
Is it meaningful to have a uere value on a list of telemetry objects?
```
> uere(turtle)
horizontal
  88.11638
> uere(turtle[1])
horizontal
  10.45555
> uere(turtle[2])
horizontal
  10.75146
> uere(turtle[3])
horizontal
  109.5303
```
If there are both pre-calibrated and un-calibrated devices in the same file, it's difficult to get a single uere value for the file (or telemetry list). Should we just stick to an individual uere value for each telemetry object?
I think we need to consider error per individual telemetry object from beginning to end. It will be quite complicated with different errors for different devices in one file, or mixed devices with and without error in one file.
For the UERE arrays, at least for the time being, you are only concerned with UERE$horizontal, which corresponds to x,y errors. Separate UERE components are used for speed and the z-axis, which aren't needed for anything in ctmmweb currently.
For mixed devices, you just need to be able to select which devices are calibration devices (of a certain type) and which devices to apply their UERE estimate to. After calibration, everything can be thrown together.
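That per-device selection could look like the following sketch. All object names here (`calib_A`, `tracks_A`, etc.) are hypothetical stand-ins for user-selected subsets of telemetry objects; the `uere()`/`uere()<-` usage follows this thread:

```r
# Estimate a UERE per device type from its calibration data
UERE_A <- uere(calib_A)
UERE_B <- uere(calib_B)

# Apply each estimate to the movement data collected with the matching device
uere(tracks_A) <- UERE_A
uere(tracks_B) <- UERE_B

# After calibration, everything can be thrown together for analysis
all_tracks <- c(tracks_A, tracks_B)
```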
Do we have columns for device name/id? Or do we have to rely on users to know which animal is using which device? I'm thinking that if there is some information available, we can show it in the data summary table so it's easier to check/apply.
I'm thinking of putting a "load calibration data" button inside the error plot tab. Then the user can apply calibration data to selected individuals. We can add some columns of device-error information to the data summary table. What kind of information do we need for these columns? Like `calibrated`, `uere value`, etc.?
Potentially, but I don't think we can bank on that. The device model is more important than the specific device, and I don't recall seeing device-model columns in datasets, but unique devices (and rarely at that)... which is not particularly useful.
@chfleming How can I get the turtle data in .csv (or other calibration data) so I can test it with the app? I know it's in the ctmm package, but I need a csv so I can test the calibration process.
I found the turtle data doesn't have a `timestamp` column, and my code was relying on that column to get the proper date/time of the data. Should the app handle this kind of data?
The turtle data is anonymized. You don't need to code for data without timestamp, longitude or latitude columns.
I can send you some data privately.
Can I also have the regular turtle data so that I can calibrate it with the calibration data? I think the file I received is just the two calibration tracks.
Interestingly, I'm getting different results on every run of this code (the csv file is the turtle calibration data):

```
> library(ctmm)
> uere(as.telemetry("/Users/xhdong/Projects/ctmm-Shiny/data/misc tele/turtle_cali.csv"))
Minimum sampling interval of 9.98 minutes in 60s
Minimum sampling interval of 8.87 minutes in 90s
horizontal
  24.96594
Warning message:
In as.telemetry.data.frame(data, timeformat = timeformat, timezone = timezone, :
  HDOP values found, but UERE not specified and will have to be fit. See help("uere").
> uere(as.telemetry("/Users/xhdong/Projects/ctmm-Shiny/data/misc tele/turtle_cali.csv"))
Minimum sampling interval of 9.98 minutes in 60s
Minimum sampling interval of 8.87 minutes in 90s
horizontal
  28.95763
Warning message:
In as.telemetry.data.frame(data, timeformat = timeformat, timezone = timezone, :
  HDOP values found, but UERE not specified and will have to be fit. See help("uere").
> uere(as.telemetry("/Users/xhdong/Projects/ctmm-Shiny/data/misc tele/turtle_cali.csv"))
Minimum sampling interval of 9.98 minutes in 60s
Minimum sampling interval of 8.87 minutes in 90s
horizontal
  19.12804
Warning message:
In as.telemetry.data.frame(data, timeformat = timeformat, timezone = timezone, :
  HDOP values found, but UERE not specified and will have to be fit. See help("uere").
```
I'm using a column in the data summary table to show the status of device error. Right now I'm checking `isTRUE(ctmm::uere(tele_obj)["horizontal"])`, which will return `TRUE` for pre-calibrated data. I was thinking this would also return `TRUE` after regular un-calibrated data gets calibrated, but that was not the case.

I'm hoping the user can see changes in the data summary table after calibration, so that it's obvious some devices are calibrated. Right now `uere` will return a number on un-calibrated data, and it will also return a number after the data is calibrated, just with a much smaller value. Is it practical to have something that shows the data status as "calibrated"?

Another issue is using `uere<-` on a list of telemetry objects: `uere` on the list returns NULL after the update, even though each item is updated.
```
> uere_cali <- uere(as.telemetry("/Users/xhdong/Projects/ctmm-Shiny/data/misc tele/turtle_cali.csv"))
Minimum sampling interval of 9.98 minutes in 60s
Minimum sampling interval of 8.87 minutes in 90s
Warning message:
In as.telemetry.data.frame(data, timeformat = timeformat, timezone = timezone, :
  HDOP values found, but UERE not specified and will have to be fit. See help("uere").
> turtle <- as.telemetry("/Users/xhdong/Projects/ctmm-Shiny/data/misc tele/turtles.csv")
Minimum sampling interval of 5 hours in F13
Minimum sampling interval of 2.18 hours in F231
Minimum sampling interval of 5 hours in F235
Minimum sampling interval of 56.7 minutes in F403
Minimum sampling interval of 5 hours in F47
Minimum sampling interval of 5 hours in M270
Minimum sampling interval of 5 hours in M271
Minimum sampling interval of 4 hours in M424
Minimum sampling interval of 4 hours in M60
Minimum sampling interval of 5 hours in M85
Minimum sampling interval of 55.3 minutes in UM236
Warning message:
In as.telemetry.data.frame(data, timeformat = timeformat, timezone = timezone, :
  HDOP values found, but UERE not specified and will have to be fit. See help("uere").
> uere_cali
horizontal
  27.01341
> uere(turtle)
horizontal
  2793.333
> uere(turtle[[1]])
horizontal
  1306.224
> uere(turtle) <- uere_cali
# after updating a list's uere, running uere on the list gets NULL
> uere(turtle)
NULL
# each item in the list has its uere updated; only uere on the list returns the wrong result
> uere(turtle[[1]])
horizontal
  27.01341
```
Sorry, I was still fiddling with the `uere()` code. These 2 bugs seem to both be fixed now. I can add an `is.calibrated` method.

I'm thinking of having something to show the data was calibrated. Can we add a property like 'calibrated'? It should be true in 2 cases (the 2nd being when `uere<-` is used?). The difference between a property and an 'isCalibrated' function is that I'm not sure the function can handle the 2nd case.
I saw the first bug is fixed, but I'm getting this:
```
> library(ctmm)
> uere_cali <- uere(as.telemetry("/Users/xhdong/Projects/ctmm-Shiny/data/misc tele/turtle_cali.csv"))
Minimum sampling interval of 9.98 minutes in 60s
Minimum sampling interval of 8.87 minutes in 90s
Warning message:
In as.telemetry.data.frame(data, timeformat = timeformat, timezone = timezone, :
  HDOP values found, but UERE not specified and will have to be fit. See help("uere").
> uere_cali
horizontal
  10.62875
> turtle <- as.telemetry("/Users/xhdong/Projects/ctmm-Shiny/data/misc tele/turtles.csv")
Minimum sampling interval of 5 hours in F13
Minimum sampling interval of 2.18 hours in F231
Minimum sampling interval of 5 hours in F235
Minimum sampling interval of 56.7 minutes in F403
Minimum sampling interval of 5 hours in F47
Minimum sampling interval of 5 hours in M270
Minimum sampling interval of 5 hours in M271
Minimum sampling interval of 4 hours in M424
Minimum sampling interval of 4 hours in M60
Minimum sampling interval of 5 hours in M85
Minimum sampling interval of 55.3 minutes in UM236
Warning message:
In as.telemetry.data.frame(data, timeformat = timeformat, timezone = timezone, :
  HDOP values found, but UERE not specified and will have to be fit. See help("uere").
> uere(turtle) <- uere_cali
> uere(turtle)
Error in apply(UERE, 1, unique) : dim(X) must have a positive length
> uere(turtle[[1]])
Error in apply(UERE, 1, unique) : dim(X) must have a positive length
```
2 more bugfixes uploaded with more features to come soon.
Currently implemented features:
The error tab will plot the data with error discs (and other options). There is a uere column in the data summary table to show the uere value. It's `TRUE` for pre-calibrated data and is converted to 1 for now. This probably isn't the best solution, but I'll wait for more comments before polishing it.

Load data as regular data, and the user can take a subset, filter outliers, etc.
Note the uere column in the data summary table.
After processing, the user can select tracks in the data and export them as csv with the `Export Current` button. This way the user can generate proper calibration data from a regular data set.

First load the regular data (note the uere column value).
Then load the calibration data with this button.
Once the data is loaded, the UERE field will update. Pressing the `Apply` button will apply the UERE value to the currently selected regular data set.
The data summary table's uere value will then update to show the changes.
Alternatively, the user can just input a value in the UERE text field and apply it. This can be used for a manufacturer-provided UERE value or an estimated value.

Previously we had a UERE input box in the outlier page. This should be removed now, since we have calibration and manual input in the error tab of the visualization page. The outlier page should just use the uere provided in the earlier step.
I noticed that `ctmm::outlie` takes a `UERE` parameter. Given that the data is pre-calibrated or calibrated with a calibration data set, what should I provide for this `UERE` parameter in outlier detection?

The example here didn't set the `UERE` parameter, so it's using the default of 10 even though the turtle data is calibrated. I assume `outlie` is getting the error information from other columns, not relying on the `UERE` parameter?
```r
uere(turtle) <- UERE
plot(turtle[[3]], error = 2) # turtle plot with 95% error discs
## ------------------------------------------------------------------------
outlie(turtle[[3]]) -> OUT
```
The problem here is that I'm not using `outlie` directly, because my distance calculation is segmented by groups, so I'm using the lower-level functions, i.e. `get.error` and `distanceMLE` for distance, and `assign_speeds` for speed.

If I just use the following lines with `UERE = 10` in my code, I think this will get the proper error info from pre-calibrated and calibrated data (i.e. the `UERE=10` will not override the error info present in the data with a fixed value), right?

```r
error <- get.error(data, ctmm(error = UERE, axes = c("x", "y")), circle = TRUE)
Vs <- assign_speeds(data, UERE = error)
```
Yes, if the data is precalibrated, the `error` slot is interpreted as a logical value.
For the data summary table, we can add a `calibrated` column to use beside the `uere` column. I'm not sure if a single `calibrated` column will be enough:

- Pre-calibrated data will show `calibrated` as true, and the `uere` column should be empty?
- Un-calibrated data will show `calibrated` as false, but the `uere` column still has some larger values. Should `uere` (or some other name) be empty in this case?
- After the user applies a `uere` value to calibrate the data, can we show the difference?

In summary, I'm hoping for some values that are device-specific, so the user can see if some tracks are using the same device, and they should change after calibration.
@chfleming
For outlier detection, providing `uere=10` or `uere=TRUE` will be OK for pre-calibrated data. I think you said it's also OK to use `uere=10` for un-calibrated data?
I have removed the UERE input box in the outlier page and made the outlier code use the proper error information. The next step will be the modeling page.
`uere=10` is a reasonable default for uncalibrated GPS data, for plots and such.

But there are also data from VHF etc., right? Should we not turn on error for uncalibrated data by default? The user can always input an estimated value in the error tab. If so, what uere value should I provide to the `get.error` function?
@chfleming Mike mentioned that using `control=list(method='pNewton',cores=1)` can sometimes fail and we need to fall back to the default method.
How do we detect the failure? What will the error and return value look like? I need to be able to identify the failure and then fall back. Do you have some example code, or a condition to identify the failure?
This is the tryCatch I usually use.

```r
mods <- tryCatch(
  {
    # First try fitting the models with the pNewton optimizer
    mods <- ctmm.select(data,
                        CTMM = GUESS,
                        control = list(method = "pNewton", cores = 1))
  }, error = function(err) {
    # If it fails, revert to Nelder-Mead
    message("ctmm.select() failed with pNewton, switching to Nelder-Mead")
    mods <- ctmm.select(data,
                        CTMM = GUESS)
    return(mods)
  }
)
```
For the variogram plot, it seems that we can only plot one model/guess now. Do you think it would be useful to overlay multiple models? That way we can show the difference between the original model parameters and the changed model more clearly.

Interestingly, the tryCatch code worked in a script but failed in the app. I changed it to `try` and that worked. I think when an error happens in tryCatch, there could be multiple error handlers; maybe another error handler processed the error first and failed the code. The parallel code also makes error handling more difficult.
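The `try()`-based fallback that worked in the app might look like this sketch, using the same hypothetical `data`/`GUESS` objects as the earlier tryCatch example:

```r
# Attempt the faster pNewton optimizer first; try() returns a "try-error"
# object on failure instead of stopping, so no nested handlers are involved
mods <- try(ctmm.select(data, CTMM = GUESS,
                        control = list(method = "pNewton", cores = 1)),
            silent = TRUE)

# On failure, fall back to the default Nelder-Mead optimizer
if (inherits(mods, "try-error")) {
  message("ctmm.select() failed with pNewton, switching to Nelder-Mead")
  mods <- ctmm.select(data, CTMM = GUESS)
}
```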
I guess `control=list(method="pNewton", cores = -1)` will use all the cores except one to fit the model? I added code to use most cores if there is only one animal to fit, though I didn't see an obvious time difference between the two methods.
Running parallel with mclapply in a cluster of 1:

```
* Fitting models OUF anisotropic
Maximizing likelihood.
Calculating Hessian.
   user  system elapsed
 38.169   0.206  38.412
```

Trying models on a single animal with multiple cores:

```
* Fitting models OUF anisotropic
Maximizing likelihood.
Calculating Hessian.
   user  system elapsed
 96.948  37.218  33.056
```
@chfleming @NoonanM For the workflow of model fitting:

When the "3. model" tab is activated, the app will try models on the current subset with `ctmm.select`. If the data is calibrated, should we use `CTMM = ctmm(error = FALSE)` by default? I think the help document suggested fitting without error first to provide a first-pass result as new initial values.

After the 1st pass, the user can use a drop-down list to select an animal and use sliders to adjust. The error slider will become [0, 1] with a step of 1 (so it can only be either 0 or 1). After the adjustment, the slider values will be used to construct a CTMM object for the 2nd pass of `ctmm.select`.

The user can either click a button to fit directly, or save the changes (which should be shown in the model summary table), work on other animals, and in the end click a button to fit all changed animals.
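The two-pass workflow above might be sketched as follows. This is only an interpretation of the steps discussed in this thread (passing `CTMM = ctmm(error = 10)` to `ctmm.guess` for the error-enabled pass is the mechanism suggested earlier in the thread), not an official recipe:

```r
# 1st pass: fit without error for a quick first result (ctmm default)
GUESS <- ctmm.guess(data, interactive = FALSE)
FIT1  <- ctmm.select(data, CTMM = GUESS)

# 2nd pass: turn error on with an initial UERE guess (10 for uncalibrated GPS)
GUESS2 <- ctmm.guess(data, CTMM = ctmm(error = 10), interactive = FALSE)
FIT2   <- ctmm.select(data, CTMM = GUESS2)
```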
Sorry, I am just catching up:

- The `col.CTMM` array. I would show this and the AICc difference.
- In `ctmm.fit`, only the first part of the calculation, "Maximizing likelihood.", is parallelized; the second part, "Calculating Hessian.", is not yet parallelized. The first part also gets more robust (in addition to faster) as more points are sampled with the cores.
- `ctmm()` by default is `ctmm(error=FALSE)`, and I think it's the appropriate default given the extra work necessary for fitting error and the fact that it isn't necessary in many cases.

I would give the options to both fit first without error (for cases where the errors are small and the time differences are not too small) and, alternatively, to first fit with `isotropic=TRUE`.
So we will fit without error by default, even when the data is calibrated, and add a checkbox to fit with `isotropic=TRUE`.
I'll look at options to draw multiple fits in the variogram plot. I think we probably need some checkboxes to turn them on and off, in case the curves are too similar and hard to distinguish. There are multiple approaches to implementing this, and the internal workflow can become quite complicated.

If I have multiple `ctmm` objects (the guesstimate, the model result, the adjusted values from sliders), I plan to show them in a table so that the differences are obvious. Is there any method to get a summary from `ctmm` objects? For example, we have some slider values initialized from the guesstimate `ctmm` object. Or maybe we just take them from the result of `variogram.fit.backend`? I saw there are many values in the object, but they are different from the slider values based on the same object.

I believe this kind of table is needed so that the manual adjustments can be recorded accurately in the work report; otherwise the adjusted initial values cannot be reproduced from the work report.

UPDATE: It might be easier to just print the before/after slider values in the console, which will also be included in the work report. This can be used for reproducing the fit from the app.
See discussion in #5 #28
The `UERE` value is 0 in the app when there is no error input by the user. This should change to 10 by default.