Closed xhdong-umd closed 5 years ago
Another minor point: do you think it's a good idea to note in the descriptions of the internal datasets whether they are anonymized or calibrated?
Running the wolf data through `outlie()`, I am not getting any warnings.
I ran automated fits on all 8 maned wolves (with `level=1`), ran summary on the model fit list, and ran summary on each individual model fit in the list. I didn't get any errors or warnings.
Wait, I forgot `verbose=TRUE`... still no errors or warnings.
Sorry, I will get some reproducible code tomorrow. It could be that I was testing with a 100-point sample, and the outliers were calculated with error on.
Here is the code to reproduce the warnings. It was probably caused by the sampling, which makes the model less normal.
```r
library(ctmm)
library(ctmmweb)
data(wolf)
data_sample <- pick(wolf, 100)
model_try_res <- par_try_models(data_sample)
model_list <- unlist(model_try_res, recursive = FALSE)
summary(model_list[[5]])
```

```
Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
```
```r
hrange <- akde(data_sample[["Loba"]], CTMM = model_list[["Loba.OUf isotropic"]])
summary(hrange)  # very large value of CI high
plot(hrange)     # thus the outer contour is out of range
```

```
Warning messages:
1: In CI.UD(x, max(level.UD), max(level), P = TRUE) :
  Outer contour extends beyond raster.
2: In CI.UD(x[[i]], l, level, P = TRUE) :
  Outer contour extends beyond raster.
```
It appears to be related to the OUΩ anisotropic model.
I ran the code until I got a sample that reproduced the warning with `summary()`. It was an OUΩ model, and I have fixed the warning in the master branch.
However, when I ran `ctmm.select()` from the command line on the same dataset, this OUΩ model was not selected by AIC. Is your script missing the `level=1` option?
We have a model summary table in the app which runs summary on every model, so `summary` will be called on a model even if it's not optimal. That being said, I didn't use `level=1` on the model selection page. Should I use that option?
Ah... I am missing some edges in the stepwise regression of `ctmm.select` given the new OUf and OUΩ models. This is causing some bad OUf/OUΩ models to be selected over OUF, when you should ultimately step down to OU or even IID. Give me a few hours to fix this.
Ok, I've fixed the other bug and am running `check` now... will push to GitHub momentarily. This is a very bad bug, so I am also going to try to push to CRAN ASAP.
`level=1` is safest but shouldn't be absolutely necessary unless something is very wrong with the shape of the likelihood function.
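For reference, passing that option could look something like this (a minimal sketch assuming the usual `ctmm.guess()`/`ctmm.select()` workflow; the choice of individual is arbitrary):

```r
library(ctmm)
data(wolf)

# Fit candidate models for one individual. level = 1 keeps all candidate
# branches in the stepwise model search rather than pruning them by
# confidence level, which is the safest (but slowest) setting.
GUESS <- ctmm.guess(wolf[[1]], interactive = FALSE)
FITS  <- ctmm.select(wolf[[1]], GUESS, verbose = TRUE, level = 1)
summary(FITS)  # ranked table of all candidate models
```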
Second fix is on GitHub and pushed to CRAN.
Previously, I did not conceive of cases where OUf/OUΩ would be selected over OUF, yet OU would be selected over both. This is because the ordering of the timescales is like OU < OUF < OUf < OUΩ.
I tested with the updated ctmm. There is still the warning for the home range plot, but that should be expected because the contour is just too big to fit. Should we suppress that warning?
I tested all the ctmm internal datasets with ctmm 0.5.4 in the web app. There were no warnings or problems on any page, though I do see that speed estimation on the sampled wolf data still takes a long time. With other datasets it takes about 300 s, but with the sampled wolf data it has been running for 30 min and is still not finished.
Is this normal, or something we should consider improving?
I can generate some reproducible code for this if needed.
One of the wolf datasets is very long, and you are then coarsening it down (which increases uncertainty in the trajectory). I can see how that would be a problematic calculation.
Is it possible to automatically produce a reasonable estimate of how long the calculation will take for a given dataset? It doesn't have to be an actual time (which is impossible, since it depends on the user's computer), just a rough "long/medium/short" estimate.
In principle, I could take the metric behind the progress bar and pass that to an environment variable. Could you do something with that?
Also, with `speeds()`, are you parallelizing over individuals or within the `speeds()` function? I think parallelizing within the `speeds()` function should be faster, because it is embarrassingly parallel and the individuals may differ considerably, which is definitely the case with the wolves.
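A sketch contrasting the two approaches (this assumes `speed()` accepts a `cores` argument for within-call parallelization, and `FITS` here is a hypothetical list of fitted models, one per wolf):

```r
# (a) Parallelizing over individuals: one worker per animal, so a slow
# individual can leave the other workers idle once they finish.
# est <- parallel::mclapply(seq_along(wolf), function(i)
#   speed(wolf[[i]], FITS[[i]]), mc.cores = 4)

# (b) Parallelizing within the call: all workers cooperate on one animal
# at a time, which balances better when individuals differ considerably.
for (i in seq_along(wolf)) {
  print(speed(wolf[[i]], FITS[[i]], cores = 4))
}
```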
I investigated the progress bar approach before. It's easy to have a progress bar in the console; I'm not sure if it can update the bar in the app. I'll need to look at it.
For speeds, it's over individuals now. I'll try running within `speeds()` to see the result.
Should I use `speed` or `speeds`? I think the page is for average speed, so I was using `speed`.

`speed()`
I created a bad bug in `summary()` when fixing the `min` & `max` warnings. Now the tau CIs all run from `0` to `Inf`... and I just pushed to CRAN because the other bug was so bad.
Uh-oh, that often happens... Last time I found a movebank bug which was caused by my changes to the data import part to make it safer.
For speed, parallelizing inside should be better. One advantage is that `speed` is not available for some models, so assigning cores to them would be a waste.
On the sampled buffalo data, I saw speed take 3 s instead of 9 s. For the sampled wolf data, it's still slow, and I saw the progress bar stay at 0% after quite some time.
To make the progress value available to the app, another approach is to let your code take a progress function as a parameter. Given a console progress function, it will show progress in the console; given a web app progress function, it will show a progress bar in the web app. This way you don't have to expose internal progress values through global variables.
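The callback pattern being suggested could look like this (a hypothetical sketch; the function names are made up for illustration):

```r
# The long-running routine accepts a progress callback instead of writing
# its progress to a global or environment variable.
run_tasks <- function(tasks, progress = function(i, n) invisible(NULL)) {
  n <- length(tasks)
  results <- vector("list", n)
  for (i in seq_len(n)) {
    results[[i]] <- tasks[[i]]()  # do one unit of work
    progress(i, n)                # report completion of step i
  }
  results
}

# Console usage: print a simple counter after each task.
tasks <- list(function() 1, function() 2, function() 3)
run_tasks(tasks, progress = function(i, n) cat(sprintf("%d/%d\n", i, n)))

# In the app, the same code could instead be called with a Shiny callback:
# run_tasks(tasks, progress = function(i, n) shiny::setProgress(i / n))
```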
There is probably no time to change the speed part before the course; I'll work on it and put it in the development version.
I'm testing all internal datasets with the newest version of ctmm. @chfleming With sampled coatis, some models get a really big speed value. Is that normal?
That's totally normal for a continuous-velocity model that is not the selected model. As the data become increasingly coarse, `DOF[speed]` approaches zero, and the speed estimate blows up. The CIs look appropriately wide. There is almost no information in the data regarding speed. The selected model (OU) actually has infinite speed. You could replace those `NA` values with `Inf (0,Inf)` if you want.
Another result with gazelle, maybe also normal, just to confirm: a big value for the τ period in the last one.
Big home ranges:
That's normal if it isn't the selected model and the CIs are appropriately wide. That feature turns off as the period limits to `Inf`. An infinite oscillation period means that it doesn't oscillate.
I'm not sure why the unit for τ is microseconds for the turtle.
The actual values are like this:

```
                           low       ML      high
area (hectares)       1.135267 1.395365  1.681888
τ[position] (minutes) 0.000000 2.641263  5.904736
τ[velocity] (seconds) 0.000000 0.000000 36.829864
```
The ML value of τ[velocity] is very small (maybe it should be 0 but was a very small value), so the function chose the smallest possible unit, which in turn makes the high value of 36 s a very big number in microseconds.
I'm not sure about the details of how this happened; I need to look at the code.
The unit-picking function was looking at the median value of a vector, then choosing the best unit for the whole vector. The median is almost 0 in this case (there are multiple models), so the smallest unit was chosen.
What should we do in this case?
In `units` we take the smallest unit if the value is very small. I'm thinking maybe we should set a threshold: if even the smallest unit yields a very small value, we may as well just use the SI unit and let the value be almost 0.
What's the ratio between the `ML` value and the `high` CI?
It seems that the ML value is just 0.

```
                           low       ML      high
area (hectares)       1.135267 1.395365  1.681888
τ[position] (minutes) 0.000000 2.641263  5.904736
τ[velocity] (seconds) 0.000000 0.000000 36.829864
```
In `summary.ctmm`, if the `ML` value is zero then I switch to the `high` CI, like this for you:

```r
NONZERO <- (ML > .Machine$double.eps)
if(any(NONZERO)) { TEST <- ML[NONZERO] } else { TEST <- high }
TEST <- stats::median(TEST)
```

and then base the units on `TEST`.
Actually, take out the `ML/high`. That would have bad results and isn't what I do.
Do you mean just excluding the ML and high values from the unit calculation?
No, sorry, I edited my code. I mean what I have posted now.
OK, I'll try this option. Because I have a unified function to process all columns that need to be formatted with units, it's not easy to make this change in the current structure. The change probably won't make it into this release.
App and package updated to 0.2.5, and hosted app was updated with latest ctmm too.
In my code I need to format all the columns in a table, and all models need to be formatted with the same unit, so I have a function that checks a whole column and then determines the unit. The ML/low/high values are different rows of the same column at this point (later they are reshaped into wider columns, but it's easier to process them as rows), so it's a little difficult to separate the ML/high values here.
It can be done, it will just need quite a bit of extra structure.
I'm wondering if we can just exclude all zeros when checking the unit? A zero value can be in any unit, and zeros really don't bring any information for determining the unit; instead they skew the median (making the function think the values are small and need the smallest unit).
I think we can simply exclude them and then take the median. Nothing in the ctmm part needs to change; I only need to add a check to the unit picking, and it will work on all columns. If some non-CI column has a similar case of lots of zeros and then a big number (which would lead to the same problem in the old code and with the ML/high method), this approach would fix that case too.
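A minimal sketch of this idea (the function name is hypothetical): drop (near-)zero values before taking the median used for unit selection, since zeros carry no unit information and drag the median toward zero.

```r
unit_reference_value <- function(x, zero_threshold = 1e-9) {
  # Exclude zeros/near-zeros, NA, and Inf from the reference calculation.
  x <- x[is.finite(x) & abs(x) > zero_threshold]
  if (length(x) == 0) {
    return(0)  # everything was ~0: fall back to the SI unit
  }
  stats::median(x)
}

# For the τ[velocity] column above, median(c(0, 0, 36.83)) would be 0 and
# force microseconds, but after excluding zeros the reference value is
# ~36.83, so seconds would be chosen.
unit_reference_value(c(0, 0, 36.829864))  # 36.829864
```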
Also, I think we can set the bar for "non-zero" a little higher. `.Machine$double.eps` is `2.220446e-16` on my machine, but I think any value smaller than about `1e-9` can already be treated as zero when considering units. There is not much difference between `3e-9 microseconds` and `3e-12 seconds`, and the latter is easier to compare and reason about in SI units.
This is much easier to implement. I implemented it and it looks good.
One minor question: is `hm2` a commonly used unit, well known to regular users? For myself, I needed to search to find out what it is, and even after that it's hard to get a feel for how large `1 hm2` is. Though maybe it's common for animal-tracking people?
Hectares are reasonably common, though people might not recognize that a hectare is a square hectometer.
I'm running automated tests with all ctmm internal datasets.
@chfleming, with the sampled wolf data:
- There were some warnings about duplicated row numbers; I'm looking at it now.
- On the speed outlier page, the default estimate function seemed to hit a problem, and the app fell back to the alternative definition.
- model selection page
- home range plot: 2 plots have the contours messed up
- estimating speed took quite some time