ctmm-initiative / ctmmweb

Web app for analyzing animal tracking data, built on the ctmm R package
http://biology.umd.edu/movement.html
GNU General Public License v3.0

Support for telemetry errors #49

Closed: xhdong-umd closed this issue 5 years ago

xhdong-umd commented 6 years ago

See the discussion in #5 and #28.

xhdong-umd commented 6 years ago

If we keep every fit (including the ones before and after the manual slider adjustments), there could be at most 5 fits to show in the variogram:

We can add checkboxes to turn them on and off, though 5 curves in one plot might be a bit overwhelming.

The only simplification I can think of now is to keep only the current version instead of both before/after versions, so any manual adjustment just updates the value without showing before/after versions.

xhdong-umd commented 6 years ago

Maybe a better approach is to arrange tabs based on the model fit action:

Each page turn triggers a model fit action, and we show at most 2 curves per page. The user can still compare every curve by switching pages, just not in the same plot.

One limitation is that model fitting runs for all animals in parallel, and any individual change will trigger a fit for all animals. It's possible to fit only the changed animals individually, but I need some extra code to handle this.

xhdong-umd commented 6 years ago

Previously we had the model summary table interact with the variogram plot: selecting rows in the model summary table plots the variogram and model for them. This is useful because each individual can have multiple models, so it enables comparison of multiple models for the same individual. The home range/overlap/occurrence pages also respond to the models selected here.

Now, with 2-pass model fitting, this becomes quite complex.

One approach is to keep a clickable table in each tab and show a combined table in a separate box, but that combined table wouldn't interact with the variograms. This design is conceptually clean, but there would be some duplicated content.

Another approach is to include the 1st-pass model results in the 2nd-pass model summary table, but ignore clicks on 1st-pass models. This design has less duplication and is intuitive on information (we show the previous fit alongside the 2nd fit for comparison), but less intuitive on interaction (clicking a 2nd-pass model changes the variogram plot, while clicking a 1st-pass model does nothing). Still, this might be the best option.

xhdong-umd commented 6 years ago

This change is one of the most substantial changes to the existing code:

chfleming commented 6 years ago

ctmm objects have a summary() method, but I have not coded that up to work on guesstimates, because they don't have uncertainty information in them. I can put that on the TODO list.

chfleming commented 6 years ago

I would consider an iterative process rather than a fixed number of steps. Every time you re-fit, you compare 2 models and reject the one with the higher AICc value.
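For instance, a minimal sketch of that loop, assuming the fitted ctmm object exposes its AICc as an `AICc` element (that name is an assumption), with `tele`/`guess` standing for the telemetry data and initial guess:

```r
# hedged sketch of the iterative refit idea; fit$AICc is an assumed element name
fit <- ctmm.fit(tele, guess)
repeat {
  refit <- ctmm.fit(tele, fit)       # restart the optimizer from the last fit
  if (refit$AICc >= fit$AICc) break  # reject the model with the higher AICc
  fit <- refit
}
```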

xhdong-umd commented 6 years ago

For the guesstimate, I can just print the before/after slider values to the console, which will also be included in the work report. This should be intuitive and maps directly to the slider adjustments.

For the model fit iteration, I'm thinking of using the same page to refit, and trying to support further iterations.

xhdong-umd commented 6 years ago

This is the plan I have for now:

In theory we can show every curve (before / fine-tuned / fitted), but that could be overwhelming. I'll make the app support them all; then we can discuss the defaults or options to show/hide them.

xhdong-umd commented 6 years ago

@chfleming In the variogram fine-tune backend code, we used STUFF$storer to convert slider values into a CTMM object.

I noticed that for the buffalo data the guesstimate CTMM object has its error slot as logical FALSE, while the error slider is 0-100 m, and the CTMM object converted from the slider values has an error slot value of integer 0. If the user drags the slider, the error value updates to the slider value, which is correct.

I'm wondering what happens if the user just opens the slider page and applies changes without actually dragging any sliders: that will save the CTMM object built from the slider values (with error slot 0) as the updated guesstimate, while in fact it should be identical to the original value, which has the error slot as FALSE. Will this cause any problem?

chfleming commented 6 years ago

Some data have pre-calibrated errors, in which case error is treated as TRUE or FALSE, while some data have unknown UERE, in which case error is treated as a positive numeric and considered FALSE if 0. I coded the slider settings to reflect this difference. error=0 and error=FALSE should work the same way in either case.
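A minimal sketch of the two conventions, using the exported ctmm() constructor:

```r
library(ctmm)
# calibrated data: error is a logical on/off switch
ctmm(error = FALSE)
# uncalibrated data: error is a UERE guess in meters; 0 behaves like FALSE
ctmm(error = 10)
```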

xhdong-umd commented 6 years ago

I combined the guesstimate page with the empirical page, since the guesstimate value doesn't need any user interaction and the guesstimate curve shouldn't interfere with reading the variogram.

So the guesstimate will be shown automatically:

[screenshot: 2018-06-28, 1:57 PM]

In the pop-up page, the original guesstimate curve is kept for reference, and the modified curve is drawn in a brighter color:

[screenshot: 2018-06-28, 1:58 PM]

After the modification is applied, the two curves are also shown in the group plot:

[screenshot: 2018-06-28, 1:58 PM]

There is a minor point that is not ideal: when no modification is made, the plot actually shows two identical curves in different colors, so the color of that curve is a mix of the two colors, slightly different from either curve's own color. I tried to draw only one curve before modification, which worked in the group plot but failed in the pop-up page, because I need to check whether the two curves (CTMM objects) are identical, and in the pop-up page one of the CTMM objects is created from slider values, which introduces very small floating-point differences and makes the two objects not identical.
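A minimal sketch of a tolerance-based check that could sidestep the float noise, assuming all.equal() recurses through the CTMM objects' numeric components:

```r
# hedged sketch: identical() is defeated by slider-induced float noise, but a
# tolerant comparison treats the two guesses as the same curve
same_guess <- function(a, b) isTRUE(all.equal(a, b, tolerance = 1e-8))
```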

The mixing of two colors also limits the color choices: if the mixed color is too different from each component color, the plot can be confusing. The current color is not the best-looking one, but there are fewer surprises in all cases (single curve, two curves).

xhdong-umd commented 5 years ago

@chfleming @jmcalabrese The refit feature could be quite complicated if we want to refit models for multiple iterations.

xhdong-umd commented 5 years ago

I think this may be a better approach to showing model iterations:

i.e., the model name will have:

  1. a model number, which identifies each model and increases over time
  2. the animal name
  3. the model type
  4. a :r postfix if it was refit from a previous model.

Update: this has a problem. The notation records the init model, but the newly fitted model could be of a different type, so the init model + postfix notation is not suitable.

One approach is to add a column to the model summary table showing its init model. The user would need to look the table up by model_no to find out which it is, since there could be multiple models with the same animal + model type name.
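A minimal sketch of the labeling scheme (all names hypothetical), with the caveat from the update noted in the comments:

```r
# hypothetical label builder for the scheme above; per the update, a refit can
# converge to a different model type, so an init-model column in the summary
# table is more robust than the ":r" postfix alone
model_label <- function(model_no, animal, type, refit = FALSE) {
  paste0(model_no, " ", animal, " ", type, if (refit) ":r")
}
model_label(3, "Cilla", "OU anisotropic", refit = TRUE)
#> [1] "3 Cilla OU anisotropic:r"
```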

chfleming commented 5 years ago

I don't know about the internal structure, but if you have two models of the same type, then the one with the lower likelihood should be immediately rejected.

xhdong-umd commented 5 years ago

I'm sorting all models by the dAICc column and always selecting the best one by default. The other model results stay in the table since we want to keep them. We can discuss the details in the meeting.
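A minimal sketch of that ranking in base R, assuming a `models` data frame with hypothetical `animal` and `AICc` columns:

```r
# compute dAICc per animal and sort so the best model (dAICc == 0) leads;
# the models data frame and its columns are assumptions about the app's table
models$dAICc <- ave(models$AICc, models$animal, FUN = function(x) x - min(x))
models <- models[order(models$animal, models$dAICc), ]
default_selection <- which(models$dAICc == 0)  # best model per animal
```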

xhdong-umd commented 5 years ago

@chfleming How should we compare ctmm models and determine whether they are similar enough to safely ignore as duplicates?

I assume a very close dAICc value alone is not enough, right?

chfleming commented 5 years ago

I think after running "refit" the duplicates should be left for the user to see that there was no improvement in fit. There can be a button to manually discard all model fits of the same type with higher AICc values.

Before proceeding to the next stage, models of the same type but with higher AICc values should be discarded automatically (if not manually), in my opinion.

For models of the same type, equal AICc values mean they are duplicates, while higher AICc values mean they are optimization failures. Both need to be discarded, ultimately.
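A minimal sketch of that pruning rule, again assuming hypothetical `animal`, `type`, and `AICc` columns:

```r
# within each animal + model type, keep only the lowest-AICc fit:
# equal AICc = duplicate, higher AICc = failed optimization
models <- models[order(models$AICc), ]
pruned <- models[!duplicated(models[c("animal", "type")]), ]
```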

xhdong-umd commented 5 years ago

In the meeting we discussed that simply discarding bad results may remove a model of a different type, which could be useful even with a higher AICc value. So I was thinking of only discarding similar models.

Now that we limit the comparison to within each model type, each type of model is kept, and only the best of each type is kept. This looks like a good approach.

xhdong-umd commented 5 years ago

I have implemented the fine-tune page on the model page, and refitting with the fine-tuned result. The checkboxes on the first page and the model page can also show/hide curves.

For fitting with error, this is my plan

chfleming commented 5 years ago

Fitting automatically, before having the option to turn on error with a checkbox, can be a bad idea. Some data have repeated times and only fit correctly with an error model... where the duplicate times can add information to the error model.

Don't worry about the isotropic checkbox. I am now automating that behind the scenes in ctmm.select(), so that it starts with simpler models and works its way up to more complicated models to help ensure optimizer convergence.

Also don't worry about tying the error checkbox to refitting, but there does need to be an error checkbox. I am automating as much of the multi-step fitting as possible in ctmm.select() right now. Users just need an option to fiddle with the variogram and refit (just in case of optimization failure), pretty much as you have worked out.

Error sliders will be 0-1 logical if error is logical (data are calibrated), but numeric and in meters if the UERE is unknown (data are not calibrated). The ctmm function should give you the proper range and step size. When it is 0-1 with step size 1, I code for that as a checkbox instead of a slider in ctmm's manipulate implementation.
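A minimal shiny sketch of the two slider cases (the ids, labels, and the uncalibrated range are assumptions):

```r
library(shiny)
# calibrated data: logical on/off expressed as a 0-1 slider with step 1
sliderInput("error", "Error", min = 0, max = 1, value = 0, step = 1)
# uncalibrated data: UERE guess in meters (the range here is an assumption)
sliderInput("error", "Error (m)", min = 0, max = 100, value = 10)
```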

xhdong-umd commented 5 years ago

OK, the model page is designed to fit automatically when it is switched to; this is difficult to change, but we can put the "turn on error" checkbox in the variogram control box above, so the user can turn it on before switching to the model page.

I noticed there were some warning messages about fitting with error models when importing data with duplicated times. Similarly, I can check for time duplicates and show a notification suggesting that the user turn on error on this page.

I implemented the error sliders according to the ctmm backend function, so when error is logical the slider is 0-1 with step size 1. I think it's more consistent and easier to click than a checkbox, but I can also change it to a checkbox if you think that would be better.

For the error checkbox with refit, what should I do if the checkbox is checked?

xhdong-umd commented 5 years ago

Previously the app captured error/warning messages in the app by default. However, I found that if something crashes the app, the error message is lost, since it was redirected to the app instead of the console. This makes debugging and error reporting more difficult. I decided to make showing error messages in the R console the default. Warnings during import will still generate notifications in the app, and the hosted web app will still capture errors in the app by default.

chfleming commented 5 years ago

> For the error checkbox with refit, what should I do if the checkbox is checked?

I'm not exactly sure what you mean. On the initial pre-fitting page, there needs to be an error checkbox to turn on errors for all individuals conveniently. In ctmm.guess(), you can do that with CTMM=ctmm(error=TRUE), and I think that will supply reasonable defaults.
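For example, with buffalo standing in for the loaded data:

```r
library(ctmm)
data(buffalo)
# turn the error model on for the initial guess; interactive = FALSE returns
# the guess directly instead of opening the manipulate UI
guess <- ctmm.guess(buffalo$Cilla, CTMM = ctmm(error = TRUE), interactive = FALSE)
```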

xhdong-umd commented 5 years ago

I implemented this feature and it seems to be working. Though I noticed that with sampled data the variogram plot draws the curves as separated circles when error is on; is this normal? The full data don't have this problem.

[screenshot: 2018-07-26, 2:29 PM]
chfleming commented 5 years ago

That's right. You should be able to remove the error checkbox from that dialogue, although leaving it there doesn't hurt anything.

As for the variogram, when error is turned on the model variogram is now calculated with error, but errors are only defined at the sampled times, so the model plot becomes discrete. Without error, the model is defined at all times and is continuous.

xhdong-umd commented 5 years ago

Do you mean to remove the error slider from the fine-tune pop-up dialog? I think we previously wanted to use that to turn on error for individual animals. Is that feature still needed?

chfleming commented 5 years ago

That feature shouldn't be needed with the back-end changes that I am making now.

xhdong-umd commented 5 years ago

I tried to remove the slider, but ran into an error when getting the current slider values with STUFF$storer. I will just leave it there since it doesn't hurt anything.

chfleming commented 5 years ago

I think I was confused about what you were asking.

xhdong-umd commented 5 years ago

So there are two error checkboxes/sliders:

  1. in tab 1, shown in the screenshot above, an error checkbox that applies to all animals
  2. in the fine-tune pop-up dialog, an error slider with values 0-1, step 1 (it's a checkbox in ctmm's manipulate code)

I planned to remove the error slider in 2 when the data are calibrated, per the previous discussion, but that caused a mysterious error. I decided to just leave the error slider there since it doesn't hurt anything.

Did you mean that the error checkbox in 1 is not needed, rather than the one in 2?

chfleming commented 5 years ago

Those are both good as they are. Sorry.

xhdong-umd commented 5 years ago

I implemented the feature to remove the non-optimal models for each model type and every animal. I'm not totally satisfied with the button name, but I haven't thought of a better one yet.

[screenshot: 2018-07-31, 2:43 PM]

The UI is a little cluttered now. Previously I added the select best and clear selection buttons even though the app selects the best models automatically (the best model per animal, regardless of model type), because I thought the user might explore some other models and then want to restore the best-model selection with one click.

Now that we have the button to clean up suboptimal models for each model type, I think we can remove the two buttons: select best (the app always selects the best after an update anyway) and clear selection (which by itself doesn't make sense, since with no models selected no variogram is shown; its only use is making it easier to clear the selection and pick models manually). These buttons may confuse users, and they are probably not needed anymore.

chfleming commented 5 years ago

"Remove misfits" ?

xhdong-umd commented 5 years ago

It's certainly a shorter name, which is good for the UI, though I'm not sure if it sounds a little negative.

It's hard to pick a good name sometimes. Maybe leave it for the next meeting.

xhdong-umd commented 5 years ago

Most items on my list are implemented now. I also updated various help text in the app, and I updated the master branch and the hosted web app. @jmcalabrese If you want to have a look at the web app, it is now at the same version as the development branch.

One task I'm considering is whether the caching of model fits can apply to individual animals (right now the cache only works when the whole group is exactly the same as before), but that could be tricky.
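A hedged sketch of per-animal caching, keyed on a digest of each animal's telemetry plus guess (digest and ctmm.select are real; the wrapper itself is hypothetical):

```r
library(digest)
library(ctmm)
# hypothetical wrapper: memoize ctmm.select() per animal, so re-running the
# group only refits animals whose data or guess actually changed
.fit_cache <- new.env(parent = emptyenv())
fit_cached <- function(tele, guess) {
  key <- digest(list(tele, guess))
  if (!exists(key, envir = .fit_cache, inherits = FALSE)) {
    assign(key, ctmm.select(tele, guess), envir = .fit_cache)
  }
  get(key, envir = .fit_cache, inherits = FALSE)
}
```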

xhdong-umd commented 5 years ago

I rearranged the buttons on the model page to this:

[screenshot: 2018-08-10, 11:19 AM]

The button keep best only is not as clear as remove suboptimal models, but this way the buttons are better aligned. I moved the selection buttons to the bottom because the model table is smaller now and the buttons remain visible on one page.

Two negative points about the new layout:

xhdong-umd commented 5 years ago

@jmcalabrese I fixed the home range table header alignment issue. Some configuration parameters don't play well together.

The tables used to have numbers aligned left and strings aligned right; now I center-align both in the data summary table on the visualization page, the model summary table, and the home range table. I didn't set this for every table in the app because the other tables are much simpler and don't seem to have this problem.

jmcalabrese commented 5 years ago

@xhdong-umd ok, thanks, will check out the table fix.

jmcalabrese commented 5 years ago

After exploring the new error calibration/modeling functionality in the webapp, I think there needs to be more guidance for the user on the steps involved. In the example I worked through (tapirs), the app pops up a message to check the console for warnings. Buried in the console output is a message that the data cannot be fit without an error model because of duplicate timestamps. Then it is up to the user to figure out how to calibrate the error and then incorporate that into the model fitting/selection step. I think that's all too cryptic.

One possibility would be to automatically pop up a message explaining: 1) the (potential) need for modeling error, and 2) the basic workflow required to do that. Any warning like the one mentioned above would automatically trigger this help message. We may also want to trigger such a message any time there is any kind of error info in the uploaded data (e.g., an HDOP column or similar).

Going further, any time error-related warnings arise or error info is detected in the data, it might also be a good idea to automatically put the focus on the "Error" tab of the plot window on the visualization page. That way, the user gets the help message explaining the error modeling workflow and then immediately sees the relevant part of the app to start that workflow.

xhdong-umd commented 5 years ago

The warning message comes from ctmm, and the app asks the user to check messages whenever there is a warning/error during data import, as this usually needs the user's attention.

I had the idea of detecting this message and popping up a dialog, but didn't implement it because the calibration step needs calibration data anyway, so asking the user to calibrate when no calibration data is available doesn't help.

I'll try to detect this message and pop up a dialog, maybe asking the user to check the error tab help, and also focus the error tab.
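A minimal sketch of the detection, capturing warnings during import and matching a pattern (`csv_path` is hypothetical, and the pattern is a placeholder pending the consistent keywords discussed below):

```r
# hedged sketch: collect ctmm import warnings, then look for error-model hints;
# the regular expression is a placeholder, not a confirmed ctmm message
msgs <- character(0)
tele <- withCallingHandlers(
  as.telemetry(csv_path),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
if (any(grepl("duplicate|error model", msgs, ignore.case = TRUE))) {
  # show the help dialog and focus the Error tab here
}
```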

xhdong-umd commented 5 years ago

There seem to be several possible warning messages related to error models, like duplicate timestamps and HDOP columns, etc.

@chfleming Can we have some consistent keywords for them, so I can detect related messages with one pattern? For example, if all of them contained an "error models are needed" line (this is just an example), it would be easier to detect.

chfleming commented 5 years ago

For the moment I'm in the middle of coding for location/fix class support---different UERE values in the same dataset---so I will be a little delayed getting to this stuff.

This issue is trickier than just the warning here: if any(dt==0), then you absolutely need errors turned on. But moreover, as any dt approaches zero, the results become more and more divergent and an error model becomes more and more necessary. This depends on how the movement SVF(dt) compares to the error variance. So under assumptions of GPS-quality data, maybe you could give a rough quality indication around the variogram stage, but not at the importing stage... at least not in an exhaustive way.
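The any(dt==0) part itself is easy to check on an imported telemetry object `tele`:

```r
# zero sampling intervals force an error model
t <- tele$t       # sampling times in seconds
dt <- diff(sort(t))
any(dt == 0)      # TRUE means errors must be turned on
```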

xhdong-umd commented 5 years ago

Did the uere() result become a list? Previously I was using uere(object)["horizontal"]; now it seems I need to use uere(object)[["horizontal"]] to get the uere column for the data summary table.

jmcalabrese commented 5 years ago

@chfleming @xhdong-umd: I think we should discuss the error modelling workflow and how to guide users through it during our next meeting.

I'm not sure the evaluation of the necessity of error modelling needs to be comprehensive at the import stage, but obvious cases should be flagged more clearly, and the possible need to model error might be mentioned then, particularly if the data contain error info. This could be refined at the variogram stage.

Alternatively, we may need to rethink the "Model selection" page to incorporate the error functionality there (including calibration), where a more comprehensive evaluation of the necessity of modeling error can be made.

chfleming commented 5 years ago

@xhdong-umd I forgot to message you about the UERE updates. I was up late last night finishing everything. uere()<- now supports categorical errors associated with location classes, so I had to restructure the UERE format.

Whether or not the data are calibrated is now stored in the UERE slot of the telemetry object. Specifically, you want:

data@UERE$horizontal

But there can be multiple location classes, so

all(data@UERE$horizontal)

will tell you if all location classes are calibrated. Alternatively,

mean(as.logical(data@UERE$horizontal))

will tell you the fraction of location classes that are calibrated.

chfleming commented 5 years ago

@xhdong-umd I've made more changes to the telemetry and UERE object structure. There is now a convenience function for you (not exported), is.calibrated, so that you don't need to keep track of these changes. It returns the fraction of the location classes that are calibrated.

xhdong-umd commented 5 years ago

With the most recent master branch I found these errors:

> library(ctmm)
> data("buffalo")
> uere(buffalo)
Error in `$<-.data.frame`(S3Part(x, TRUE), name, value) : 
  replacement has 1 row, data has 0
> leroy <- as.telemetry("/Users/xhdong/Projects/ctmm-Shiny/data/misc tele/leroy.csv.gz")
VDOP not found. HDOP used as an approximate VDOP.
Minimum sampling interval of 13.3 minutes in Leroy
> uere(leroy)
An object of class "UERE"
    horizontal vertical   speed
all   153.2381 1.476926 1.34559
Slot "DOF":
    horizontal vertical speed
all        918      918   919

> ctmm:::is.calibrated(leroy)
Error in UERE[CLASS, type] : subscript out of bounds
chfleming commented 5 years ago

Thanks. Found the bugs and running check now.

chfleming commented 5 years ago

Fixes are up.

xhdong-umd commented 5 years ago

The errors are fixed now.

It's not obvious from the code: does is.calibrated always return 0 or 1 rather than some fractional value, so I can just treat it as TRUE/FALSE? (It returns NaN for a list of telemetry objects, though that is not a problem for my use.)

I plan to have two columns in the data summary table: