ctmm-initiative / ctmmweb

Web app for analyzing animal tracking data, built upon ctmm R package
http://biology.umd.edu/movement.html
GNU General Public License v3.0

high level design of model page, Variograms #17

Closed xhdong-umd closed 6 years ago

xhdong-umd commented 7 years ago

We're still finishing some small details of the speed definition for outlier detection, but the app itself is working well now. We can update the speed definition at any time without much change to the app.

Now I think it's time to start discussing the model page. There are at least a few new points that need to be considered on top of the previous design:

There are lots of detailed issues recorded on the project page, though the above 3 questions are more important.

jmcalabrese commented 7 years ago

@xhdong-umd: I like the layout and functionality of the variogram page. I know @chfleming prefers zoom by lag to zoom by fraction, but I still find zoom by fraction superior for zooming in on small lag behavior of all plots simultaneously, and think we should keep both options. Switching between them with a radio button and using the same slider for both is a really clean and elegant solution. Very nice. The only usability issues that jumped out at me are that:

1) The individual plots are too compressed along the y-axis, and the aspect ratios are too extreme. I would suggest forcing a larger and more conventional aspect ratio, like golden ratio or similar.

2) The axis labels are too small on both axes, making the units hard to read.

xhdong-umd commented 7 years ago

I think keeping both has advantages, as the "relative zoom" can reveal more detail in each plot. We just need to remind users that each figure then has its own scale, so it is meant for inspecting plots individually, not for comparison.

There is a figure height input to control the height of each individual plot. I guess the current value is a little low; you can adjust the number to find a range that works for you, then I can make it the default. You can also change the number of columns.

Controlling the aspect ratio directly would require changing the ylim, which also needs to include the confidence interval (the gray area). If a bigger default figure height makes it work, I'd be inclined not to change the ylim manually.

As for the axis labels, they look good in an individual plot but get compressed when we stack many plots in one panel. I'm adjusting some parameters for this now. Base plot needs a lot of trial and error to find the right parameter combinations.

xhdong-umd commented 7 years ago

I updated the repo with a default figure height of 250 pixels and increased the axis label size. Setting these parameters manually may have different effects on different platforms, though.

I also noticed that the plot axis labels become more detailed when the space increases.

[Screenshot: 2017-05-03, 10:32:05 am]

xhdong-umd commented 7 years ago

@chfleming With the buffalo data, the tau velocity and error sliders don't have any effect on the fit. Maybe it's because of the model? Should we disable the sliders if the model doesn't take those parameters?

chfleming commented 7 years ago

I have a bug in my code at the moment that the error parameter is not rendering without a fitted model (which includes uncertainty on the error). I'm fixing that right now.

tau velocity should be working, but you have to zoom way in to see it.

xhdong-umd commented 7 years ago

OK, I saw the effect of tau velocity.

I've been thinking about whether we should make an interface to variogram.fit so it can be used in both manipulate and Shiny. Right now I have copied the code in variogram.fit and made some changes for Shiny, so any update to variogram.fit requires revisiting this copied and modified code.

Currently I have the code already working; it just needs to be put into the app. So we don't need to do this now, and if there are not going to be many changes, we can just use the current version.

However, one concern is that I'm not 100% sure my version works in all cases, since I don't totally understand them, and I don't know how to generate test cases for all the different kinds of models.

Making an interface will need some substantial changes in structure:

This way the Shiny app will always behave the same as the manipulate version, no matter what changes we make.

chfleming commented 7 years ago

The bug I fixed is in a different function, but that's a good idea.

variogram.fit.plot will have to take parameter arguments and init_sliders will have to be used to set parameter arguments. It looks like manipulate will accept variable name strings like "x"=slider(...) just as well as x=slider(...). So we will also need a character array of parameter names. Will these 3 ingredients also be sufficient for Shiny? If so, I can try to make an abstraction layer within a couple of days.

xhdong-umd commented 7 years ago

variogram.fit.plot can take a list of named values in SI units: zoom, sigma, tau1, tau2, circle, error, plus the parameters CTMM and variogram. Inside the function it updates CTMM with the values and draws the plot.

variogram.fit can convert the units before calling variogram.fit.plot; since it has already called init_sliders, it has access to the unit information.

In my Shiny code I think I just wrote the slider id names manually. I have something like this:

```r
# init sliders
res <- list()
...
res$slider1 <- list(label = "zoom",
                    min = 1 + log(min.step, b), max = 1,
                    value = 1 + log(fraction, b), step = 0.01)
...
res$slider2 <- list(label = label_2, min = 0, max = m * sigma,
                    value = sigma, step = 0.01)
...
# return other variables that could be of use
res <- c(res, list(variogram = variogram, CTMM = CTMM,
                   b = b, sigma.unit = sigma.unit,
                   tau1.unit = tau1.unit, tau2.unit = tau2.unit))
return(res)
```

Then you can build the manipulate sliders:

```r
# build one manipulate slider from an entry of the parameter list
build_slider <- function(para_list) {
  return(manipulate::slider(para_list$min, para_list$max,
                            initial = para_list$value,
                            label = para_list$label))
}

manlist <- list(zoom = build_slider(res$slider1),
                sigma = build_slider(res$slider2),
                ...)
```

xhdong-umd commented 7 years ago

I also found this discussion about satisfying the R CRAN check.

The two methods of putting

in ./R/zzz.R are not perfect, but might be an alternative. Of course, putting all the variable declarations somewhere else is not ideal, but at least the function itself stays clean, and hopefully you don't have many variables that need to be declared.

xhdong-umd commented 7 years ago

I updated the repo with the fine-tune fit feature.

After you check the "guesstimate model" box (do you have a better name for this?), choose the individual you would like to fine-tune in the drop-down box. A pop-up for the manual fit will appear.

The pop-up uses a fraction of 0.5, while the default page with absolute zoom uses a different fraction value (the slider is at 0.5, but that is 50% of the max x range across all figures), so the plots may look different. Switching to relative zoom gives a more comparable view.

You can edit in the drop-down box directly. The pop-up only appears when the drop-down box changes. For example, if you chose Cilla, made some changes and then closed the pop-up window, then to fine-tune Cilla again you need to delete the current Cilla input and choose Cilla again to activate the window.

@chfleming The option of automatically doubling the range when a slider is dragged to the end may surprise users, so I will add a button to double the slider ranges. The zoom slider doesn't need adjustment, right?

[Screenshot: 2017-05-03, 2:40:50 pm]

xhdong-umd commented 7 years ago

I updated the repo with a slider to extend the slider ranges. You can scale the max limit of all the sliders except "zoom" by a factor of 0.5 to 10. If 10 is still not enough in some cases, we can choose another value.

xhdong-umd commented 7 years ago

I think the next steps are model selection and maybe turning on ERROR for some data.

The model selection part may need some careful thought. I think I will start looking at the help and documentation of the app, because now seems to be a good milestone. Later, once the interface in variogram.fit is finished, I will also update the app to use the interface functions.

jmcalabrese commented 7 years ago

I like the updates to the variogram page. I also think having an "increase slider range" slider is a more intuitive solution than automagically doubling the range when the user hits a limit (some users may never discover that feature).

A couple of usability observations:

1) Could we have log-scaled sliders that display the original values instead of logged values? So for fraction, instead of seeing a scale that goes from -3 to 0 with even spacing between steps, the user would see a scale that goes from 0.001 to 1 with log spacing between steps. I think that would be more intuitive for many users. Asking users to take antilogs in their head while they're working with various sliders is probably asking too much.

2) The differences between variograms with absolute zoom and scaling and variograms with relative zoom and scaling, both in terms of plot behavior and why the user would want each of them, are very similar to the differences between the facet and individual plots on the visualization page. With variograms, we are using radio buttons to switch between representations, but on the visualization page, we use different tabs. That seems inconsistent to me, and I wonder if users will find it confusing? I'm not sure which is the better solution at this point, but it seems like these kinds of representation shifts should be kept consistent across the different pages of the webapp.
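
For the log-scaled slider idea in point 1, one way to picture it: the slider keeps moving in evenly spaced log10 steps, while the label shows the back-transformed value. Below is a minimal base-R sketch of that mapping. The helper name `log_slider_labels` is made up for illustration and is not part of the app or of Shiny.

```r
# Hypothetical helper (not part of ctmmweb or Shiny): map evenly spaced
# log10 slider positions to the values a user would read on the slider.
log_slider_labels <- function(log_min = -3, log_max = 0, step = 1) {
  positions <- seq(log_min, log_max, by = step)  # what the slider stores
  data.frame(position = positions,
             label = 10^positions)               # what the user sees
}

# positions -3, -2, -1, 0 are shown as 0.001, 0.01, 0.1, 1
log_slider_labels()
```

The stored positions stay linear, so only the label rendering has to change; the user never needs to take antilogs.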

jmcalabrese commented 7 years ago

@xhdong-umd: I agree that focusing on help, documentation, clean-up and testing should be the priorities from now until the 15th. I would rather have the things that are already there working well and with documentation than trying to rush development of another layer of functionality.

NoonanM commented 7 years ago

@xhdong-umd @chfleming It regularly happens that some individuals in a dataset aren't range resident, and need to be excluded from the rest of the analyses. At the variogram stage, it would be useful (if possible) to have a tick box (or similar) where users can move through their data and select which animals are range resident (or vice versa, which animals aren't). Then, moving forward to the next stage of analysis, only individuals selected as range resident are carried forward.

xhdong-umd commented 7 years ago

I have some code that can convert the slider labels to a log scale, so the underlying value is still the same, but the labels and displayed values are on a log scale. However, it uses log base 10, and the current code in variogram.fit uses log base 4.

@chfleming Do you want to keep it in base 4, or is it OK to change to base 10? I do notice that 0.5 is still 0.5 in log base 4; not sure if this is related.

As for the plots in 3.facet and 4.Individual, they are similar in concept. However, the facet plot keeps all x axes aligned for easier comparison, hence the single-column layout. The individual plots are in a 2-column layout to maximize space usage, and they have two sliders that would be useless in the facet plot.

There is no technical difficulty in putting the two plots on the same page with a radio button, but that would update and redraw the plot with every switch, compared to tabs, which keep the plot cached without a redraw. The whole layout would also change dramatically on each switch. In the variogram case the layout is almost identical, and the redraw is relatively light (with bigger datasets the visualization plots could take a while to render).

So the concepts behind the plot designs are quite similar, but from the UI perspective I think the different arrangements are better suited to their cases. We can convey the intention and usage of each plot in the help document and video demo:

After this summary, I think the two cases are similar but with some differences. I tried to think of better names or labels to convey these differences, but found it difficult to summarize in a few words.

xhdong-umd commented 7 years ago

@NoonanM I need to think about that in the model selection context. It may involve selecting from multiple models for multiple individuals, so the workflow could be difficult to implement.

This will be a task for July, but my tentative idea is this: model fitting can take some time, so to deal with multiple individuals and multiple models, maybe we can have a "model fit planning" step, which creates a recipe of which model to fit with what parameters for each individual, and then runs the model fitting processes in batch.
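
The "model fit planning" idea could be sketched roughly as below: build a table of planned fits first, let the user review it, then run everything in one batch. This is only an illustration under assumptions. The individual name `animal_2`, the choice of model names, and the `run_fit` stand-in are placeholders, and no real fitting is done here.

```r
# Hypothetical "fit plan": one row per (individual, model) pair to fit later.
# Individual and model names are placeholders for illustration only.
fit_plan <- expand.grid(individual = c("Cilla", "animal_2"),
                        model = c("OU", "OUF"),
                        stringsAsFactors = FALSE)

# Stand-in for the real, slow fitting call; it only records what would run.
run_fit <- function(individual, model) {
  paste("fit", model, "for", individual)
}

# Run the whole plan in one batch after the user has reviewed it.
results <- mapply(run_fit, fit_plan$individual, fit_plan$model,
                  USE.NAMES = FALSE)

print(fit_plan)
print(results)
```

Keeping the plan as a plain table also makes it easy to show the user how many fits are queued before anything slow starts.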

chfleming commented 7 years ago

Instead of a slider to adjust the slider max, can we just have a button that sets all of the slider maxima to twice the current estimate, effectively centering all of the sliders? That would make the sliders easier to tweak, I think.

xhdong-umd commented 7 years ago

@chfleming At first I made a button, then I was afraid that doubling once would not be enough, so I wanted to be able to continue expanding if needed. Because the first doubling uses the initial slider limit, the next doubling would need to save the previous state in some global variable. I don't like the global variable idea, so I changed it to a slider instead.

By doubling the current estimate, you mean just using the current slider value * 2, no matter what it is? So the user can move to the max and click the button to double, then move to the max again and click the button to double again.

On the other hand, it can also make the slider range smaller if the slider is at the left end. This could also help increase the slider resolution.

chfleming commented 7 years ago

This part with ctmm.guess (which includes within it variogram.fit and the sliders) is just for obtaining an initial guess. It is not really a fit or selected model and variogram.fit is a misnomer. It should be called variogram.eyeball.

The next step is to feed the data and initial guess into ctmm.select with verbose=TRUE, report the results, and take the best (first) model. Overfitting is not yet an issue with these starter models because they don't have anything interesting happening in the trend/mean term to overfit with.

The report should include both the summarized list of selected models and the plot of the best fit model atop the empirical variogram.

chfleming commented 7 years ago

The base I used is arbitrary, you can use whatever you like.

chfleming commented 7 years ago

Yes, I mean to set the slider max to equal twice the current slider value. This will help both (1) situations where you reach the slider max and want to go further and (2) situations where you are near the slider min and are having trouble with the resolution.
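
The centering rule described here is small enough to sketch in base R. The function name `center_slider` and the list layout (`min`, `max`, `value`) are assumptions made for this example, and leaving a slider untouched when its value is 0 is a choice of the sketch, not something settled in this thread.

```r
# Hypothetical helper: recenter a slider by setting its max to twice the
# current value, so the current value sits in the middle of the range.
center_slider <- function(slider) {
  if (slider$value > 0) {          # a value of 0 would collapse the range
    slider$max <- 2 * slider$value
  }
  slider
}

# (1) value at the max: the range grows
at_max <- list(label = "sigma", min = 0, max = 10, value = 10)
center_slider(at_max)$max     # 20

# (2) value near the min: the range shrinks, giving finer resolution
near_min <- list(label = "sigma", min = 0, max = 10, value = 1)
center_slider(near_min)$max   # 2
```

Because the new max depends only on the current value, repeated clicks need no saved state between them.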

xhdong-umd commented 7 years ago

I have updated the repo with the zoom slider on a log scale, and added a button to center the slider values.

After you finish the interface on variogram.fit, I'll update my code to make sure the app behaves the same as the manipulate version.

@chfleming For centering the sliders, do we need to update the first slider, "zoom"? And the error slider starts at 0, so its max may become zero after the update. Should we just ignore the error slider?

Then the button only has an effect on 3 of the 5 sliders. I'm trying to find a better label to explain this behavior.

jmcalabrese commented 7 years ago

@xhdong-umd:

  1. Your justification for using a different UI for facet plots and multipanel variograms makes sense, and in light of that, I agree that we should keep those the way they are.

  2. I just checked, and the variogram zoom slider still shows a log scale to the user, ranging from -3 to 0. Did you not have a chance to change the labels so that the user sees the original units on the slider, or was my comment unclear?

xhdong-umd commented 7 years ago

Interesting, I think I did push the changes to the repo.

I just updated the repo again, and directly running from the repo showed this scale for me:

[Screenshot: 2017-05-09, 9:26:35 am]

Can you test again?

jmcalabrese commented 7 years ago

Ok, just tested again.

On the main Visual diagnostics page (before selecting an individual to fine tune parameter guesses), I still see a slider range of -3 to 0.

After selecting Cilla (with no outlier filtering or time subsetting), I see a zoom slider range of -1.547 to 1 on the slider palette. That seems like a mix of log scaling on the low end, and linear scaling on the high end. Not sure what's going on there, but that's not what I had in mind.

xhdong-umd commented 7 years ago

Shiny doesn't support log sliders directly, so I used some JavaScript code to modify the slider labels. This could be a bug caused by a different platform/browser interpreting that JavaScript code differently.

What platform/browser are you using?

jmcalabrese commented 7 years ago

I am on Linux right now. I had been using RStudio's built-in browser window, and there the sliders are still as I described above. When opened in Firefox or Chromium, the slider range on the main Visual diagnostics page looks good: 0.001 to 1. The palette zoom slider then has the range you showed above, but it seems like you multiplied the range by 10? Shouldn't it stop at 1 for maximum zoom?

xhdong-umd commented 7 years ago

The slider range in the fine-tune window is the same as in the manipulate version of variogram.fit, which calculates the range max as 1 and then converts the slider value with fraction <- b^(z-1), where b is the log base and z is the slider value. So this slider is not the fraction, but a function of the fraction. I want to keep the app behavior consistent with the manipulate version, so I used the same arrangement.

Now I just realized that maybe manipulate doesn't support a log-scale slider, and that's the reason for that arrangement. So we may have to differ here and use a log slider with another name, maybe just fraction instead of zoom.
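
To make the relation between the zoom slider value and the fraction concrete, here is a small base-R sketch of the two conversions implied by fraction <- b^(z-1). The function names are made up for this example; only the formula itself comes from the thread.

```r
# Conversions between the manipulate-style zoom slider value z and the
# plotted fraction, following fraction <- b^(z - 1).
# The function names are hypothetical, for illustration only.
zoom_to_fraction <- function(z, b = 4) b^(z - 1)
fraction_to_zoom <- function(fraction, b = 4) 1 + log(fraction, b)

zoom_to_fraction(1)     # slider max z = 1 corresponds to fraction 1
fraction_to_zoom(0.5)   # with base 4, fraction 0.5 also sits at z = 0.5

# The same pair works for any base, e.g. base 10 for a log10 slider
zoom_to_fraction(fraction_to_zoom(0.01, b = 10), b = 10)
```

This also accounts for the earlier observation that 0.5 stays 0.5 in base 4: 1 + log(0.5, 4) = 1 - 0.5 = 0.5.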

xhdong-umd commented 7 years ago

@chfleming For our interface to variogram.fit, variogram.fit.plot should just take the value of fraction, and it's the UI code's responsibility to convert between scales. So the manipulate version will generate the proper fraction value from the zoom slider, and the Shiny version will generate the proper fraction value from the log-scale fraction slider.

I just updated the repo with the slider renamed to fraction, and it is used as the fraction value directly, so the max is 1. Now the behavior is different from manipulate. After the interface is finished I'll update the code again.

xhdong-umd commented 7 years ago

I'll look at the linux RStudio browser and see if the slider problem can be fixed later.

xhdong-umd commented 7 years ago

I decided to stop working on the Linux VM, since it has already taken me an hour and I still don't have R installed. I'll work on other issues first and check this later if I have time. Until the bug is fixed, we will have to ask users to use Firefox or Chrome.

vestlink commented 7 years ago

maybe a warning when fitting the model saying something like "The model fitting may take some time..."

xhdong-umd commented 7 years ago

@vestlink There are notifications for all time-consuming tasks. I also added a button to start the model fitting explicitly, so it will not start when the user switches pages accidentally.

I plan to give the user a summary of the work plan before fitting starts, and to warn the user when the plan could take a long time to finish, like batch fitting many models.

vestlink commented 7 years ago

The "remove outlier" button seems a bit misplaced when thinking about the workflow.

I would consider moving it below the list of outliers, since otherwise you have to move back and forth too many times.

xhdong-umd commented 7 years ago

Do you mean moving the button from top of the table to bottom of the table like this?

[Screenshot: 2017-05-10, 8:45:49 am]

I put it on top because there is also a reminder for the user about selecting rows, so it's natural to select rows, then remove them. Putting the button at the bottom would add one extra row to the box height. I also thought users need to inspect the scatter plot before removal, so the screen focus is more likely to be on the upper part than the lower part.

vestlink commented 7 years ago

I see your point. As long as it is described in the workbook to guide the users, it should be no problem.

xhdong-umd commented 7 years ago

I changed the title to focus on variograms. The variograms feature is complete, so I'm closing this issue now.

When I need to work on models again, I'll create a new issue to continue the discussion on workflow.

xhdong-umd commented 7 years ago

@chfleming, I'm adding the ERROR switch for the variogram. I didn't find any visual difference with ERROR on or off; is this normal? Is there some data that shows a difference, so I can verify it works? I also updated the app in the repo so you can test with your data.

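```r
library(ctmm)

# example data shipped with the move package (which must be installed)
DATA <- as.telemetry(system.file("extdata", "leroy.csv.gz", package = "move"))
plot(variogram(DATA))
# default model guess
GUESS <- ctmm.guess(DATA, interactive = FALSE)
plot(variogram(DATA), CTMM = GUESS)
# turn the error model on and plot again
GUESS$error <- TRUE
plot(variogram(DATA), CTMM = GUESS)
```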

Another question: will turning on ERROR change the error slider in the fine-tune UI of variogram.fit?

I found there is an error term in the info slot of the guess object. Is this supposed to be the error slider value in variogram.fit? Neither my code nor the current variogram.fit uses this value, though. I remember you mentioned before that you fixed some bug in variogram.fit, but I'm not sure if it was about this.

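```r
library(ctmm)

# example data shipped with the move package (which must be installed)
DATA <- as.telemetry(system.file("extdata", "leroy.csv.gz", package = "move"))
plot(variogram(DATA))
# default model guess
GUESS <- ctmm.guess(DATA, interactive = FALSE)
GUESS@info$error
## [1] 465.4418
```
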
xhdong-umd commented 7 years ago

I also noticed the vignette mentions pooling variograms. @chfleming @jmcalabrese, do you think it is worth adding to the app?

There could be a button that generates a pop-up window (or another tab in the variogram box, which might be better since you can switch between two tabs easily) that pools all the individuals into one variogram. There could be a checklist to select which individuals are added to or removed from the pool.

However, I'm not sure what we can do with this pooled variogram, since the rest of the workflow is built around individuals.

chfleming commented 7 years ago

I think the first options to throw in should be the dt option, and maybe res with that, for people with very irregular data.

For pooling, users will often want to select specific subsets of individuals with similar looking behavior. Sometimes that will be all individuals.

xhdong-umd commented 7 years ago

So you mean there should be dt, res, and error options for variogram fitting? Using the same options for all individuals is relatively easy, but it is more cumbersome if you want different options for different individuals.

After pooling, how do we incorporate the pooled variogram into the general workflow, which usually involves a separate result for each individual?

chfleming commented 7 years ago

The dt and res arguments are useful in calculating the variogram (before fitting) when people have irregular data. This would be for the optional variogram argument of variogram.fit/ctmm.guess. For these options, I think it's fair to have users group their data by sampling schedule, at least for the time being, if that makes your work easier.

Pooling is useful when individual data is sparse and individual variograms are noisy (less common than the above problem). At least procedurally, in this case you would want to use the pooled variogram for the (pooled) individuals just as you would have with their individual variogram. I suppose you just need to prevent people from re-pooling their variograms.

xhdong-umd commented 7 years ago

@chfleming @jmcalabrese I was going to move on to the model selection step when I saw that the ERROR option had been mentioned before, so I thought I would just add it if it was simple. Now it seems that dt, res and pooling will take some time to design and implement, for example

I think maybe I should first work on the next steps, model selection and home range estimation, and put these features on hold. There is only about 1 month left before the course, and we probably want the main features first.

chfleming commented 7 years ago

I would put model selection as a priority over variogram calculation options. Especially over pooling.

For dt I would use a drop-down box for units.

jmcalabrese commented 7 years ago

I agree with @chfleming about prioritizing model fitting/selection. I would prefer that the app has at least basic functionality all the way through AKDE by the animove course. We can worry about more advanced options once that's in place.

For dt, it might be useful to let the user visually select one or more values by clicking on a histogram of the sampling intervals in the data. In my experience, dt tends to be useful when that histogram is multimodal, and the user could just click on each of the modes, and the app would create a vector of the modal intervals in the background. Given the above, I would prefer to come back to this later after first completing the workflow through AKDE.
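
As a rough illustration of the histogram idea, the sketch below computes sampling intervals from timestamps and picks out the dominant ones, which is the vector the app would build in the background. The timestamps are made-up illustrative data, and the 10% threshold stands in for the user's clicks; both are assumptions of this example, not part of ctmm or the app.

```r
# Illustrative timestamps only (seconds): a mix of 1-hour and 15-minute fixes.
times <- cumsum(c(0, rep(3600, 20), rep(900, 40), rep(3600, 10)))

# Sampling intervals between consecutive fixes, rounded to whole minutes.
intervals_min <- round(diff(times) / 60)

# Hypothetical rule standing in for the user's clicks: treat any interval
# that covers at least 10% of the fixes as a mode of the histogram.
shares <- table(intervals_min) / length(intervals_min)
dt_modes <- as.numeric(names(shares)[shares >= 0.1]) * 60  # back to seconds

# The histogram the user would look at; here it has two clear modes.
hist(intervals_min, main = "Sampling intervals", xlab = "minutes")
dt_modes   # 900 and 3600 seconds
```

In the interactive version, the clicked bars would replace the threshold rule, and the resulting vector would be what gets handed on as dt.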

jmcalabrese commented 7 years ago

The pooled variogram only makes sense if you've got sparse data and individuals exhibit broadly similar movement behavior. If you're in that regime, then I think there are two scenarios where the pooled variogram could be useful: 1) you're interested in the variogram per se, and aren't going to go any further with the analyses, and 2) you want to get representative parameter guesses for model fitting and then impose those on all individuals that were in the pooled variogram.

One way to do it would be to let the user select which individuals to pool over, and then the pooled variogram would be displayed along with the single individual variograms of any remaining individuals that were not selected to pool over. The guesstimate step would work the same, except that each individual in the pooled variogram would have the pooled variogram parameter guesses applied to it in the model fitting step.

The next question is how to activate the pooling functionality. One option would be to add a check box to the current variogram page, that when ticked opens up an interface to select individuals. Another way to do it would be to have a separate tab on the variogram page that would then have the required individual list to select from.

In any case, I think dealing with the pooled variogram can wait until after the basic workflow to AKDE is finished.

xhdong-umd commented 7 years ago

Thanks @jmcalabrese, with these detailed use cases I can try to design an easier-to-use UI and workflow.

It is difficult to add other UI elements to the variogram plots, as they are a single item of grouped base plots. I think I can make a table of individual names with the same layout as the variograms, then the user can select individuals and pool them together.

I'll put these in my notes and work on model selection first.

xhdong-umd commented 7 years ago

@chfleming I'm working on the new variogram.fit code. The zoom slider uses 1+log(fraction,b) with b=4 (probably because manipulate doesn't support log sliders), but the Shiny version uses a log slider with base 10. To use variogram.fit.backend, I need to convert the zoom slider values between base 4 and base 10 first.

Can you add a parameter b=4 to variogram.fit.backend (and remove b<-4 in the code)? That way I can use base 10, and your code doesn't need any other change.

I may have other suggestions while I'm working on it, so you don't need to change it now.

xhdong-umd commented 7 years ago

Why don't we just add a name column to the data frame instead of using row.names? Of course there is no real difference between a name column and row.names, I'm just wondering.