Support for dt and weights

kjklauder commented 6 years ago

It would be awesome to have support for the "dt=" and "weights=" arguments, as outlined in the vignettes.

chfleming commented 6 years ago

I assume you are talking about the dt argument of variogram() and the weights argument of akde(). I think that's a good idea with the weights paper out now.

variogram dt is straightforward enough to implement, though requires some explaining. akde weights=TRUE has a lot of tuning options described in help('bandwidth') and sometimes requires picking the right options to get good performance/accuracy.

kjklauder commented 6 years ago

Yes that is what I was referring to. I see your point about the complexity of 'bandwidth' options. Would it be possible to have a toggle option for 'weights' that defaults to FALSE, but if TRUE is selected, expands to show additional tuning options? That would keep things streamlined for those uninterested in those arguments.

xhdong-umd commented 6 years ago

For dt, I think we can just add a text field for input and a list of time units in this box. The text field can take a series of numbers separated by commas. I'll also link to the vignette's Irregular Sampling Schedules section in the help.

For weights, the simplest way is just to add a checkbox on the home range page, though that will apply to all selected animals. It's also possible to make a multiple-choice selection box to turn it on for some animals individually.

I'm not sure if actual usage needs more detailed control, or if it's practical to implement all the options.

NoonanM commented 6 years ago

weights can be slow for some datasets, and is not always needed. I think the best option would be to be able to turn it on for some animals individually.

xhdong-umd commented 6 years ago

When you add dt for variograms, do you also need to apply it to the modeling process?

chfleming commented 6 years ago

variogram() dt is specifically for the variogram calculation. ctmm.fit() is totally robust to sampling irregularity and does not have nor need a dt argument.
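A minimal sketch of that distinction, assuming the ctmm package and its bundled buffalo data: dt is passed only to variogram(), while the fit works on the raw timestamps. The object names SVF, GUESS and FIT are illustrative, and the fit may take a while on the full dataset.

```r
library(ctmm)
data("buffalo")
pepper <- buffalo$Pepper

# dt only changes how the empirical variogram is calculated
dt <- c(1, 2) %#% "hour"   # 1-hour and 2-hour schedules, in seconds
SVF <- variogram(pepper, dt = dt)

# Model fitting takes the telemetry data directly; no dt is passed
GUESS <- ctmm.guess(pepper, variogram = SVF, interactive = FALSE)
FIT <- ctmm.fit(pepper, GUESS)
```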

xhdong-umd commented 6 years ago

When I tested with the gazelle data, I got some errors because there are no timestamp columns and my code was expecting them. Is this normal? Should the app accept this kind of data?

chfleming commented 6 years ago

The variogram dt example in the vignette is with a gazelle and works for me. Where were the errors?

xhdong-umd commented 6 years ago

The vignette code works fine. I wanted to test the dt feature in the app with the gazelle data, so I needed to merge the telemetry object into a data.frame.

Previously, all input data that went through as.telemetry had long/lat, timestamp, x, y, t columns, and I rely on the timestamp column to calculate the sampling start/end time. The gazelle data only has x, y, t columns, so my code hit an error here. I think maybe this is a special case and the app doesn't need to accept this kind of data?

Is there any other Movebank-format data that I can test with various dt values?

xhdong-umd commented 6 years ago

This is the UI for dt (just the interface, the function is not implemented yet)

UI for weight.

It would be more intuitive if we could place a checkbox next to each plot, but that's difficult to implement since the plots are grouped together as one picture.

chfleming commented 6 years ago

The gazelles and wolves in the package are anonymized and lack long-lat & timestamp data. You don't need to code for these kinds of data to work.

To test variogram() dt, I think the buffalo Pepper has both 1-hour and 2-hour sampling.

Instead of calling the dt box "Irregular Sampling Schedules" I would call it "Multiple Sampling Schedules".

Should we annotate the titles of AKDEs with optimal weights?

xhdong-umd commented 6 years ago

OK, I'll change that title (it was from the vignette subtitle).

I'll also label the plot. How about something like this?

        Cilla
(Optimally Weighted)

xhdong-umd commented 6 years ago

I just realized there are two more issues with dt.

1. Can we detect the multiple sampling schedules and get them automatically?
2. The sampling schedule may be animal specific, right? So we need to apply it to individual animals?

chfleming commented 6 years ago

1. Conceivably, but it's not trivial.
2. This can happen, yes. Being able to apply to all and to individuals would probably be useful.

xhdong-umd commented 6 years ago

The UI above is just the user interface; I'm still in the process of implementing the functions.

@jmcalabrese @chfleming @NoonanM The weights feature looked trivial at first, but it turns out it may need some changes in app logic: there needs to be a button on the home range page to trigger the calculation.

In Shiny you can update data according to user input with two approaches:

- Automatic update, triggered by any change in dependent variables. This is implemented as a reactive expression. For now, most updates in the app are implemented this way. For example, the home range plot depends on
  - modeling data
  - multiple plot control options

  Any change in the above will redraw the plot.
- Manual update, triggered by an event, typically a button click. This is implemented as an event observer and a reactive value. If you want to modify a value in several different places (for example, a button click modifies something and another separate event also modifies the same value), this is the only method. A value maintained by the automatic update method above cannot be modified manually.

The home range calculation is an automatically updating reactive expression. The dt input can be added to the automatic expression; the only problem is that when the user is adding multiple selections, every step in the process triggers the update, so the home range calculation runs at every step, which can take too much time.

The visualization page has the same design: if you select rows in the data summary table, even if you plan to select 3 rows, the 3 clicks will trigger changes 3 times, so the first 2 changes are wasted. This is not a big problem for visualization since the update is not too slow, but it will be a problem for the home range calculation. The calculation cache doesn't help either, since the calculation takes the whole list as input, and a partial change in the list means the cache cannot be used.

There is no way to avoid this in the automatic update design: the app has no way to know whether a row selection is the final update the user wants or just the middle of a series of selections, so it has to update on every change.

To avoid the extra calculation in home range with dt, we need to switch to manual update. There could be a button labeled "estimate home range", and the calculation would only begin with the button click. The user can then add all the dt choices and click the button to update the home range.
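The two update styles can be sketched in a toy Shiny app. This is only an illustration under stated assumptions, not the ctmmweb code: the input ids "animals" and "estimate" are hypothetical, and a cheap paste() call stands in for the slow home range calculation.

```r
library(shiny)

ui <- fluidPage(
  checkboxGroupInput("animals", "Animals", c("Cilla", "Gabs", "Pepper")),
  actionButton("estimate", "Estimate home range"),
  verbatimTextOutput("auto"),
  verbatimTextOutput("manual")
)

server <- function(input, output, session) {
  # Automatic update: the reactive expression re-runs whenever input$animals
  # changes, so ticking 3 animals one by one triggers it 3 times.
  auto_result <- reactive({
    paste("auto:", paste(input$animals, collapse = ", "))
  })
  output$auto <- renderText(auto_result())

  # Manual update: a reactive value that is only written by an event observer,
  # so the (stand-in) calculation runs once per button click.
  manual_result <- reactiveVal("manual: (click the button)")
  observeEvent(input$estimate, {
    manual_result(paste("manual:", paste(input$animals, collapse = ", ")))
  })
  output$manual <- renderText(manual_result())
}

app <- shinyApp(ui, server)
# runApp(app)  # uncomment to launch the demo interactively
```

With the button, the selection can be changed any number of times at no cost; only the click reads the final selection.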

xhdong-umd commented 6 years ago

The change above is still doable, but I'm not sure it's worth the effort, considering that weights is complex, with a lot of tuning options that are difficult to implement in the app (the options are more suitable for the command line than for a web app), and it can be slow and is not always needed.

xhdong-umd commented 6 years ago

I tried to detect the multiple sampling schedules, or at least give the user a histogram showing which animals have multiple sampling schedules. It's indeed not easy.

Pepper has these schedules:

The histogram is difficult to read. There are too many irregular values (the data below have been rounded, otherwise it's messier), even though the majority are 1 and 2 hours. That is not obvious from the histogram plot because the axis is stretched too much by the big numbers.

> table(intervals)
intervals
  0   1   2   3   4   5   6   7   8  10  12  14  16  18  20  22  24  26  28  30  34  36  40  42  46  48 
  3 570 796  15  95   4  74   1  43  23  26  12   7   8   8   8   3   4   3   1   5   4   2   2   1   1 
 52  74  84 112 122 
  1   1   1   1   1

So I have to assume the user knows the values, and skip the idea of a histogram.
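For reference, a table like the one above can be produced from raw timestamps with base R. The sketch below uses made-up timestamps (not the Pepper data), and the 20% share threshold for calling an interval a "schedule" is an arbitrary assumption.

```r
# Hypothetical timestamps in seconds: mostly 1- and 2-hour gaps, one long gap
gaps_in_hours <- c(1, 1, 2, 2, 1, 2, 24, 1, 2)
t <- cumsum(c(0, gaps_in_hours) * 3600)

# Sampling intervals, rounded to whole hours as in the table above
intervals <- round(diff(t) / 3600)
counts <- table(intervals)
print(counts)

# Keep only intervals that make up at least 20% of all gaps (assumed cutoff)
keep <- as.vector(counts) / sum(counts) >= 0.2
common <- as.numeric(names(counts)[keep])
print(common)
```

Working from the frequency table avoids the stretched axis, because rare long gaps no longer dominate the picture.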

xhdong-umd commented 6 years ago

I noticed that I previously had a plan to implement dt, res, and error for variograms, as well as pooled variograms.

It's better to consider all new variogram-related features together, in a consistent way.

The error part is another important and complex topic that covers several aspects; let's put it aside for now.
Do we also need the res parameter and pooled variograms?
dt itself may already need quite some space. We need to specify animals, sampling schedule values, and units, and the whole page may need different treatment for different animals.
Do we also apply these parameters to the guesstimate tab and the fine-tune page (fine-tuning the sliders of a variogram)?

xhdong-umd commented 6 years ago

I used 1 and 2 hours for the Pepper variogram. In the vignette example the variogram with dt is much smoother, but here it's not that obvious. Is the result correct?

library(ctmm)
data("buffalo")
pepper <- buffalo[[4]]
# 1-hour and 2-hour sampling intervals, converted to seconds
dt <- c(1,2) %#% "hour"
plot(variogram(pepper))

plot(variogram(pepper, dt = dt))

chfleming commented 6 years ago

Yeah, Pepper is not a very stark example like the gazelles. You could do some subsampling of any regularly sampled dataset to get the same effect, like

SUB <- c(1,2,3,4,5,10,11,12,13,14,15,20,21,22,23,24,25,30)
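As a sketch of what this index pattern does, the example below applies SUB to a made-up, regularly sampled track in a plain data.frame (a stand-in for a telemetry object) and tabulates the resulting gaps.

```r
SUB <- c(1,2,3,4,5,10,11,12,13,14,15,20,21,22,23,24,25,30)

# Hypothetical hourly track with 30 fixes
set.seed(1)
regular <- data.frame(
  t = (0:29) * 3600,        # seconds
  x = cumsum(rnorm(30)),
  y = cumsum(rnorm(30))
)

# Keep only the rows listed in SUB
thinned <- regular[SUB, ]

# Gaps between consecutive fixes, in hours
gaps <- diff(thinned$t) / 3600
print(table(gaps))
```

The thinned track mixes 1-hour gaps with occasional 5-hour gaps, which is the kind of irregularity the dt argument is meant to handle.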

As for the other variogram options, I think dt is probably the most used.

xhdong-umd commented 6 years ago

dt is done, though I don't have good data to test it with yet. I tried subsampling but didn't get any obvious result.

Select animals, input intervals

Click the Add button and the entry will be added to a list, and the variogram is updated. So you can add multiple schedules for multiple groups of individuals.

The variogram dt call is really simple in ctmm, but adding a UI to input flexible parameters (with any combination of animal, value, and unit) and plugging it into the current workflow is not trivial.

xhdong-umd commented 6 years ago

For home range, we are using akde(telemetry_list, CTMM_model_list) to make sure they are on the same grid. Can we turn on weights for specific individuals in this call?

xhdong-umd commented 6 years ago

I added a line in variogram title if dt is used so that it's easier to read the plot.

The weights feature will need akde to take a logical vector for list inputs.

chfleming commented 6 years ago

akde can now take an array of weights matching the list length of the data and model arguments.
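A hedged sketch of what such a call might look like, assuming a ctmm version that includes this change. The variable names are illustrative, and fitting plus optimal weighting can be slow on full datasets.

```r
library(ctmm)
data("buffalo")

# Two individuals, kept as a list so the home ranges share one grid
telemetry_list <- buffalo[c("Cilla", "Pepper")]

# One fitted model per individual
model_list <- lapply(telemetry_list, function(tele) {
  ctmm.fit(tele, ctmm.guess(tele, interactive = FALSE))
})

# One logical per individual, in list order: weights only for Pepper
use_weights <- c(FALSE, TRUE)
hr_list <- akde(telemetry_list, model_list, weights = use_weights)
```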

Those dt variograms look really weird. I'm going to take a look at that next.

xhdong-umd commented 6 years ago

It's possibly just because I used sampled data (100 points). I'll try the full data tomorrow, and if the result is the same I'll generate some reproducible code for you.

xhdong-umd commented 6 years ago

Yes, the plot above was caused by the small sample of data. These are the same parameters on the normal data set

xhdong-umd commented 6 years ago

For pooled variograms, I think in the last meeting you said the pooled variogram should replace the individual variograms instead of being an additional one. For example, if we pool Cilla and Gabs, we should remove Cilla and Gabs and add the pool of Cilla and Gabs.

I'm wondering if users may need to try different combinations. For example, first create a pool of Cilla and Gabs, then create a pool of Gabs and Toni. And do you think it's useful to compare the individual variograms with the pooled one to see the difference?

If we just add the pool as an additional variogram, it'll be easy to compare with the individual ones, and it'll also be easy to implement different combinations. Otherwise, if a pool replaces some individuals, those individuals cannot be used for another pool unless I specifically maintain another list of the original variograms. That's still doable, just with more complexity, and I'm wondering if adding the pool as an additional variogram has more advantages.

chfleming commented 6 years ago

I wouldn't worry about visual comparison, as most people will only pool when the individual variograms look bad.

For more flexibility, I suppose you could have one selection of the individuals for the pool copied over (by default) to a second selection of individuals whose variograms will be overwritten by the pooled variogram. That way you could deselect some of the better individuals to keep their individual variograms.

xhdong-umd commented 6 years ago

The weights parameter is implemented. The plot titles are marked for easier identification, but I didn't change the home range summary table because I didn't find an easy way to add the information without clutter. I suppose the plot title changes should serve the purpose.

chfleming commented 6 years ago

Would it be worth saving space by noting the weighted AKDEs with the balance emoji unicode (or Libra symbol) in the title, instead of the second line of text in the title?

xhdong-umd commented 6 years ago

The balance icon in the app is a Font Awesome icon. Is there a Unicode symbol that can be used for a similar purpose?

As for the Libra symbol, I'm not sure it is intuitive enough for users. I didn't recognize this symbol before searching for it.

chfleming commented 6 years ago

2696

xhdong-umd commented 6 years ago

It doesn't print correctly with this code. It seems that only some fonts support this, and we would need to specify the font. However, we can't be sure that a font will be available on the user's system.

# some symbols work
plot(1, xlab='\u0298')
# but the scales symbol does not print correctly
plot(1, xlab='\u2696')

chfleming commented 6 years ago

"\U2696" on the command line and in the title() command works for me in Ubuntu.

xhdong-umd commented 6 years ago

Yes it's platform dependent.

Fonts supporting 2696

cat("\u2696") in the RStudio console also works for me,

but the plot title doesn't work. The console and the plot system probably use different fonts.

chfleming commented 6 years ago

Is the Libra symbol supported on Mac?

If so, I suppose we could detect OS and then do scales on Linux/Windows and Libra on MacOS, matching consistency with the selection box.

xhdong-umd commented 6 years ago

Libra can be printed in the RStudio console but not in a plot.

Fonts could be complex, as different Linux distributions can actually have different default fonts. See discussions here.

Saving plots to pdf and png can bring further problems, since those are yet another platform...

NoonanM commented 6 years ago

Looking around, it doesn't seem like there's a simple solution for plotting unicode symbols on mac.

xhdong-umd commented 6 years ago

When you pool some variograms like "Cilla, Gabs", and only the pooled variogram is shown, what should we do with the ctmm.guess values? Previously ctmm.guess was applied to every telemetry object.

I think we still need to apply ctmm.guess to each individual telemetry object, so do we still need to show individual variograms in the guesstimate tab? So maybe we just ignore the pooled variograms in the guesstimate and modeled tabs?

xhdong-umd commented 6 years ago

I found that inserting pooled variograms into the current workflow needs too many changes. Previously the app assumed each individual has one variogram in 3 modes, and selecting models requires selecting specific individual variograms.

dt changed the variogram but didn't change the one-to-one mapping, so there was not much of a problem. A pooled variogram needs to add a new variogram and remove the individual variograms in the pool, which breaks the one-to-one mapping of variogram to individual and causes too many problems in multiple places.

Maybe it's easier to just use a separate tab for pooled variograms so they are separate from the regular workflow.

xhdong-umd commented 6 years ago

Compared to a separate tab, adding pooled variograms alongside the individual variograms will be simpler in the UI. By keeping the individual variograms intact and putting pooled variograms in as purely additional plots, the original workflow can be maintained without much change.

chfleming commented 6 years ago

The ctmm.guess value would get duplicated, but its value is needed for the individuals' ctmm.fit call, and then again to compare against the result of ctmm.fit on the individual.

What's the difficulty in simply replacing the individual's variogram with the pooled variogram? Then each individual has one variogram.

xhdong-umd commented 6 years ago

If we replace each individual's variogram with the pooled one, does that mean a pool of 3 variograms will be plotted 3 times?

xhdong-umd commented 6 years ago

So with a pooled variogram, are we also supposed to supply that pooled variogram to the variogram parameter in the ctmm.guess call?

ctmm.guess(data,CTMM=ctmm(),variogram=NULL,name="GUESS",interactive=TRUE)

xhdong-umd commented 6 years ago

If we just plot the pooled variogram in each individual's variogram plot, that maintains the one-to-one mapping and makes everything simpler, even if the plot is duplicated a little bit.

Oh, I realized I can keep the pool as each individual's variogram but only plot the pool once.

chfleming commented 6 years ago

Yes, yes, and yes.

xhdong-umd commented 6 years ago

I found it quite cumbersome to maintain correct plot titles when tab 1 removes the duplicates of a pooled variogram while the other tabs keep one copy for each individual in the pool, because I need to use the animal name or model name for the first line and mark the dt parameter or pooled variogram usage in the 2nd or 3rd line.

Since tabs 2 and 3 already show an individual copy of the pooled variogram, using the same arrangement in tab 1 will be much easier to implement, consistent in the UI, and probably less surprising for users.

It will also help in this case: if we create a pool of animals 1, 2, 3, then create a pool of animals 2, 4, then animal 2 actually has the pooled variogram of (2,4) instead of (1,2,3). If we remove the duplicates, we will show both pooled variograms, but it's not clear which variogram will be used for animal 2. With the individual copies shown, it's obvious that animal 2 is using the 2nd pooled variogram.
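The bookkeeping described here can be sketched with a plain named list that records, for each animal, which pool its variogram comes from. This is a toy illustration in base R, not the app's data structure; the names apply_pool and variogram_source and the animal labels are hypothetical.

```r
animals <- c("A1", "A2", "A3", "A4")

# One entry per animal: initially each animal uses only its own data
variogram_source <- setNames(as.list(animals), animals)

# Pooling overwrites the entry of every member, keeping one entry per animal
apply_pool <- function(mapping, members) {
  mapping[members] <- list(members)
  mapping
}

variogram_source <- apply_pool(variogram_source, c("A1", "A2", "A3"))
variogram_source <- apply_pool(variogram_source, c("A2", "A4"))

# Plot titles: animal name on line 1, pool membership on line 2
titles <- vapply(names(variogram_source), function(name) {
  members <- variogram_source[[name]]
  if (length(members) > 1) {
    paste0(name, "\n(pool of ", paste(members, collapse = ", "), ")")
  } else {
    name
  }
}, character(1))
cat(titles, sep = "\n\n")
```

Because each animal keeps exactly one entry, the later pool simply wins for A2, and the per-animal title states which pool is in use.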

xhdong-umd commented 6 years ago

Now the plot title can show any combination of multiple sampling schedules and pooled variograms in the 3 tabs. The 1st tab also shows each individual's variogram for a pooled variogram, the same as the other tabs.

I think the home range plot titles don't need to carry all this additional variogram information, right? They are based on the modeling result, and only need to show the models (and optimal weighting if applied).

xhdong-umd commented 6 years ago

ctmm.guess is taking the updated variogram (multiple schedule, or pooled) as parameter now. The dt, weights, pool variogram features are finished.

xhdong-umd commented 6 years ago

@jmcalabrese @chfleming @NoonanM I'm adding the discussion about multiple sampling schedules here since this is the original thread. In my early exploration of this feature I tried a histogram and frequency counts, but didn't realize kmeans can be used for this.

I'm planning to add a histogram of sampling schedules, and also calculate kmeans with a default of k=1.
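A rough sketch of the kmeans idea on simulated sampling intervals, using stats::kmeans from base R. The data are made up (two schedules of 1 and 2 hours with some jitter), and nstart = 5 is an arbitrary choice.

```r
set.seed(1)

# Simulated sampling intervals in seconds: 1-hour and 2-hour schedules
intervals <- c(rep(3600, 50), rep(7200, 50)) + rnorm(100, sd = 60)

# k = 1 returns a single center, which is just the overall mean
k1 <- kmeans(intervals, centers = 1)

# k = 2 separates the two schedules
k2 <- kmeans(intervals, centers = 2, nstart = 5)

# Cluster centers in hours, smallest first
schedules <- sort(k2$centers[, 1]) / 3600
print(round(schedules, 2))
```

With k = 1 the single center falls between the two schedules here, which is one reason to check the histogram before trusting the value.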

At first I was thinking of using k=2, but it will return 2 values even for obviously regular data, which may confuse users. Maybe it's better to use k=1 as the default and only increase k after verifying against the histogram.
For the histogram of sampling schedules, the axis labels need to be improved:

The default value is a number in seconds, which is not very intuitive to read.

We can transform the values to the units best suited to their scale (minutes, hours, etc.). However, the transformation here happens after the breaks in the plot are determined, so the transformed values are not integers, which looks a little weird.

Alternatively, this transforms the time into a specific time starting from 00:00:00, which should be easier to read, but conceptually the value changes from a time duration into a specific time of day. I'm not sure this is a good idea.

To make things more complicated, some data have very small sampling intervals (much less than a second, for some cell data). I'm not sure what the plot will look like with that kind of data. @chfleming Do you have some data with microsecond sampling intervals?
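One possible way around the non-integer axis labels is to convert the intervals to the display unit first and only then compute the histogram breaks, so the breaks are chosen in that unit. The sketch below is base R only; pick_unit is a hypothetical helper with arbitrary cutoffs based on the median interval, and the data are made up.

```r
# Hypothetical sampling intervals in seconds
intervals <- c(rep(3600, 20), rep(7200, 30), 14400, 86400)

# Choose a display unit from the median interval (assumed rule of thumb)
pick_unit <- function(x) {
  med <- median(x)
  if (med >= 86400) c(day = 86400)
  else if (med >= 3600) c(hour = 3600)
  else if (med >= 60) c(minute = 60)
  else c(second = 1)
}

unit <- pick_unit(intervals)

# Convert first, then let pretty() choose breaks in the display unit
scaled <- intervals / unname(unit)
h <- hist(scaled, breaks = pretty(scaled, n = 10), plot = FALSE)
print(h$breaks)
# plot(h, xlab = paste0("Sampling interval (", names(unit), ")"))
```

This keeps the values as durations instead of turning them into a time of day. Sub-second data would need an extra unit tier, which this sketch does not cover.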

ctmm-initiative / ctmmweb

Support for dt and weights #54