I assume you are talking about the dt argument of variogram() and the weights argument of akde(). I think that's a good idea with the weights paper out now.
variogram dt is straightforward enough to implement, though it requires some explaining. akde weights=TRUE has a lot of tuning options described in help('bandwidth') and sometimes requires picking the right options to get good performance/accuracy.
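For reference, the command-line usage under discussion looks roughly like this (a sketch with placeholder DATA and FIT objects, not the app's code):

library(ctmm)
SVF <- variogram(DATA, dt = c(1, 2) %#% "hour")  # variogram with multiple sampling intervals
UD  <- akde(DATA, FIT, weights = TRUE)           # optimally weighted AKDE; see help('bandwidth') for tuning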
Yes that is what I was referring to. I see your point about the complexity of 'bandwidth' options. Would it be possible to have a toggle option for 'weights' that defaults to FALSE, but if TRUE is selected, expands to show additional tuning options? That would keep things streamlined for those uninterested in those arguments.
For dt, I think we can just add a text field for input, plus a list of time units in that box. The text field can take a series of numbers separated by commas. I'll also link to the vignette's Irregular Sampling Schedules section in the help. For weights, the simplest way is just to add a checkbox on the home range page, though that will apply to all selected animals. It's also possible to make a multiple-choice selection box to turn it on for some animals individually.
I'm not sure if actual usage needs more detailed control, or if it's practical to implement all the options.
weights can be slow for some datasets and is not always needed. I think the best option would be to be able to turn it on for some animals individually.
When you add dt for variograms, do you also need to apply it to the modeling process?
variogram() dt is specifically for the variogram calculation. ctmm.fit() is totally robust to sampling irregularity and does not have nor need a dt argument.
When I tested with the gazelle data, I got some errors because there are no timestamp columns and my code was expecting them. Is this normal? Should the app take this kind of data?
The variogram dt example in the vignette is with a gazelle and works for me. Where were the errors?
The vignette code works fine. I wanted to test the dt feature in the app with the gazelle data, so I needed to merge the telemetry object into a data.frame.
Previously, all input data that went through as.telemetry had long/lat, timestamp, x, y, t columns, and I rely on the timestamp column to calculate the sampling start/end times. The gazelle data only has x, y, t columns, so my code errored here. Maybe this is a special case and the app doesn't need to take this kind of data?
Is there any other Movebank-format data that can be used to test various dt values?
This is the UI for dt (just the interface; the function is not implemented yet).
UI for weights. It would be more intuitive if we could place a checkbox next to each plot, but that's difficult to implement since the plots are grouped together as one picture.
The gazelles and wolves in the package are anonymized and lack long-lat & timestamp data. You don't need to make the app work with this kind of data.
To test variogram() dt, I think the buffalo Pepper has both 1-hour and 2-hour sampling.
Instead of calling the dt box "Irregular Sampling Schedules", I would call it "Multiple Sampling Schedules".
Should we annotate the titles of AKDEs with optimal weights?
OK, I'll change that title (it was from the vignette subtitle).
I'll also label the plot. How about something like this?
Cilla (Optimally Weighted)
I just realized there are two more issues with dt.
The UI above is just the user interface; I'm still in the process of implementing the functions.
@jmcalabrese @chfleming @NoonanM The weights feature looks trivial at first, but it turns out it may need some changes in the app logic - there needs to be a button on the home range page to trigger the calculation.
In Shiny you can update data according to user input with two approaches:
The home range calculation is an automatically updating reactive expression. The dt input can be added to the automatic expression; the only problem is that when the user is adding multiple selections, every step in the process triggers an update, so the home range calculation runs at every step, which can take too much time.
The visualization page has the same design: if you select rows in the data summary table, even if you plan to select 3 rows, the 3 clicks trigger changes 3 times, so the first 2 updates are wasted. This is not a big problem for visualization since the update is not too slow, but it will be a problem for the home range calculation. The calculation cache doesn't help either, since the calculation takes the whole list as input, and a partial change in the list means the cache cannot be used.
There is no way to avoid this in the automatic update design: the app has no way to know whether a row selection is the final update the user wants or just one step in a series of selections, so it has to update on every change.
To avoid the extra calculations in home range with dt, we need to switch to a manual update. There could be an "estimate home range" button, and the calculation would only begin on the button click. Then the user can add all the dt choices and click the button to update the home range. A minimal sketch of the two approaches is below.
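Here is that sketch - hypothetical input/output names and a trivial stand-in for the home range calculation, not the app's actual code:

library(shiny)

ui <- fluidPage(
  selectInput("animals", "Animals", choices = c("Cilla", "Gabs", "Pepper"), multiple = TRUE),
  actionButton("estimate_hr", "Estimate home range"),
  verbatimTextOutput("result")
)

server <- function(input, output) {
  # Automatic: recalculates on every change to input$animals,
  # so each click during a multi-selection triggers a (possibly slow) run.
  hr_auto <- reactive({
    paste("home range for", paste(input$animals, collapse = ", "))
  })

  # Manual: only recalculates when the button is clicked,
  # so the user can finish all selections first.
  hr_manual <- eventReactive(input$estimate_hr, {
    paste("home range for", paste(input$animals, collapse = ", "))
  })

  output$result <- renderText(hr_manual())
}

shinyApp(ui, server)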
The change above is still doable, but I'm not sure it's worth the effort, considering that weights is complex, with a lot of tuning options that are difficult to implement in the app (the options are more suited to the command line than a web app), and it can be slow and is not always needed.
I tried to detect multiple sampling schedules, or at least give the user a histogram showing which animals have multiple sampling schedules. It's indeed not easy.
Pepper has these schedules:
The histogram is difficult to read. There are too many irregular values (the data below have been rounded, otherwise it's messier), even though the majority are 1 and 2 hours. That is not obvious from the histogram plot because the axis is stretched too much by the large values.
> table(intervals)
intervals
0 1 2 3 4 5 6 7 8 10 12 14 16 18 20 22 24 26 28 30 34 36 40 42 46 48
3 570 796 15 95 4 74 1 43 23 26 12 7 8 8 8 3 4 3 1 5 4 2 2 1 1
52 74 84 112 122
1 1 1 1 1
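(For reference, a sketch of how a table like this could be produced, assuming the intervals come from the t column in seconds and are rounded to hours:)

library(ctmm); data(buffalo)
pepper <- buffalo[[4]]
intervals <- round(diff(pepper$t) / 3600)  # t is in seconds; round intervals to hours
table(intervals)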
So I have to assume the user knows the values, and skip the idea of a histogram.
I noticed I previously had a plan to implement dt, res, and error of variogram, and to pool variograms.
It's better if all variogram-related new features are considered together in a consistent way.
Do we need the res parameter and pooled variograms? dt itself may need quite a bit of space already: we need to specify animals, sampling schedule numbers, and units, and the whole page may need different treatment for different animals.
I used 1 and 2 hours for the Pepper variogram. In the vignette example the variogram with dt is much smoother, but here it's not that obvious. Is the result correct?
library(ctmm)
data(buffalo)
pepper <- buffalo[[4]]
dt <- c(1,2) %#% "hour"
plot(variogram(pepper))
plot(variogram(pepper, dt = dt))
Yeah, Pepper is not a very stark example like the gazelles. You could do some subsampling of any regularly sampled dataset to get the same effect, like
SUB <- c(1,2,3,4,5,10,11,12,13,14,15,20,21,22,23,24,25,30)
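A rough sketch of applying that kind of subsampling, assuming Cilla's roughly regular 1-hour sampling as the base (the repetition of the index pattern and the variable names are just for illustration):

cilla <- buffalo[[1]]
SUB <- c(1,2,3,4,5,10,11,12,13,14,15,20,21,22,23,24,25,30)
# repeat the index pattern along the track to mix short and long gaps
idx <- as.vector(outer(SUB, seq(0, length(cilla$t) - 30, by = 30), "+"))
cilla_sub <- cilla[idx, ]
plot(variogram(cilla_sub))
plot(variogram(cilla_sub, dt = c(1, 5) %#% "hour"))  # ~1-hour base gives mixed 1-hour and 5-hour gaps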
As for the other variogram options, I think dt is probably the most used.
dt is done, though I don't have good data to test it with yet. I tried subsampling but didn't get any obvious result.
Select animals, input intervals. Click the Add button; the schedule is added to a list and the variogram updated. So you can add multiple schedules for multiple groups of individuals.
The variogram dt call is really simple in ctmm, but adding a UI for flexible parameter input (with any animal, value, and unit combination) and plugging it into the current workflow is not trivial.
For home range, we are using akde(telemetry_list, CTMM_model_list) to make sure they are on the same grid. Can we turn on weights for specific individuals in this call?
I added a line to the variogram title when dt is used, so the plot is easier to read.
The weights feature will need akde to take a logical vector for list inputs.
akde can now take an array of weights matching the list length of the data and model arguments.
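If I understand the change, a call along these lines should then turn weighting on per individual (a sketch; the fitted model list FITS is assumed to already exist):

UD <- akde(buffalo[1:2], FITS, weights = c(TRUE, FALSE))  # weight the first individual but not the second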
Those dt variograms look really weird. I'm going to take a look at that next.
It's possibly just because I used sampled data (100 points). I'll try the full data tomorrow, and if the result is the same I'll generate some reproducible code for you.
Yes, the plot above was caused by the small data sample. This is the same parameters on the normal data set:
For pooled variograms, I think in the last meeting you said the pooled variogram should replace the individual variograms instead of being an additional one. For example, if we pool Cilla and Gabs, we should remove Cilla and Gabs and add the pool of Cilla and Gabs.
I'm wondering whether users may need to try different combinations, for example first create a pool of Cilla and Gabs, then create a pool of Gabs and Toni. And do you think it's useful to compare the individual variograms with the pooled one to see the difference?
If we just add the pool as an additional variogram, it will be easy to compare with the individual ones, and it will also be easy to implement different combinations. Otherwise, if a pool replaces some individuals, those individuals cannot be used for another pool unless I specifically maintain another list of the original variograms. That's still doable, just with more complexity, and I'm wondering whether adding the pool as an addition has more advantages.
I wouldn't worry about visual comparison, as most people will only pool when the individual variograms look bad.
For more flexibility, I suppose you could have one selection of the individuals for the pool copied over (by default) to a second selection of individuals whose variograms will be overwritten by the pooled variogram. That way you could deselect some of the better individuals to keep their individual variograms.
The weights parameter is implemented. The plot titles are marked for easier identification, but I didn't change the home range summary table because I didn't find an easy way to add the information without clutter. I suppose the plot title changes should serve the purpose.
Would it be worth saving space by noting the weighted AKDEs with the balance emoji unicode (or Libra symbol) in the title, instead of the second line of text in the title?
The balance icon in the app is a Font Awesome icon. Is there a Unicode symbol that can be used for a similar purpose?
As for the Libra symbol, I'm not sure it is intuitive enough for users. I didn't recognize the symbol before searching for it.
U+2696
It doesn't print correctly with this code. It seems that only some fonts support it, and we would need to specify a font. However, we can't be sure a suitable font will be available on the user's system.
# some symbols work
plot(1, xlab='\u0298')
# but the scales symbol does not print correctly
plot(1, xlab='\u2696')
"\U2696" on the command line and in the title()
command works for me in Ubuntu.
Yes, it's platform dependent. cat("\u2696") in the RStudio console also works for me, but the plot title doesn't. The console and the plot system probably use different fonts.
Is the Libra symbol supported on Mac?
If so, I suppose we could detect the OS and then use scales on Linux/Windows and Libra on macOS, for consistency with the selection box.
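A tiny sketch of that OS check (assuming base R's Sys.info() is enough to distinguish macOS):

# Libra (U+264E) on macOS, scales (U+2696) elsewhere
sym <- if (Sys.info()[["sysname"]] == "Darwin") "\u264E" else "\u2696"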
Libra can be printed in the RStudio console but not in a plot.
Fonts could be complex, as different Linux distributions can have different default fonts. See the discussions here.
Saving plots to PDF and PNG could bring further problems, since those are yet another platform...
Looking around, it doesn't seem like there's a simple solution for plotting unicode symbols on mac.
When we pool some variograms like "Cilla, Gabs" and only the pooled variogram is shown, what should we do with the ctmm.guess values? Previously ctmm.guess was applied to every telemetry object.
I think we still need to apply ctmm.guess to each individual telemetry object, so do we still need to show individual variograms in the guesstimate tab? Or should we just ignore the pooled variograms in the guesstimate and modeled tabs?
I found that inserting pooled variograms into the current workflow needs too many changes. Previously the app assumed each individual has one variogram across the 3 modes, and selecting models requires selecting specific individual variograms.
dt changed the variogram but didn't change the 1-to-1 mapping, so there was not much of a problem. Pooled variograms need to add new variograms and remove the individual variograms in the pool, which breaks the 1-to-1 mapping of variogram to individual and caused problems in multiple places.
Maybe it's easier to just use a separate tab for pooled variograms so they are kept separate from the regular workflow.
Compared to a separate tab, adding pooled variograms alongside the individual variograms would be simpler in the UI. By keeping the individual variograms intact and putting pooled variograms in purely additional plots, the original workflow can be maintained without much change.
The ctmm.guess value would get duplicated, but its value is needed for the individuals' ctmm.fit call, and then again to compare against the result of ctmm.fit on the individual.
What's the difficulty in simply replacing the individual's variogram with the pooled variogram? Then each individual has one variogram.
If we replace each individual's variogram with the pooled one, does that mean a pool of 3 variograms will be plotted 3 times?
So with a pooled variogram, are we supposed to also supply that pooled variogram to the variogram parameter in the ctmm.guess call?
ctmm.guess(data, CTMM=ctmm(), variogram=NULL, name="GUESS", interactive=TRUE)
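A sketch of what that could look like, assuming the pooled variogram comes from mean() over the individual variograms (as in the ctmm vignette):

SVFS <- lapply(buffalo[1:2], variogram)  # e.g. Cilla and Gabs
POOLED <- mean(SVFS)                     # pooled variogram
GUESS <- ctmm.guess(buffalo[[1]], variogram = POOLED, interactive = FALSE)
FIT <- ctmm.fit(buffalo[[1]], GUESS)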
If we just plot the pooled variogram in each individual's variogram plot, that maintains the 1-to-1 mapping and makes everything simpler, even if the plot is duplicated a little.
Oh, I realized I can keep the pool as each individual's variogram but only plot the pool once.
Yes, yes, and yes.
I found it quite cumbersome to maintain correct plot titles when tab 1 removes the duplicates of a pooled variogram while the other tabs keep one copy for each individual in the pool, because I need to use the animal name or model name in the first line and mark dt or pooled variogram usage in the 2nd or 3rd line.
Since tabs 2 and 3 already show an individual copy of the pooled variogram, using the same arrangement in tab 1 will be much easier to implement, more consistent in the UI, and probably less surprising for users.
It will also help in this case: if we create a pool of animals 1, 2, 3, then create a pool of animals 2, 4, animal 2 actually has the pooled variogram of (2, 4) instead of (1, 2, 3). If we remove the duplicates, we will show both pooled variograms, but it's not clear which variogram is used for animal 2. With an individual copy shown, it's obvious that animal 2 is using the 2nd pooled variogram.
Now the plot title can show any combination of multiple sampling schedules and pooled variograms in all 3 tabs. The 1st tab also shows each individual's copy of the pooled variogram, the same as the other tabs.
I think the home range plot titles don't need to carry all this additional variogram information, right? They are based on the modeling result, and only need to show the models (and optimal weighting, if applied).
ctmm.guess now takes the updated variogram (multiple schedules, or pooled) as a parameter. The dt, weights, and pooled variogram features are finished.
@jmcalabrese @chfleming @NoonanM I'm adding the discussion about multiple sampling schedules here since this is the original thread. In my early exploration of this feature I tried a histogram and frequency counts, but didn't realize k-means could be used for this.
I'm planning to add a histogram of sampling schedules, and also calculate k-means with a default of k = 1.
At first I was thinking of using k = 2, but it will return 2 values even for obviously regular data, which may confuse users. Maybe it's better to just use k = 1 as the default and only increase k after verifying from the histogram.
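A minimal sketch of the k-means idea, using base R's kmeans() on the sampling intervals (k = 2 here just to illustrate picking up two schedules):

intervals <- diff(pepper$t)            # sampling intervals in seconds
km <- kmeans(intervals, centers = 2)   # k-means on the 1-D interval data
sort(km$centers) / 3600                # cluster centers in hours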
For the histogram of sampling schedules, the axis labels need to be improved:
The default values are numbers in seconds, which are not very intuitive to read.
We can transform them to the units best suited to their scale (minutes, hours, etc.); however, the transformation happens after the plot breaks are determined, so the transformed values are not integers, which looks a little weird.
Alternatively, this transforms the durations into a specific time of day starting from 00:00:00, which should be easier to read, but conceptually the value changes from a time duration into a specific time within one day. I'm not sure this is a good idea.
To make things more complicated, some data have very small sampling intervals (much less than a second, for some cell data), and I'm not sure what the plot would look like with that kind of data. @chfleming Do you have some data with microsecond sampling intervals?
It would be awesome to have support for the "dt=" and "weights=" arguments, as outlined in the vignettes.