kjklauder closed this issue 6 years ago.
On second thought, we need to present the kmeans result, which may be best shown in the original units. So just using seconds may be the simplest and most consistent option. I'll implement the feature first to see how it works, and adjust the units later.
Labeling the kmeans centers in the histogram should make it easier to read. The kmeans result table is still needed, since we need some way to adjust k for each individual.
I have to use the package ggrepel to avoid label overlap, though. Since we will have a table showing the kmeans result, maybe we can drop the labels and remove one package dependency.
I have a rough implementation of the kmeans feature. If the user clicks a checkbox, the multiple schedule box expands with more controls: a histogram of time intervals, and a table showing kmeans run on each individual. The table shows the kmeans result and the k value, and the kmeans result is also marked in the histogram.
By default, 5% of the intervals are filtered out as outliers and the histogram uses 7 bins; both can be adjusted with sliders.
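A minimal base R sketch of that filtering and binning, assuming `diff_t` holds sampling intervals in seconds and that the 5% outlier share is split evenly between the shortest and longest intervals (the thread does not say which tail is trimmed). `filter_intervals()` and `bin_intervals()` are hypothetical names, not ctmmweb functions.

```r
# Hypothetical helper: drop the most extreme sampling intervals.
# Assumption: the outlier share is split evenly between both tails.
filter_intervals <- function(diff_t, outlier_share = 0.05) {
  diff_t <- diff_t[!is.na(diff_t)]
  bounds <- quantile(diff_t,
                     probs = c(outlier_share / 2, 1 - outlier_share / 2),
                     names = FALSE)
  diff_t[diff_t >= bounds[1] & diff_t <= bounds[2]]
}

# Hypothetical helper: bin the kept intervals for the histogram.
# hist()'s `breaks = n` is only a suggestion, so build exactly
# n_bins equal-width bins between the smallest and largest value.
bin_intervals <- function(x, n_bins = 7) {
  breaks <- seq(min(x), max(x), length.out = n_bins + 1)
  hist(x, breaks = breaks, plot = FALSE)
}

# Toy data (made up): two real schedules plus a spread of odd intervals
diff_t <- c(rep(3600, 90), rep(7200, 90), seq(10, 1e5, length.out = 20))
kept <- filter_intervals(diff_t)
h <- bin_intervals(kept)
h$counts
```

Fixing the break points, rather than passing a bin count to `hist()`, keeps the number of bars equal to what the slider shows. This sketch would fail if every kept interval were identical, because the breaks would then not be strictly increasing.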
k will change k for those rows.

I used a slider to change k for the selected individuals. However, while the user is dragging a slider, intermediate values can be emitted, because the app cannot tell whether a value is an intermediate step in the drag or the value the user wants. So if the user drags slowly from 1 to 5, the values 2, 3 and 4 may be emitted along the way.
Each new k value updates the kmeans result table, which in turn resets the row selection, so the later values have no effect because no row is selected anymore.
An alternative is a numeric input, where the value can be typed directly. If the user clicks the arrows in the input box to step from 1 to 5, the same thing happens, but it is easier to understand that each click counts as one change.
I plan to replace the slider with a numeric input.
The table and plot now show values converted to more readable units.
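One way to pick a readable unit is sketched below. This is an assumption about the approach, not ctmmweb's or ctmm's actual code: the hypothetical `pick_time_unit()` chooses the largest unit that does not exceed the median interval, then rescales all values by it.

```r
# Hypothetical helper: choose the largest time unit not exceeding the
# median value (falling back to seconds), then convert every value.
pick_time_unit <- function(seconds) {
  units <- c(second = 1, minute = 60, hour = 3600, day = 86400)
  typical <- median(seconds, na.rm = TRUE)
  fits <- units[units <= typical]        # units that keep a typical value >= 1
  chosen <- if (length(fits) > 0) fits[length(fits)] else units[1]
  list(unit = names(chosen), value = seconds / unname(chosen))
}

res <- pick_time_unit(c(3600, 7200, 10800))
res$unit
res$value
```

Using the median rather than the maximum keeps a few very long gaps from pushing the whole table into days.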
@chfleming I experimented with kGmedian in package Gmedian, but couldn't make it work with 1-dimensional data. I searched but couldn't find any example discussing this.
```
> x <- matrix(diff_t, ncol = 1)
> cl <- Gmedian::kGmedian(x, ncenters = 2)
Error in stoKmed_rcpp(x0, X, centers, gamma = gamma, alpha = alpha) :
  Not a matrix.
```
It looks like the R package Ckmeans.1d.dp can do k-median clustering and select k via BIC from a given range of k values. Selecting k automatically should help avoid redundant cluster estimates.
This function works, but I'm not seeing it ignore the outliers. With Pepper in the buffalo data and k = 2, this is the result I got:
```
cl <- try(Ckmeans.1d.dp::Ckmedian.1d.dp(na.omit(diff_t), k))
```
It still put the outliers into a cluster of size 166, and the remaining points into one cluster of size 1588 with center 7200.
I tried k = 3, hoping the outliers would take one cluster and the other two would be the real clusters at 3600 and 7200. Instead, the outliers were split into 2 clusters, with most values still in one cluster.
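A possible explanation: a k-median fit minimizes the summed absolute distance from every interval to its cluster median, so every point must be assigned somewhere, and a far-away group of intervals can lower the total more by getting its own cluster than the real clusters gain by staying apart. The base R sketch below solves that objective exactly by dynamic programming on sorted data. It assumes Ckmedian.1d.dp optimizes the same L1 objective; `kmedian_1d_exact()` is a hypothetical illustration, roughly cubic in n, so only suitable for small toy data. The data are made up, not Pepper's.

```r
# Exact 1-D k-median by dynamic programming (illustrative, O(n^3) overall).
kmedian_1d_exact <- function(x, k) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  stopifnot(k >= 1, k <= n)
  # cost[i, j]: sum of absolute deviations of x[i..j] from their median
  cost <- matrix(Inf, n, n)
  for (i in seq_len(n)) for (j in i:n) {
    seg <- x[i:j]
    cost[i, j] <- sum(abs(seg - median(seg)))
  }
  # D[m, j]: minimal cost of splitting x[1..j] into m clusters;
  # B[m, j]: start index of the m-th (last) cluster in that split
  D <- matrix(Inf, k, n)
  B <- matrix(0L, k, n)
  D[1, ] <- cost[1, ]
  B[1, ] <- 1L
  if (k > 1) for (m in 2:k) for (j in m:n) {
    starts <- m:j
    cand <- D[m - 1, starts - 1] + cost[cbind(starts, j)]
    best <- which.min(cand)
    D[m, j] <- cand[best]
    B[m, j] <- starts[best]
  }
  # Backtrack the cluster boundaries from the last cluster to the first
  centers <- numeric(k)
  size <- integer(k)
  j <- n
  for (m in k:1) {
    i <- B[m, j]
    centers[m] <- median(x[i:j])
    size[m] <- j - i + 1L
    j <- i - 1L
  }
  list(centers = centers, size = size, cost = D[k, n])
}

# Toy intervals: two real schedules (1 h, 2 h) plus three long gaps
x <- c(rep(3600, 40), rep(7200, 40), 250000, 300000, 350000)
kmedian_1d_exact(x, 2)
kmedian_1d_exact(x, 3)
```

On this toy data, k = 2 gives centers 5400 and 300000 with sizes 80 and 3: the three long gaps take a cluster of their own and the two real schedules merge, similar to the Pepper result. With k = 3 the toy data separates cleanly into 3600, 7200 and the gaps; real outliers that are more spread out can instead absorb the extra cluster.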
At some point, could you add a feature to turn on weights in the home range estimation for all individuals at once?
I added a checkbox to select all individuals. You still need to click the apply button, but I think this is useful, as you can select all and then deselect some of them.
The ggrepel package is not automatically installed along with the package.
Thanks, I just added it and updated the package.
It would be awesome to have support for the "dt=" and "weights=" arguments, as outlined in the vignettes.