ctmm-initiative / ctmmweb

Web app for analyzing animal tracking data, built on the ctmm R package
http://biology.umd.edu/movement.html
GNU General Public License v3.0

Deal with possible error in speed outlier calculation #24

Closed: xhdong-umd closed this issue 7 years ago

xhdong-umd commented 7 years ago

The original outlier issue discussion has grown too long, so I'm creating a new issue here.

Problem

Some data may contain unexpected errors that make the speed outlier calculation fail. Currently the app calculates speed during import, so importing this kind of data fails. We cannot predict every possible data error, so this can still happen in the future even if we fix the known cases.

solution a

Instead of calculating speed during import, I tried moving the calculation to the outlier page only. That way the app keeps working until the user clicks the outlier page.

After overcoming some subtle Shiny challenges, I finished this and updated the repo. However, this approach has one disadvantage:

Because the speed calculation only happens after the user switches to the outlier page, it actually updates the data. When the user switches back to the visualization page, the plots refresh because the underlying data changed, even though the plot-related data did not. This can take some time when the data set is big.

This is how Shiny reactivity works, and it is hard to control it any further.
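To illustrate why the plots refresh, here is a toy Python sketch. It is not Shiny's API: ReactiveValue and CachedPlot are hypothetical classes, and the sketch assumes dependency tracking happens at the level of the whole data object. Under that assumption, adding a speed column replaces the object, so the plot is invalidated even though its x/y inputs are identical.

```python
# Toy model of reactive invalidation (hypothetical, not Shiny's API):
# a version counter stands in for Shiny's dependency tracking.

class ReactiveValue:
    """Holds a value and bumps a version number whenever it is replaced."""
    def __init__(self, value):
        self.value = value
        self.version = 0

    def set(self, value):
        self.value = value
        self.version += 1


class CachedPlot:
    """Re-renders only when the version of its input has changed."""
    def __init__(self, source):
        self.source = source
        self.seen_version = None
        self.render_count = 0

    def render(self):
        if self.seen_version != self.source.version:
            self.render_count += 1  # expensive plot rendering would happen here
            self.seen_version = self.source.version
        return self.render_count


data = ReactiveValue({"x": [0.0, 1.0], "y": [0.0, 1.0]})
plot = CachedPlot(data)
plot.render()  # first render

# Visiting the outlier page adds a speed column: x/y are unchanged,
# but the whole data object is replaced, so the plot is invalidated.
data.set({**data.value, "speed": [0.0, 1.414]})
plot.render()  # second render, although x/y did not change
```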

Because of this, I'm inclined toward solution b.

solution b

Still calculate speed on import, but use a simple fallback speed definition when the more sophisticated version fails. I have two simpler versions:

These versions should be more robust across different data. I think most problems are caused by infinite, NA, or NaN values, so they should not cause trouble if we simply ignore or replace those values.
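A minimal Python sketch of the solution b pattern, assuming speeds are computed from (t, x, y) tuples. simple_speed and robust_speed are hypothetical stand-ins, not the two simpler versions mentioned above and not ctmm functions. The idea is to try the sophisticated calculation, fall back to the simple one on any error, and then replace NA/NaN/infinite values with a fill value so later steps can ignore them.

```python
import math

def simple_speed(points):
    """Hypothetical fallback: speed from the previous point, one value per point."""
    speeds = [0.0]  # the first point has no previous point
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        d = math.hypot(x1 - x0, y1 - y0)
        dt = t1 - t0
        if dt != 0:
            speeds.append(d / dt)
        else:
            # duplicate timestamp: 0/0 is undefined, d/0 is infinite
            speeds.append(math.nan if d == 0 else math.inf)
    return speeds

def robust_speed(points, primary, fallback=simple_speed, fill=None):
    """Try the sophisticated speed function first; fall back on any error.

    Missing (None) and non-finite results (NaN, +/-inf) are replaced by `fill`.
    """
    try:
        speeds = primary(points)
    except Exception:
        speeds = fallback(points)
    return [s if s is not None and math.isfinite(s) else fill for s in speeds]
```

Replacing with `None` marks the bad values as "ignore"; passing a numeric `fill` instead would be the "replace" option.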

I'll save my current implementation of solution a in version control, then switch to solution b.

xhdong-umd commented 7 years ago

I implemented both solution a and solution b. The new implementation no longer has the plot refresh I mentioned earlier, but it needs to calculate speed on the current subset on demand. This should be better than refreshing the plots, because numeric calculation is usually faster than rendering plots.
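A sketch of the on-demand idea, assuming the imported track is never modified and speeds are computed only for the requested time subset, cached by its bounds. In the app this would be a Shiny reactive; here `functools.lru_cache` stands in for that caching, and TRACK and subset_speeds are hypothetical names.

```python
import math
from functools import lru_cache

# Hypothetical imported track of (t, x, y); it is never modified after import.
TRACK = ((0, 0.0, 0.0), (1, 3.0, 4.0), (2, 3.0, 4.0), (3, 6.0, 8.0))

@lru_cache(maxsize=8)
def subset_speeds(t_start, t_end):
    """Compute speeds only for the current time subset, only when asked.

    Assumes strictly increasing timestamps within the subset.
    """
    sub = [p for p in TRACK if t_start <= p[0] <= t_end]
    if not sub:
        return ()
    speeds = [0.0]  # the first point of the subset has no previous point
    for (t0, x0, y0), (t1, x1, y1) in zip(sub, sub[1:]):
        speeds.append(math.hypot(x1 - x0, y1 - y0) / (t1 - t0))
    return tuple(speeds)
```

Because TRACK is never rewritten, anything that depends only on it (like the visualization plots) has no reason to re-render; changing the time subset just computes, or reuses, a different cached result.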

Isolating the outlier calculation to the outlier page has many other advantages, but it took me quite a lot of hard thinking, a discussion with Joe Cheng, and experimenting to make all these moving parts work correctly. Time subsetting was especially tricky, because it can also update the underlying data.

Now the speed calculation uses the ctmm function by default, but falls back to a pmin-based method I implemented myself, which removes some infinite values. Dealing with NA, NaN, and infinite values is tricky, and my current implementation may not be optimal, but we will wait for more use cases before improving it further.
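The thread does not spell out the pmin fallback, so the following Python sketch is only a guess at one plausible form of it. Each point gets the smaller of its incoming and outgoing segment speeds, the elementwise minimum that R's `pmin()` computes. segment_speeds and pmin_speeds are hypothetical names.

```python
import math

def segment_speeds(points):
    """Speed of each segment between consecutive (t, x, y) points."""
    out = []
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        d = math.hypot(x1 - x0, y1 - y0)
        dt = t1 - t0
        if dt > 0:
            out.append(d / dt)
        else:
            # duplicate or out-of-order timestamp: no movement -> 0, else infinite
            out.append(0.0 if d == 0 else math.inf)
    return out

def pmin_speeds(points):
    """Per-point speed: the smaller of the incoming and outgoing segment speeds.

    Mirrors R's pmin(); the first and last points only have one neighbor,
    so they reuse their single adjacent segment.
    """
    seg = segment_speeds(points)
    if not seg:
        return [0.0] * len(points)
    incoming = [seg[0]] + seg   # point i uses segment (i-1, i)
    outgoing = seg + [seg[-1]]  # point i uses segment (i, i+1)
    return [min(a, b) for a, b in zip(incoming, outgoing)]
```

With this definition a single jump inflates only the jumping point, since each of its neighbors also has one slow segment. An infinite speed from a duplicate timestamp is dropped whenever the other adjacent segment is finite. If both adjacent segments are infinite, the infinite value survives, which is why this removes only some infinite values.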