In the Windows VM, model selection took 68 s for 1 individual and 153 s for 6 individuals, and `akde` took 59 s. The parallel speedup is 2.66x, not too bad.
For the model summaries, I wanted to convert the text format to a table so we can list multiple individuals together and compare them easily. It will definitely save some space in the app. So instead of
```
$Cilla
$Cilla$`OUF anisotropic`
$Cilla$`OUF anisotropic`$DOF
    mean     area 
13.16465 22.59234 

$Cilla$`OUF anisotropic`$CI
                                low         ML      high
area (square kilometers) 280.308976 444.213143 645.24293
tau position (days)        3.498487   5.998142  10.28379
tau velocity (minutes)    43.708867  46.725486  49.95030
speed (kilometers/day)   15.441216  15.705364  15.96945


$Gabs
$Gabs$`OUF anisotropic`
$Gabs$`OUF anisotropic`$DOF
    mean     area 
3.845523 6.615316 

$Gabs$`OUF anisotropic`$CI
                                low        ML      high
area (square kilometers) 164.695983 423.07464 801.21216
tau position (days)        3.423846  14.54973  61.82950
tau velocity (minutes)    21.474275  23.76911  26.30918
speed (kilometers/day)   13.348792  13.74202  14.13509
```
we could have something like this:
However, after I looked at the `summary.ctmm` code, I found this is not easy:

- Parsing the `ctmm` object directly needs more understanding of the `ctmm` internal structure.
- The output format of `summary.ctmm` could change, and then my conversion code would break.

For now I think I'll just use the summary output directly and finish the main feature first. In the long term, do you think it's beneficial to generate a summary table for a `ctmm` object (maybe with another function; `summary.ctmm` doesn't need to change since the text output is the common usage)?
If you set `units=FALSE`, then you should always get SI units from `summary.ctmm`. Tell me if any other functions need this argument as well.
There is a `level` option for `ctmm.select` that determines how aggressively alternative models will be considered. If other models are very unlikely and `level < 1`, then `ctmm.select` may only return one model even with `verbose=TRUE`.
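A minimal sketch of how these arguments interact, using the `buffalo` data bundled with `ctmm` (the variable names here are just for illustration):

```r
library(ctmm)
data(buffalo)
# initial parameter guess from the empirical variogram
guess <- ctmm.guess(buffalo[[1]], interactive = FALSE)
# with verbose = TRUE all credible models are returned, but with level < 1
# unlikely alternatives are pruned, possibly leaving a single model
fits <- ctmm.select(buffalo[[1]], CTMM = guess, verbose = TRUE, level = 0.99)
summary(fits)
```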
Getting all of the AKDEs on the same grid makes it possible to (more quickly) calculate overlaps afterwards. I could potentially code that to parallelize, but with large datasets you can run out of RAM when calculating/manipulating large rasters, and I haven't looked into how to prevent that.
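For reference, a hedged sketch of the same-grid workflow, assuming the list interface of `akde` (`tele_list` and `fit_list` are placeholder names):

```r
library(ctmm)
# passing lists of telemetry objects and fitted models places all UDs on one
# shared grid, which overlap() requires
uds <- akde(tele_list, CTMM = fit_list)
overlap(uds)
```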
I didn't notice the `units` parameter! I think there are two approaches to get a table summary:

- Add a `table` argument to `summary.ctmm` (default `table = FALSE`, so by default it still acts the same). With `table = TRUE` I can add some code to generate the summary in table format. By adding code inside `summary.ctmm` I can access all the values directly and depend less on the specific summary output format.
- Parse the text output of `summary.ctmm` in the app.

If you feel the table summary is somewhat useful to `ctmm` (with a list of models it's also easier to get a summary table), I can add the code to `summary.ctmm`. Otherwise I'll just work on the summary output and only use this in the app.
I knew about the `level` option in `ctmm.select`; I was just using the default options in my code. Do you think we need a value other than the default `level`?
It seems we'd better keep all AKDEs on the same grid. The time cost of this step is not too big, so I think it's OK.
The only thing that should change about the output of `summary.ctmm` in the near term is the possible addition of further rows for additional parameters.
The default `level` argument should be fine as long as `ctmm.select` and `ctmm.fit` are working properly... and if they aren't, then that's something I try to fix with high priority.
@chfleming @jmcalabrese I've just finished model selection and home range. The app is updated here. The summaries of models and home ranges are still the direct print of `summary`; I'll work on table-format output in the next step.
I reorganized the app structure:

- Some variogram controls (`dt`, `res`, `error`, `pool variogram`, etc.) still need work; I'll get to them in the next step.
- Clicking the `Fit Models` button runs model selection and uses the best model to draw variograms, switching the variogram mode to model. Model summaries are printed in either concise or detailed mode. It should also take the manual fit parameters as input, but I haven't fully tested that because I'm not familiar with the behavior of the models, so it's difficult for me to judge whether it's correct.
This step runs in parallel on Mac, Windows, and the hosted app.
With models fitted, switch to the home range page and the home ranges will be calculated; the model summary and plot are generated. This step doesn't run in parallel because we need to calculate all individuals in one run to use the same grid. Fortunately it's not as slow as model fitting.
The layout of the home range plots is actually controlled by the variogram layout box; I didn't add yet another control box since it would be redundant. You can change the figure height and columns on the variogram page and the home range plot will update.
In the next step I'll work on occurrence.
@chfleming I found that `summary.ctmm` with `units = FALSE` returns speed in `m/day`, which is not an SI unit. My conversion function was using `ctmm:::unit` with the `speed` dimension and expected `m/s`.
```r
> summary(model_select_res[[1]][[2]], units = FALSE)
$DOF
    mean     area 
16.89156 30.89841 

$CI
                                low           ML         high
area (square meters)   2.716054e+08 400020869.08 552902892.77
tau position (seconds) 1.737586e+05    389310.18    872258.50
tau velocity (seconds) 0.000000e+00      6889.99     50144.14
speed (meters/day)     1.352319e-03     11499.30     35429.87
```
I can add an exception to my function, but maybe `summary.ctmm` should return `m/s` with `units = FALSE`?
I finished the model/home range summary table and updated the app. The table has the same color theme as the data summary table on the visualization page. It was tricky because some columns had lots of digits, probably because near-zero values forced the whole column to be formatted with many digits; I had to round those columns specifically. I'm using `digits = 2` and rounding to 3 digits for some columns. If you feel more digits are needed, please just tell me.
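An illustration of the formatting problem in plain R (not the app's actual code): a single near-zero value drags the whole column into long decimal formatting, so per-column rounding is needed.

```r
x <- c(1.352319e-03, 11499.30, 35429.87)
format(x)    # every element padded to the decimals the smallest value needs
round(x, 3)  # per-column rounding keeps the table readable
```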
Also, the speed column I got was in `m/day`; I'm manually converting it to `m/s` now. If `summary.ctmm` is updated to return `m/s` with `units = FALSE`, I will update the code accordingly.
Do you think it's meaningful to have an option to hide the `low` and `high` values and only show the `ML` values? That would reduce the number of rows needed to present multiple models for multiple individuals.
@chfleming @jmcalabrese While working on occurrence, I found I hadn't added the option to use different models for home range (probably spent too much time on the table summary format). I plan to use this structure for both home range and occurrence: when entering the `home range` page or the `occurrence distribution` page, the first model of each individual is used by default, and the user can select other models. This should be very flexible.
For occurrence, I think it's useful to have a checkbox to add/hide a location overlay, like Figure S4.1 of appendix 4 of the 2016 MEE ctmm paper. However, I haven't found the proper code to do this; I tried these, but they didn't work as I wanted. Is it possible to adjust the alpha of the overlay layer?

```r
plot(ud)
plot(tele_list[[i]], UD = ud, col = "white")
```
Finally, do you think it's important to add an export feature? There isn't enough time for a fancy report, but at least we could export the data, models, and plots in a `.Rdata` file, and the user could then do more with them. If this is too rudimentary to be of much use, we can hold off on the export feature and just implement the proper report later.
At the last meeting we mentioned that we need to start wrapping up the app sometime before the AniMove course. If we allow 1 week to wrap up, we still have until Aug. 18 to work on features, right? I think previously we said to stop on features one week after the meeting, but finishing occurrence properly will probably need more time.
I have made the app work with any model selection.
```
Error in `row.names<-.data.frame`(`*tmp*`, value = c("min", "max")): invalid 'row.names' length
```
@chfleming I can share a script to reproduce the error if you want to have a look.
Timing results (`system.time` output) from several runs:

```
   user  system elapsed 
 56.903   4.659  61.555 

   user  system elapsed 
 29.489   2.667  32.159 

   user  system elapsed 
 41.873   3.874  45.743 

   user  system elapsed 
156.817  82.872 239.649 

   user  system elapsed 
437.447 193.244 630.608 

   user  system elapsed 
 28.931   2.928  31.854 
```
Sometimes there are 3 types of errors in occurrence calculation or plotting. They could be random: the same selection will work one time and fail the next.

1. Running in parallel with `mclapply` on a cluster of 6:

```
Assertion failure at kmp_runtime.cpp(6480): __kmp_thread_pool == __null.
OMP: Error #13: Assertion failure at kmp_runtime.cpp(6480).
OMP: Hint: Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see http://www.intel.com/software/products/support/.
```

2. Warnings about empty ranges, followed by a plotting error:

```
Warning in min(x) : no non-missing arguments to min; returning Inf
Warning in max(x) : no non-missing arguments to max; returning -Inf
Warning in min(x) : no non-missing arguments to min; returning Inf
Warning in max(x) : no non-missing arguments to max; returning -Inf
Warning in value[[3L]](cond) :
  Mvubu - OU anisotropic: Error in plot.window(...): need finite 'xlim' values
```

3. An invalid row-names error:

```
Warning in value[[3L]](cond) :
  Mvubu - OU anisotropic: Error in `row.names<-.data.frame`(`*tmp*`, value = c("min", "max")): invalid 'row.names' length
```
It seems the first error was caused by some interaction of R 3.4, data.table, and mclapply.
Since the user can select models for home range and occurrence, I went one step further and mapped the variograms to the selected models too. Any combination is possible, and the user can compare variograms of different models for the same individual. I also pre-select the first model after model fitting finishes, so it's more obvious which models are used for variograms and the other pages.
Would you also be able to get "area" when calculating occurrence for each animal?
@xhdong-umd Regarding mclapply: I don't think it will work on a Windows-based PC; I've had trouble using it in other calculations I've done. I know that Microsoft will release something for Win 10 that will be capable of using 4 cores, but I don't know if it is out yet. I think Unix-based OSes atm are the only ones capable of MC (please correct me if I am wrong :) )
@vestlink The app chooses different parallel methods based on the platform: `mclapply` for Linux/Mac, socket-based `parLapplyLB` for Windows. You can try the app in local mode; the console messages show some information about the parallel method and the time used in the process.
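A minimal sketch of that kind of platform dispatch (illustrative only, not the app's actual code):

```r
library(parallel)

para_lapply <- function(x, f, cores = detectCores()) {
  if (.Platform$OS.type == "unix") {
    # fork-based parallelism on Linux/Mac
    mclapply(x, f, mc.cores = cores)
  } else {
    # socket-based cluster on Windows; LB = load-balanced scheduling
    cl <- makePSOCKcluster(cores)
    on.exit(stopCluster(cl))
    parLapplyLB(cl, x, f)
  }
}
```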
@chfleming It seems the code I tried for overlaying location data actually works as intended.
- The plot in the paper
- The plot of the UD
- The plot of location data (green) with the UD
If I set `level.UD = 1`, that seems to remove the contours and gives a similar effect to the plot in the paper:

```r
plot(selected_occurrences[[1]], level.UD = 1)
```
Interestingly, I cannot use:

```r
plot(tele_list[[1]], UD = selected_occurrences[[1]], level.UD = 1)
# Error in if (any(I)) { : missing value where TRUE/FALSE needed
```
Do you think these options would be useful for occurrence plots?

- Should we offer the plot without contours as an option? If so, how can I remove the contour in the last script?
- Maybe we can just add one slider for `level.UD`; the contour disappears when `level.UD` becomes 1.

Right now I don't see an obvious difference with or without the location overlay.
This is the UD plot with a level of 0.95:

With a level of 1:
I don't see much advantage in the version with the location overlay compared to the above two. Actually, your paper says the location overlay interferes with the occurrence plot, so the version without locations should be preferred, which is the default behavior of `plot(UD)`. I do see that different level values can be useful.
I implemented the level slider.
The plots below used subsampled data, so they are not as clear as plots of the full data. I also haven't hit the previous occurrence-plotting errors so far.

With `level = 1`:
@xhdong-umd With `units=FALSE`, speed and diffusion parameters should be in SI units now.
One nice feature to think about is the possibility to export the 95% (or other %) UDs as shapefiles for import into e.g. QGIS for further analyses... Sometimes you would also be interested in multiple contour levels, such as 20, 50 and 85%. Maybe one could think of a way to pass something like `c(0.20, 0.50, 0.95)` when making the occurrence plots.
The shapefile export will be part of our general export/save feature, which should include everything generated in the app. That's important but complex, so I was going to start on it later; if everybody thinks shapefile export is good to have now, I can add it now.

The easiest way to input multiple contour levels seems to be a text box accepting something like `20,50,95`, though that is a little more cumbersome than a slider. @chfleming What do you think about this?
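A hypothetical sketch of parsing such a text box into plotting levels (`parse_levels` is a made-up helper, not app code):

```r
parse_levels <- function(txt) {
  # "20,50,95" -> 0.20 0.50 0.95
  sort(as.numeric(strsplit(txt, ",")[[1]])) / 100
}
parse_levels("20,50,95")
```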
I have updated the app to use the latest `ctmm` with the `units=FALSE` change. Note that the app reports much bigger speed values with the previous ctmm version. I also added a text field for CL:
Looks good
The main hosted app is also updated since the features are mostly stable now.
@xhdong-umd Contours can be turned off with `col.level=NA`, and maybe also `level.UD=NA`.
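For instance, a hedged example of the first suggestion (`ud` is a placeholder for a UD object; `col.level` is the contour color argument of ctmm's plot methods):

```r
plot(ud, col.level = NA)  # draw the UD without contour lines
```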
@vestlink
I found contours can be turned off with `level.UD = 1`, i.e. entering `100` in the text field, so that should serve the purpose. Adding another checkbox would conflict with the level input box.
I'm not seeing any errors with occurrence now with the updated `ctmm`.
For exporting data, what kinds of options should be provided: `sp`, `raster`, and ESRI shapefile? Shiny only supports downloading one file per button click, so I think it may be easier to have an export page with all the options and multiple outputs, and then compress all the files into a zip. I can also add plot output if that would be useful, though I'm not sure there is time to implement all of these. I think you mentioned before that some plots should be saved as PDF instead of PNG?
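A minimal sketch of the one-file-per-button constraint and the zip workaround in Shiny (`downloadHandler` is Shiny's API; the file names are hypothetical):

```r
# inside the Shiny server function
output$export <- downloadHandler(
  filename = "ctmm_export.zip",
  content = function(file) {
    files <- c("models.Rdata", "homerange.shp")  # hypothetical artifacts
    zip(file, files)                             # bundle everything into one download
  }
)
```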
I think shapefiles for the home-range contours will be the most important export to support (aside from PNG images), because that's the format for people using GIS software, whereas `sp` and `raster` objects are for working within R.
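For reference, a hedged sketch of shapefile export with ctmm's `writeShapefile` (argument details may differ across ctmm versions; the object and file names are placeholders):

```r
library(ctmm)
# export the 95% home-range contours of one UD for use in QGIS etc.
writeShapefile(uds[[1]], folder = "export", file = "Cilla_akde", level.UD = 0.95)
```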
I'm not sure if there are unintended consequences of setting `level.UD=1`, as that would draw contours around the rectangle of the raster.
If it's not conceptually right to remove the contour by setting `level.UD=1`, we can use this input convention instead; it should be intuitive and easy to switch between the different options.

For the export feature, maybe we should just do the single most important thing: exporting home-range shapefiles. The other options will need a complicated design that would probably have to be redone later.
That sounds good for the exporting.
For the parameter estimate CIs, can you have the colors step up, light (low), medium (ML), dark (high), for each individual if they are going to be on their own rows?
@vestlink: I agree with @chfleming about not giving an area estimate for occurrence dists. We don't want to propagate that practice, given the way that folks in the literature misinterpret Brownian bridges as home range estimates.
@xhdong-umd: I think a quick-and-dirty shapefile export for home ranges is fine for now, and then a more comprehensive and carefully designed suite of export options can come sometime after animove.
Just took a look at the updated webapp, and I think the workflow is very streamlined overall. Nice! I like the reorganization of the former variogram and model selection pages into a single page and think that works very well.
A couple of observations:
1) The default canvas height (400) for figures on the visualization page renders really small on my system. I had to set it to 1000 or so to get it to look decent.
2) I'm not comfortable with having the CIs on all parameter estimates hidden by default. I see the problem with the display getting too clunky if you show them by default, but I don't want to subtly encourage users to ignore the CIs by hiding them. I don't have a good idea about how to show them without cluttering things up too much, but will think about it.
3) The "Confidence levels of contour" box title on the occurrence page is misleading. I would say "Contour level(s) of the occurrence distribution" instead. That controls the contour on the distribution estimate, not the confidence intervals on a particular contour estimate, and I don't want users to mistakenly think they are adjusting the level of a confidence interval.
4) The option to change the displayed contour (as above in #3) would also be useful for home ranges. The same suggestion about terminology would apply.
5) In the model parameter estimates table, speed displays as "NA km/day" for models like OU where speed is undefined, which seems a bit clunky. Could you just simplify that to "NA"?
@chfleming I adjusted the level input of occurrence plot as above.
I applied different luminance (lightness) values to the different estimate CIs, like this. I'm not sure it will work well, since the combination of lightness and color can look different depending on the actual color; some rows may become difficult to read.
@jmcalabrese I found 400 is not too small if the plot is a landscape picture, because the limiting factor is the page width, so increasing the figure height will not increase the figure size at all. With a portrait picture, 400 is often too small; originally I had the canvas fixed at 730, but that left lots of empty space for the buffalo data.

I can set the default to a bigger value, but 1000 could be too much, especially since we have a lot of boxes on the same page. Maybe 600 as a start? I can also set the step to 200, which should make adjustment easier. The user can always input a number directly in the box.

@jmcalabrese Are you using a high-resolution monitor? With a high resolution, 400 pixels could be really small. Unfortunately we can only control absolute pixels instead of something incorporating DPI.
- I can change the default to always show CIs. How about we just don't color the low and high rows? That way the table should be obvious and still easy to read. I'll make a screenshot later.
- I'll modify the box title.
- I'll also look at the level input for the home range plot.
- I tried to just get `NA` in the beginning, but it was quite tricky. The function I used came from the `scales` package, which has no special treatment for `NA`. I modified the function a little to recognize `NA`, but then it was no longer vectorized. Too much effort was needed just for this single point, so I moved on with the main features. I'll look at it again and see if it can be improved.
@xhdong-umd:
- My display resolution (1920x1080) is not super high by current standards, but 400 still renders pretty small. I would say bump the default up to 600 and let's see what other users think.
- For displaying the CIs, could we just do MLE (Lower CI, Upper CI) in a single column for each parameter? Then there's still some visual distinction between the MLE and the CI, and it should take up less horizontal space than separate Lower CI and Upper CI columns for each parameter. Furthermore, each individual would then have only one row, which should be cleaner.
- If fixing the NA speed thing is too much work, then don't worry about it. It's not a big deal.
@jmcalabrese @chfleming According to `?plot.telemetry`:

```
level.UD: Confidence level of Gaussian ctmm model or UD estimate contours to be
    displayed. I.e., level.UD=0.50 can yield the 50% core home range within the
    rendered contours.
```

(I picked some words from this to make the previous title.)

```
level: Confidence levels placed on the contour estimates themselves. I.e., the
    above 50% core home-range area can be estimated with 95% confidence via
    level=0.95.
```

I'm adjusting `level.UD` in `plot(selected_occurrences[[1]], level.UD = c(0.5, 0.95, 0.8))` for the occurrence plot.
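To make the distinction concrete, a hedged example using the object names from this thread: `level.UD` picks which contour to draw, while `level` sets the confidence level on that contour's estimate.

```r
# draw the 50% core home range with a 95% confidence band on that contour
plot(tele_list[[2]], UD = selected_hrange_list[[2]], level.UD = 0.50, level = 0.95)
summary(selected_hrange_list[[2]], level.UD = 0.50, level = 0.95)
```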
For the home range plot, every usage below gave me some errors. Is there something wrong with the usage? I'm not sure which one I should use for the home range plot.

```r
plot(tele_list[[2]], UD = selected_hrange_list[[2]], level.UD = c(50, 95))
plot(tele_list[[2]], UD = selected_hrange_list[[2]], level = c(50, 95))
plot(tele_list[[2]], UD = selected_hrange_list[[2]], level = 90)
# this one works but warns: Outer contour extends beyond raster.
plot(tele_list[[2]], UD = selected_hrange_list[[2]], level.UD = 90)
```
@jmcalabrese I'm not sure I understand "MLE (Lower CI, Upper CI) in a single column for each parameter". Do you mean putting 3 values in one cell? Can you show some sample rows?

For the NA values, I realized I can actually add one extra step after the unit formatting; it should be easy to do it that way.
@jmcalabrese and @chfleming The 95% Brownian Bridge not being a valid home-range estimator is indeed news to me. I may have fallen into the common pitfall of misinterpretation, but I have been guided by very recent literature that mentions e.g. Brownian Bridge methods as valid estimators. See e.g. "Is there a single best estimator? Selection of home range estimators using area-under-the-curve" by W. David Walter, Dave P. Onorato and Justin W. Fischer. Would their inclusion of BBMM and dBBMM as home-range estimators be wrong in light of the above? Should this be discussed somewhere other than here?

I can see, though, why you would want to get information on the home range as well as the occurrence distribution and combine these when analysing e.g. selection, where the home range would be the available resource and the occurrence distribution the use.
@vestlink Yes, the lumping of the BB with home-range estimators by Walter et al. (and pretty much everyone else in the literature) is incorrect. KDE/AKDE and BB/CRAWL estimate completely different distributions. We pointed this out as an ancillary fact in the first appendix of the AKDE paper and in the intro of the Kriging paper.
Justin is leading a paper specifically focused on this distinction that will use real data examples to show how divergent these two classes of estimators can be, and why they shouldn't be interchanged.
@xhdong-umd The `level` arguments should be between 0 and 1.

I'm inclined to use light gray and medium gray for the low and high rows, compared to the ML rows in color. I'll post a screenshot and judge whether it will work.

I have made this mistake before and corrected it, but forgot it again!
@chfleming Thank you for the clarification; the paper will indeed be appreciated. But I think this is to be expected in a field as fast-paced and relatively new as animal movement ecology.
@chfleming Having low to high span 3 columns would need 12 columns for area, tau position, tau velocity, and speed, though this is a compromise we have to decide on. The previous multi-row layout is not ideal because it's difficult to tell the rows apart. Putting the values into columns makes each model a single row, which serves the purpose of the table much better: comparing models, selecting models, etc.

I thought about this format, but I'm not sure it's optimal: put the 3 values in one cell, separated by some separator (units not added yet). However, the columns cannot be sorted this way.

Maybe the best option is still to put all values in columns, even if that means the user often needs to maximize the app and use the horizontal scroll bar. I think the app needs maximal screen estate for most of its tasks anyway.
UPDATE: I just realized @jmcalabrese's suggestion looks like this. It's conceptually clearer than the above and should take less space than putting all 3 values in columns. However, I found the main factor determining column width is the header, so `tau position (CI low, CI high)` is going to take a lot of space (even more than the 3-values cell above).
This is a table without alternating row background (striping), with the low and high rows in gray. I'm not really satisfied with this format since it's still overwhelming: there is duplicated information in the low/high rows, and I also have to use a `color_target` row that is redundant in information but needed for coloring.
@chfleming I tried this, but there are still some problems.

```r
# plotting locations with the UD: multiple level.UD values give errors
plot(tele_list[[2]], UD = selected_hrange_list[[2]], level.UD = c(0.50, 0.95))
# the condition has length > 1 and only the first element will be used
# longer object length is not a multiple of shorter object length
# number of items to replace is not a multiple of replacement length
# the condition has length > 1 and only the first element will be used

# plotting the UD itself with multiple level.UD values gives warnings,
# but the plot is generated
plot(selected_hrange_list[[2]], level.UD = c(0.50, 0.95))
# longer object length is not a multiple of shorter object length
# number of items to replace is not a multiple of replacement length
```
I changed the default canvas height to 600 and updated the title of the occurrence plot level input. I also replaced the `NA` values in the model summary table with empty cells; I think that's less cluttered than an explicit `NA`. I'm still not satisfied with the model summary table, so I'll look at the "3 values in one cell" idea.
A digression: have you noticed that when you sort selected outliers, the sorting is not consistent? `distance_center` does not sort correctly as far as I can see; it looks like sorting is independent of animal ID for all but the `distance_center` column.
@vestlink I'm not sure what you mean. The user can sort the table by clicking the arrow in each column; there are reasonable grounds for several sorting choices, so I expect users to sort for themselves. Also, the screenshot shown is from the speed tab, which picks points by speed.

That said, the sorting is still not perfect, since the values are text with units instead of numbers, so 9.9 will sort as larger than 10. The only fix I can think of is to put the values and units in separate columns.
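An illustration of the text-sorting pitfall in plain R (not the app's code):

```r
sort(c("9.9 km/day", "10.1 km/day"))  # "10.1 km/day" "9.9 km/day" (lexicographic)
order(c(9.9, 10.1))                   # 1 2 (numeric order is correct)
```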
I added a column of SI values so it can be used for proper sorting.
@chfleming Here is reproducible code with data for the warnings from the home range plot with CI levels.
@chfleming @jmcalabrese I wrote a script to do model selection and home range for a list of individuals, using parallel processing when possible. This is the result of the script.

Questions:

- Sometimes only one model is returned by `ctmm.select` even with `verbose = TRUE`; that's normal, right?
- Do we need to feed all individuals to `akde` in one call so it calculates them on the same grid? That would mean we cannot parallelize this step. The time cost for this step is about 34 s for 6 individuals.