xhdong-umd closed this issue 6 years ago
I implemented `plot_loc` and found a serious bug: after some plotting in RStudio, the shiny app shows all of its plots in the RStudio plot window instead of in the app itself. It only appears when the app is saving a plot to `.png`. This should not happen; I'm not sure whether it's a problem with `ggsave`, shiny, or RStudio. I have asked a question in the forums; hopefully it can be fixed quickly.

I found a workaround to avoid the problem. The problem is caused by `ggsave` and Shiny.
The plot functions in the visualization page are finished, with the help pages updated and linked in the first post.
The plot for variograms is done. I moved the individual code examples from each function into a vignette, since most of them are part of a workflow and it's easier to explain them in one document.
The plots for home range and occurrence are done. Though it will take quite some effort to write example code for this part, since I need to duplicate the in-app process of maintaining the model list, names, etc. with a different workflow that is as simple as possible.
I have finished the map page, and updated the repo and the reference website. With almost all functions finished and documented, I updated the first post with the reference website page. You can click through to the reference page to look at the help for every function.
I decided to move the code examples into a single vignette, so there are no examples in the help pages now. It will take some time to write the vignette, because I need to reproduce a lot of user selections and in-app operations that are not exported in the package yet (and probably need a different usage).
@jmcalabrese @chfleming I just finished the main vignette. To establish a workflow with package functions, I had to create quite a few new functions for the user's convenience, which duplicate some app behavior in different ways. You can have a look at the vignette; hopefully it's not hard to read.
I have updated the repo and the package website, and set the package version to 0.1.0.
I just updated the hosted app to match the package version.
Next I'll look at the map animation possibilities before our next meeting.
@xhdong-umd: Thanks for putting the package vignette together. Some comments and edits on the vignette are in the attached .pdf. Let me know if you have questions or want to discuss any of it. Package Usage • ctmmweb_jmc.pdf
@jmcalabrese Thanks for the edits, the vignette is in much better shape now. I probably should take more time to review and edit it again, though it was hard to resist the urge to finish it before the holiday vacation starts :sweat_smile:.
Here is a diff of the RMarkdown. You can choose among source diff mode, rich diff mode or view the file directly.
For some of your comments:
- `data.table` appeared in the code but lacked descriptive text. I added a sentence and some references about it.
- The `dt` variable name came from `data.table` usage conventions. I changed it to `loc_data`.
- `used_core = 6` or `reserved_core = 2`? I think a number of reserved cores is better, since it can be used across different platforms. And maybe we should make it reserve 1 core by default?
- On "multiple models fitted for each animal if they are validate candidates": I was referring to the fact that `ctmm.fit` can return a single best model (if all other models can simply be ignored) or multiple candidate models for one animal. I think just removing "if they are validate candidates" and stating the multiple-result possibility should be OK.

I have added a `reserved_cores` parameter to the parallel functions.
Though one tricky point in parallel core settings is that there are physical cores and logical cores. A Windows PC with 4 physical cores often has 8 logical cores via hyper-threading. The usual advice is to use physical cores - 1; however, I found it's better to use cores = list length when that number is not too big, because running 8 jobs on 8 threads is always faster than 8 jobs on 3 threads, even if you only have 4 cores, since the jobs are much better balanced with more threads.

To make sure some physical cores are left unused, I cannot use the logical core count in this mode. So it's possible that the default mode uses 8 logical cores on a 4-physical-core machine, but with 1 reserved core it can only use 3 cores.
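The two modes above might be sketched as follows (the helper name `pick_cores` is hypothetical, not the actual package function):

```r
# Hypothetical helper illustrating the two modes described above.
pick_cores <- function(job_count, reserved_cores = 0) {
  if (reserved_cores > 0) {
    # Reserved cores must be counted against physical cores, not logical
    # ones, so a 4-core/8-thread machine with 1 reserved core gets 3 workers.
    max(1, parallel::detectCores(logical = FALSE) - reserved_cores)
  } else {
    # Default mode: one worker per job, capped at the logical core count,
    # since 8 balanced jobs on 8 threads beat 8 jobs queued on 3 threads.
    min(job_count, parallel::detectCores(logical = TRUE))
  }
}
```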
Hi @xhdong-umd, nice work. I have some minor suggestions; @jmcalabrese may feel free to chime in.
- You have `plot_vario` and `plot_ud`, which have similar functionality but act on different classes of objects. I think the `R` way of doing this would be to define a generic and then define different dispatch methods. For lists of objects, which could be a list of variograms or a list of UDs, I create a `list` dispatch method to handle the final dispatch (see `plot.list` in `generic.R` of `ctmm`).
- For functions like `plot_vario` that add bells and whistles to base `ctmm` functions (`plot.variogram`) and take the same objects as inputs, they could possibly be integrated into the base `ctmm` functions. Specifically, if `plot_vario` is just using `par(mfrow=...)` and `title`, then we could add an `mfrow` argument to `plot.variogram` that would plot lists of variograms in a grid, with their names as titles, instead of on top of each other.
- `data_sample` would be better as a class dispatch method for a generic, even if that's the only method with that functionality. In the case of `data_sample`, the logical thing would be making a `sample.telemetry`, except that base `sample` does not have a `...` argument allowing it to be properly overloaded (and `sample.list` already exists with different functionality). But you could make a differently named generic, like `resample` or something.
- The `_` in the function names could be gotten rid of in favor of R-like functions with parsimoniously named generics that are overloaded for different object classes.

@chfleming I'll try to make some of the methods generic. Some comments:
- One drawback: users will need to check `?plot.vario` for help when they are using `plot(vario_list)`. Though I think there is no way to work around this... The usage would look like:

```r
# default usage
plot(vario_list)
# more parameters
plot(vario_list, model_list = NULL,
     fraction = 0.5, relative_zoom = TRUE, cex = 0.65,
     model_color = "blue", columns = 2)
# for home range
plot(UD_list)
# more parameters
plot(UD_list, level_vec = 0.95, columns = 2, cex = 0.65,
     tele_list)
```
- `plot_vario` takes a list of variograms, and the names of the list items carry information in empirical variogram mode. I have no problem integrating it into base `ctmm`, though I'm not sure whether that will bring too many additional requirements (see the help page of `plot_vario`), since there are 3 modes depending on the `model_list` parameter.
- On the `_` in names: I saw extensive `_` usage in some R packages (like `stringr` and some of Hadley's packages), so I adopted this style after comparing the other options. We could remove some of them with generic names, but there will probably be quite a few left.

@xhdong-umd
An S3 generic has the form `generic(universal_args, ...)`, with the class-specific methods as `generic.class(universal_args, class_args, ...)`. The generic arguments need to be arguments that every method will always have. Unfortunately, there are many `R` functions that take up good names but don't make good generics because of their argument structure, e.g. `sample`.
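The pattern can be sketched with a toy generic (the name `describe` and the classes here are purely illustrative, not the actual package code):

```r
# Toy illustration of an S3 generic plus a class method and a list method.
describe <- function(object, ...) UseMethod("describe")

# class-specific method for a single telemetry-like object
describe.telemetry <- function(object, ...) {
  paste("telemetry object with", nrow(object), "locations")
}

# list method handles lists of objects, mirroring plot.list in ctmm
describe.list <- function(object, ...) {
  lapply(object, describe, ...)
}
```

Dispatch follows the class of the first argument: a single `telemetry` object goes to `describe.telemetry`, while a plain list goes to `describe.list`, which then dispatches again per element.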
I have an example S3 generic, `emulate`, if you want to see one. In `1.R` I have the S3 generic defined (I keep this stuff separate in the first alphabetical file due to peculiarities with S4; some generics that I import are S4 generics). Then in `emulate.R` I define the class methods. In `NAMESPACE` I export the generic and the S3 methods. In `man/emulate.Rd` I document the generic and the S3 methods in line with CRAN standards. S4 is a bit different and I actively avoid it.

@chfleming I updated some `dt` parameters to `loc_data` to avoid the naming conflict mentioned by @jmcalabrese.
For the functions that can be made generic, I went through the exported functions:

- `tele_list_info`: this is similar to `summary(buffalo)` but with different columns. Making it a generic would conflict with `summary` in `ctmm`. I changed the name to `report`.
- `merge_tele`: this merge has a different meaning from `merge` in base R, so I don't think it's a good idea to have it as `merge(buffalo)`. We could do `as.data.table(buffalo)`, since that is the expected function to convert an object into a `data.table`. However, such methods usually convert a list into a `data.table` with each list item as a column, which is different from our behavior; we would need to override that, which is not optimal. I changed the name to `combine`.
- `plot_loc`, `plot_loc_facet`, and `plot_time` take the same parameters, so I don't think they can be made into a generic, unless we add an additional parameter, like `plot(loc_data)`, `plot(loc_data, mode = "facet")`, `plot(loc_data, mode = "time")` (another option is `hist(loc_data)`, but the plot is made with ggplot instead of a base histogram; I'm not sure if that's OK). Do you like this?
- `summary_model_fit`: this can be made into a generic and become `summary(model_fit_res)`. Though the behavior of summary on a list of model fit results (a data frame) is different from summary on a single model fit result (a summary from ctmm), which could be a little confusing. Now it's named `summary_tried_models`.
- `list_model_fit` is renamed to `flatten_models` (there is a `flatten` in `rlang`).

Some questions about `plot_vario`: if we integrate it into ctmm, what's the best approach?

- `plot.list` in ctmm currently just dispatches to `plot.variogram`. I think it makes sense to direct that to a `plot.variogram_list`, which is just `plot_vario` (I'm not sure if `plot.variogram.list` would work). The function should not be part of `plot.variogram`, because it calls `plot.variogram`.
- Alternatively, `ctmm` could call `ctmmweb::plot.variogram_list` when dispatching `plot(vario_list)`. This will introduce `ctmmweb` dependencies, even though `plot_vario` doesn't really need any dependencies.
- `plot_ud` plots each home range separately, while `plot(home_range_list)` will plot them together. Making `plot_ud` generic would replace that behavior. Is that desired?
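A rough sketch of what such a list method could look like, assuming each list element already has its own `plot` method (the method name and the `columns` argument follow the discussion above; this is illustrative, not the actual `ctmm`/`ctmmweb` code):

```r
# Illustrative sketch only: plot a named list of objects in a grid,
# one panel per element, titled by the list names.
plot.variogram_list <- function(x, columns = 2, ...) {
  old_par <- graphics::par(mfrow = c(ceiling(length(x) / columns), columns))
  on.exit(graphics::par(old_par))  # restore the layout afterwards
  for (name in names(x)) {
    plot(x[[name]], ...)  # dispatches to the single-object plot method
    graphics::title(name)
  }
  invisible(x)
}
```

With a method like this registered, `plot(vario_list)` could dispatch via the class of the list itself, instead of `plot.list` overplotting everything through `plot.variogram`.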
- On `_` in the map functions: `par_lapply` cannot be renamed, because there is `parLapply` in `parallel`.
- I changed `calculate_speed` into `calc_speed` to make it shorter.
- `resample` is a good name, because it can mean scaling down images, which is quite similar to our function. We could make it `resample(buffalo[[1]])` and `resample(buffalo)`. Though this would share a name with `raster::resample`, and loading the `raster` package could override our function. I'm using `pick` now, since that kind of reminds the user to use a point-count parameter.

I realized that if I move some functions to `ctmm`
and then need to make changes to them, I'll need the changes to go live in `ctmm` for the app to make use of those functions. That means the GitHub master branch of `ctmm` must be updated immediately, instead of the original plan of changing `ctmm` only infrequently, in major versions.
One alternative could be to make a copy in `ctmm` but still keep the local version in the app. Then my app can still update any time when needed, and the changes in `ctmm` don't need to be as frequent. We can also export only the `ctmm` version of the functions and hide the `ctmmweb` version, so users will only see one version, to avoid confusion.
I think we can copy the code to `ctmm` after making some necessary changes, and then remove/refactor the corresponding `ctmmweb` code after some delay, to prevent conflicts.
In addition to having an `mc.cores`-like argument, maybe the code to calculate a good default number of cores/threads should be in a separate function, to be used like `plapply(list, fn, cores = defaultCores())`. In some cases with rapid function calls, I need to limit it to 1 core on Windows because of socket overhead, but I guess I could just stick with `mclapply` in those cases instead.
I'm not sure if I understand the suggestion.
Approach 1: Move code to ctmm

Let's say we have decided to move `plot_vario` into `ctmm`, and `ctmmweb` will just call `ctmm::plot_vario`. Later, when I need to make changes to `plot_vario`, the changes need to go live in the `ctmm` master branch so that `ctmmweb` can use them properly.
Approach 2: Copy code to ctmm

There will be a `plot_vario` in both `ctmm` and `ctmmweb`. The `ctmm` one is exported, documented, and open to users. The `ctmmweb` version is used internally and not exported. Usually they are kept synchronized, but I can make changes to the `ctmmweb` version frequently and update the `ctmm` version less frequently. Unless there is a serious bug that needs to be fixed, some delay in the `ctmm` version should not create problems for users.
Approach 3: ?
I'm not sure I understand it. Did you mean the app uses the `ctmm` version when it's up to date, switches to a local version when changes are needed, then switches back to the `ctmm` version when it's updated again?
In the process of creating the interface and renaming/refactoring functions, I found I needed to check every usage of the changed functions, modify them, and test the app thoroughly. That was time-consuming, and it's easy to introduce errors that go unnoticed if not tested.
For the parallel functions, there is some code that detects the platform as Windows or Linux/macOS. If we abstract out the core-calculating part, both the new function and the original function will need this platform-detection code, which creates a little duplication (although not much).
If you want to limit it to 1 core, why not just use `cores = 1`, or even `parallel = FALSE` to avoid the overhead completely? I don't see how abstracting out the core calculation would be useful in this case. I'll just make `cores = NULL` the default: any real input will override it, and no input means using our core calculation.
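A minimal sketch of that `cores = NULL` default, assuming a socket-cluster backend (`par_lapply_sketch` is illustrative, not the real `par_lapply`):

```r
# Illustrative sketch of the cores = NULL default discussed above.
par_lapply_sketch <- function(x, fun, cores = NULL, parallel = TRUE) {
  if (!parallel) return(lapply(x, fun))  # skip all cluster overhead
  if (is.null(cores)) {
    # no input: fall back to our own core calculation
    cores <- min(length(x), parallel::detectCores(logical = TRUE))
  }
  cl <- parallel::makeCluster(cores)
  on.exit(parallel::stopCluster(cl))  # always release the workers
  parallel::parLapply(cl, x, fun)
}
```

Any explicit `cores` value overrides the calculation, and `parallel = FALSE` bypasses the cluster entirely.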
Sorry, I was just thinking about non-exported functions like plapply.
My plan for the integration: redirect `plot.variogram.list` to our new function `plot_vario` (or whatever name you prefer).

I found that installing the package needs to download the whole tarball now, which includes a 10M folder for the package website. I tried to exclude that folder, and finally got a solution satisfying the multiple requirements of git, GitHub, and devtools, which involves creating 3 branches and jumping through some hoops inside git.

There are still some problems with git. It often takes a lot of time when my requirements in git are a little complex...
The parallel function is updated to reflect the new changes to the cores parameter. The core code is integrated into ctmm, but ctmm and ctmmweb will maintain differently customized versions, since their needs are also different.
The package site is updated to the current version, and I tagged it as v0.1.0, since it's the first major version with the package structure.
@jmcalabrese @chfleming It took me a lot of time to adjust the website and update the help documents.
UPDATE: With most functions completed and documented, it's easier to just use the reference website page for the currently exported functions:

features that are not planned to be exported