ctmm-initiative / ctmmweb

Web app for analyzing animal tracking data, built upon ctmm R package
http://biology.umd.edu/movement.html
GNU General Public License v3.0
34 stars 22 forks source link

Interface of exported functions in package #41

Closed xhdong-umd closed 6 years ago

xhdong-umd commented 6 years ago

@jmcalabrese @chfleming It took me a lot of time to adjust the website, update the help documents.

UPDATE: With most functions completed and documented, it's easier to just use the reference website page for current exported functions:

screen shot 2017-12-12 at 3 56 44 pm

features that not planned to export

xhdong-umd commented 6 years ago

I implemented plot_loc and found a serious bug. After some plot in RStudio, the shiny app will have all the plots shown in RStudio plot windows instead of app itself.

It only appears when app is saving plot into .png. This should not happen, I'm not sure if it's problem with ggsave or shiny or RStudio. I have asked question in forums, hopefully can be fixed quickly.

xhdong-umd commented 6 years ago

I found a workaround to avoid the problem. The problem is caused by ggsave and Shiny.

xhdong-umd commented 6 years ago

plot functions in visualization page are finished, with help updated and linked in first post.

xhdong-umd commented 6 years ago

plot for variograms is done.

xhdong-umd commented 6 years ago

I moved individual code examples in each function into a vignette, since most of them are part of a workflow, and it's easier to explain them under one document.

xhdong-umd commented 6 years ago

plot for home range and occurence are done.

Though it will take quite some efforts to write example code for this part, since I need to duplicate the in app process of maintaining model list and names etc with a different workflow as simple as possible.

xhdong-umd commented 6 years ago

I have finished on map page, and updated the repo and reference website. With almost all functions finished and documented, I updated the first post with the reference website page. You can click the reference page to look at help on every function.

I decided to move code examples into a single vignette so there is no example in help now. It will take some time to write the vignette, because I need to reproduce a lot of user selections and a lot of operations in app which are not exported in package yet (and probably need a different usage).

xhdong-umd commented 6 years ago

@jmcalabrese @chfleming I just finished the main vignette. To establish a workflow with package functions, I have to create quite some new functions for user's convenience, which duplicate some app behavior in different ways.

You can have a look at the vignette, hopefully it's not hard to read.

I have updated the repo, the package website, and set the package version to 0.1.0.

xhdong-umd commented 6 years ago

I just updated the hosted app to match the package version.

Next I'll look at the map animation possibilities before our next meeting.

jmcalabrese commented 6 years ago

@xhdong-umd: Thanks for putting the package vignette together. Some comments and edits to the vignette are in the attached .pdf. Let me know if you have question or want to discuss any of it. Package Usage • ctmmweb_jmc.pdf

xhdong-umd commented 6 years ago

@jmcalabrese Thanks for the edits, the vignettes is in a much better shape now. I probably should take more time to review and edit it again, though it was hard to resist the urge of finishing it before the holiday vacation starts :sweat_smile:.

Here is a diff of the RMarkdown. You can choose among source diff mode, rich diff mode or view the file directly.

For some of your comments:

xhdong-umd commented 6 years ago

I have added a reserved_cores parameter to the parallel functions.

Though one tricky point in parallel core setting is that there are physical cores and logical cores. A windows pc with 4 physical cores often have 8 logical cores via hyperthread. The usual advice is to use physical cores - 1, however I found it's better to use cores = list length if the number is not too big. Because running 8 jobs on 8 threads is always faster than 8 jobs on 3 threads, even if you only have 4 cores, since the jobs are much more balanced with more threads.

To make sure some physical cores are not used I cannot use logical core number in this mode. So it's possible the default mode used 8 logical cores with 4 physical cores machine, but with 1 reserved core it can only use 3 cores.

chfleming commented 6 years ago

Hi @xhdong-umd , nice work. I have some minor suggestions. @jmcalabrese may feel free to chime in.

  1. There are some methods like plot_vario and plot_ud that have similar functionality but act on different class objects. I think the R way of doing this would be to define a generic and then define different dispatch methods. For lists of objects, which could be a list of variograms or a list of UDs, I create a list dispatch method to handle the final dispatch (see plot.list in generic.R of ctmm).
  2. For some methods like plot_vario that add bells and whistles to base ctmm functions (plot.variogram) and take the same objects as inputs, they could possibly be integrated into base ctmm functions---specifically if plot_vario is just using par(mfrow=...) and title, then we could add the mfrow argument to plot.variogram that would plot lists of variograms in a grid instead of ontop of each other with their names as titles.
  3. Following on (1), I think any class specific function like data_sample would be better as a class dispatch method for a generic, even if that's the only method with that functionality. In the case of data_sample, the logical thing would be making a sample.telemetry except that base sample does not have a ... argument allowing it to be properly overloaded (and sample.list already exists with different functionality). But you could make a differently named generic, like resample or something.
  4. Following on (1) & (3), I think most of the C-like underscores _ in the function names could be gotten rid of in favor of R-like functions with parsimoniously named generics that are overloaded for different object classes.
xhdong-umd commented 6 years ago

@chfleming I'll try to make some methods to generic. Some comments:

  1. Some function will have a generic name but additional parameters specific to itself. So user can use a generic name with default parameters which is very familiar. However if they want more control, they need to read help and use specific parameters. And user need to know to look for ?plot.vario for help when they are using plot(vario_list). Though I think there is no way to work around this...
# default usage
plot(vario_list)

# more parameters
plot(vario_list, model_list = NULL,
                       fraction = 0.5, relative_zoom = TRUE, cex = 0.65,
                       model_color = "blue", columns = 2)

# for home range
plot(UD_list)

# more parameters
plot(UD_list, level_vec = 0.95, columns = 2, cex = 0.65,
                    tele_list)
  1. plot_vario is taking a list of variograms, and the names of list items should have some information in empirical variogram mode. I have no problem to integrate it into base ctmm, though I'm not sure if that will bring too much additional requirements (see the help page of plot_vario), since there are 3 modes depend on the model_list parameter.
  2. For the _ in names. I saw extensive _ usages in some of R packages (like stringr and some of Hadley packages), so I adopted this style after comparing the other options. We could remove some of them with generic names, but there probably will be quite some left.
chfleming commented 6 years ago

@xhdong-umd

  1. Yes, you make the generic as generic(universal_args,...) and then the class specific methods as generic.class(universal_args,class_args,...). The generic arguments need to be arguments that every method will always have. Unfortunately, there are many R functions that take up good names but don't make good generics because of their argument structure. E.g., sample. I have an example S3 generic emulate if you want to see. In 1.R I have the S3 generic defined (I keep this stuff separate in the first alphabetical file due to peculiarities with S4---some generics that I import are S4 generics). Then in emulate.R I define the class methods. In NAMESPACE I export the generic and the S3 methods. In man/emulate.Rd I document the generic and S3 methods inline with CRAN standards. S4 is a bit different and I actively avoid it.
  2. That's fine.
  3. Its a matter of taste. Personally, I think its good for internal code, but bad for the end user to see.
xhdong-umd commented 6 years ago

@chfleming I updated some dt parameters to loc_data to avoid the naming conflict mentioned by @jmcalabrese .

For the functions can be made generic, I went through the exported functions:

xhdong-umd commented 6 years ago

I realized if I move some functions to ctmm and I need to make some changes to them, I'll need the changes to go live in ctmm for the app to make use of those functions. That means the github master branch of ctmm must be updated immediately, instead of the original plan of only change ctmm infrequently in major versions.

One alternative could be make a copy to ctmm but still keep the local version in app. So my app can still update anytime when needed, and the changes in ctmm don't need to be as frequent. We can also only export the ctmm version of functions and hide the ctmmweb version, so user will only see one version to avoid confusions.

chfleming commented 6 years ago

I think we can copy the code to ctmm after making some necessary changes, and then remove/refactor the corresponding ctmmweb code after some delay to prevent conflicts.

In addition to having a mc.cores like argument, maybe the code to calculate a good number of default cores/threads should be in a separate function, to be used like plapply(list,fn,cores=defaultCores()). In some cases with rapid function calls, I need to limit to 1 cores in Windows because of socket overhead, but I guess I could just stick to mclapply in those cases instead.

xhdong-umd commented 6 years ago

I'm not sure if I understand the suggestion.

In the process of creating interface and renaming/refactoring functions, I found I need to check every usage of changed functions, modify them and test the app thoroughly. That was time consuming and easy to introduce errors can go unfound if not tested.

For parallel function, there are some code detecting platform as windows or linux/mac. If we abstract the core calculating part, both the new function and the original function will need this platform detection code and create a little bit duplication (although not much).

If you want to limit to 1 cores, why don't just use cores = 1, or even just parallel = FALSE to avoid the overhead completely? I don't see how abstracting the core calculation function can be useful for this case. I'll just make cores = NULL as default. Any real input will override it, and no input means using our core calculation.

chfleming commented 6 years ago

Sorry, I was just thinking about non-exported functions like plapply.

xhdong-umd commented 6 years ago

My plan with the integration:

xhdong-umd commented 6 years ago

I found the package installation need to download the whole tar ball now, which include a 10M folder of package website. I tried to exclude that folder, finally got a solution to satisfied multiple requirement from git, github, devtools, which involves creating 3 branches and some jumping hoops inside git.

xhdong-umd commented 6 years ago

There is still some problem with git. It often took a lot of time when my requirement in git is a little bit complex...

xhdong-umd commented 6 years ago

The parallel function is updated to reflect new changes on cores parameter. The core code is integrated into ctmm but ctmm and ctmmweb will maintain different customized versions since their needs are also different.

The package site is updated with the current version, and I tagged it as v0.1.0 since it's first major version with the package structure.