KentonWhite / ProjectTemplate

A template utility for R projects that provides a skeletal project.
http://projecttemplate.net
GNU General Public License v3.0
622 stars 159 forks source link

Feature - devtools::use_package and tidyverse style functions #258

Open rsangole opened 6 years ago

rsangole commented 6 years ago

Report an Issue / Request a Feature

I'm submitting a (Check one with "x") :


Issue Severity Classification -

(Check one with "x") :

Expected Behavior

An easy console driven method to quickly add libraries to global.dcf, which would utilize autocomplete for the package names.

Current Behavior

Today, as we work on the projects, adding a library to the global.dcf file is a manual process. When many .R files are open, and the open folder is different, it's a cumbersome workflow to open global.dcf and add the library name without any autocomplete option. (Results in spelling mistakes like I did today - rtsne instead of Rtsne.

Possible Solution

Something along the lines of the devtools::use_package() which can be called from command line as well as will autocomplete package names while typing <read about it>

Example usage: from the R console, call ProjectTemplate::add_package(lattice) changes the global.dcf from this:

...
load_libraries: TRUE
libraries: reshape2, plyr, tidyverse, stringr, lubridate
as_factors: TRUE
...

to this:

...
load_libraries: TRUE
libraries: reshape2, plyr, tidyverse, stringr, lubridate, lattice
as_factors: TRUE
...

If the package already exists in the list, then the user can be notified and no actions taken.

KentonWhite commented 6 years ago

This would be very handy! Please keep in mind the CRAN policy:

Packages should not write in the user’s home filespace (including clipboards), nor anywhere else on the file system apart from the R session’s temporary directory (or during installation in the location pointed to by TMPDIR: and such usage should be cleaned up). Installing into the system’s R installation (e.g., scripts to its bin directory) is not allowed.

Limited exceptions may be allowed in interactive sessions if the package obtains confirmation from the user.

rsangole commented 6 years ago

@KentonWhite it's interesting they have this policy when tools like devtools write to a file in the user's home filespace, and tools like clipr write to the user's clipboard.

Hugovdberg commented 6 years ago

I think the user issuing this command with the sole intent of writing to the system would fall under the exceptions. It's not like it's a possibly unexpected side effect. This feature needs thorough testing though, we want to very sure we don't loose any settings made by the user. We might even mark it as experimental in the documentation so people will use it with caution. If we don't receive any issues we can remove the message later.

KentonWhite commented 6 years ago

I agree. The simplest way to be compliant is ask the user for confirmation before making the change :)

rsangole commented 6 years ago

Pseudocode:

rsangole commented 6 years ago

To extend this logic and help ease workflow, I can also think of other global.dcf modifiers along the lines of munge_on(), munge_off(), loaddata_on(), loaddata_off(), loadcache_on(), loadcache_off() etc. You get the idea.

KentonWhite commented 6 years ago

Rather than having specific functions for each, what if was a general configuration function:

config.settings(c(setting = value), config_file = 'global.dcf')
rsangole commented 6 years ago

@KentonWhite I wanted to make separate pipable verbs which will work well in the tidyverse work flow sense. These will be more human readable and quicker to type in the console. For example:

# I make some changes in the underlying input data stored in `data/` .
# I need to quickly reload data, perform all my munging activities. 

loaddata_on() %>%
   munge_on() %>%
   loadcache_off() %>%
   reload_project()
# I make some changes in the underlying input data stored in `data/` .
# I need to quickly reload data, perform all my munging activities. 
# But, for subsequent project loads, I only want to use the newly cached files.

loaddata_on() %>%
   munge_on() %>%
   loadcache_off() %>%
   reload_project() %>%
   loaddata_off() %>% munge_off() %>% loadcache_on()
# I add some libraries to my config file. I want to reload all my libraries to ensure 
# they're in my working environment.

add_package(lattice) %>%
   add_package(TSClust) %>%
   reload_libraries()
# I did something odd. I need to start a fresh from my known-good cache files.

munge_off() %>%
   loadcache_on() %>%
   reload_project()
Hugovdberg commented 6 years ago

The pipe operator takes the output of the lefthand side and inserts it as the first argument to the righthand side, what do you plan to pass between the functions? And if you don't pass anything, why bother piping? Not typing the pipe is even shorter ;-)

Also, if you just once need to override a setting then reload.project(munging = FALSE) isn't that much more work is it? Can't we add the most used settings as named arguments to load.project and reload.project so you can have autocomplete, but without adding a lot of functions to toggle the setting.

I think the original add_package function to add a library permanently to the list of packages is fine. But the later examples seem to imply a temporary change of the settings, which is already implemented without adding a long list of functions to the interface. If we want to provide a function to change the settings permanently then I agree with @KentonWhite that an add_package and/or perhaps a more general function update.config(..., config = 'global.dcf') or config.settings(..., config = 'global.dcf') would be best.

rsangole commented 6 years ago

In my 2nd example:

loaddata_on() %>%
   munge_on() %>%
   loadcache_off() %>%
   reload_project() %>%
   loaddata_off() %>% munge_off() %>% loadcache_on()

I want to be able to take actions on the dcf file after running reload_project(). Thus, I envisioned this type of piping to enable lazy evaluation. The piping is to build a sequence of actions desired; each pipe will enable a build up for the sequence. At the very end of the pipe, the sequence will be executed.

I also think the pipe makes code much more readable than operators, but that's just my personal preference.

I agree with the statements about the temporary change (for 1 runtime) vs permanent change. Summarizing our discussion so far, we could have the following:

Hugovdberg commented 6 years ago

But how would the last function in the chain know it is the last function? Put differently, if every function passes the full config, how would each function know whether to write the config to disk or not? The pipe operator is just a nice syntactic sugar for this:

loadcache_on(
  munge_off(
    loaddata_off(
      reload_project(
        loadcache_off(
          munge_on(
            loaddata_on()
          )
        )
      )
    )
  )
)

There is nothing magical going on to pass knowledge about functions inside or outside the current function.

I think a better approach for your that example would be like this:

# Set config permanently to use cache only
update.config(
    data.loading = FALSE,
    munging = FALSE,
    cache.loading = TRUE
)
# Reload once from data
reload.project(
    data.loading = TRUE, 
    munging = TRUE, 
    cache.loading = FALSE
)
Hugovdberg commented 6 years ago

It's not to say I don't like the idea, but the added value of the piping construction is not clear to me. But if you have an elegant solution then I'm open for it :-) I realise I can be a bit stubborn and strongly opinionated at times ;-)

Hugovdberg commented 6 years ago

Also, we should add a template argument to the update.config function, which defaults to NULL for the current project, so you can edit the config in one of your templates so all future projects from that template use the new settings.

rsangole commented 6 years ago

@Hugovdberg , I always up for a debate over ideas; if my ideas can pass the muster of smarter folks like you two, then I know it's a good idea. Keep it coming!

how would each function know whether to write the config to disk or not?

I don't know, tbh. In my head, things are still at a conceptual level - I haven't put thought into the actual code structure yet. While I truly (deeply) love pipes, I can agree that your proposal of:

# Set config permanently to use cache only
update.config(
    data.loading = FALSE,
    munging = FALSE,
    cache.loading = TRUE,
    template = NULL,
    ...
)
# Reload once from data
reload.project(
    data.loading = TRUE, 
    munging = TRUE, 
    cache.loading = FALSE,
    ...
)

achieves the intent of my needs. Let's go with this, along with anadd.package(). (Instead of add_package(), since we're using dots in this package.)