Open rsangole opened 6 years ago
This would be very handy! Please keep in mind the CRAN policy:
Packages should not write in the user’s home filespace (including clipboards), nor anywhere else on the file system apart from the R session’s temporary directory (or during installation in the location pointed to by TMPDIR: and such usage should be cleaned up). Installing into the system’s R installation (e.g., scripts to its bin directory) is not allowed.
Limited exceptions may be allowed in interactive sessions if the package obtains confirmation from the user.
@KentonWhite it's interesting they have this policy when tools like devtools
write to a file in the user's home filespace, and tools like clipr
write to the user's clipboard.
I think the user issuing this command with the sole intent of writing to the system would fall under the exceptions. It's not like it's a possibly unexpected side effect. This feature needs thorough testing though, we want to very sure we don't loose any settings made by the user. We might even mark it as experimental in the documentation so people will use it with caution. If we don't receive any issues we can remove the message later.
I agree. The simplest way to be compliant is ask the user for confirmation before making the change :)
Pseudocode:
add_package()
global.dcf
global.dcf
verbose = TRUE
, print new global.dcf
to console for visual confirmation of changeTo extend this logic and help ease workflow, I can also think of other global.dcf
modifiers along the lines of munge_on()
, munge_off()
, loaddata_on()
, loaddata_off()
, loadcache_on()
, loadcache_off()
etc. You get the idea.
Rather than having specific functions for each, what if was a general configuration function:
config.settings(c(setting = value), config_file = 'global.dcf')
@KentonWhite I wanted to make separate pipable verbs which will work well in the tidyverse
work flow sense. These will be more human readable and quicker to type in the console. For example:
# I make some changes in the underlying input data stored in `data/` .
# I need to quickly reload data, perform all my munging activities.
loaddata_on() %>%
munge_on() %>%
loadcache_off() %>%
reload_project()
# I make some changes in the underlying input data stored in `data/` .
# I need to quickly reload data, perform all my munging activities.
# But, for subsequent project loads, I only want to use the newly cached files.
loaddata_on() %>%
munge_on() %>%
loadcache_off() %>%
reload_project() %>%
loaddata_off() %>% munge_off() %>% loadcache_on()
# I add some libraries to my config file. I want to reload all my libraries to ensure
# they're in my working environment.
add_package(lattice) %>%
add_package(TSClust) %>%
reload_libraries()
# I did something odd. I need to start a fresh from my known-good cache files.
munge_off() %>%
loadcache_on() %>%
reload_project()
The pipe operator takes the output of the lefthand side and inserts it as the first argument to the righthand side, what do you plan to pass between the functions? And if you don't pass anything, why bother piping? Not typing the pipe is even shorter ;-)
Also, if you just once need to override a setting then reload.project(munging = FALSE)
isn't that much more work is it? Can't we add the most used settings as named arguments to load.project
and reload.project
so you can have autocomplete, but without adding a lot of functions to toggle the setting.
I think the original add_package
function to add a library permanently to the list of packages is fine. But the later examples seem to imply a temporary change of the settings, which is already implemented without adding a long list of functions to the interface. If we want to provide a function to change the settings permanently then I agree with @KentonWhite that an add_package
and/or perhaps a more general function update.config(..., config = 'global.dcf')
or config.settings(..., config = 'global.dcf')
would be best.
In my 2nd example:
loaddata_on() %>%
munge_on() %>%
loadcache_off() %>%
reload_project() %>%
loaddata_off() %>% munge_off() %>% loadcache_on()
I want to be able to take actions on the dcf
file after running reload_project()
. Thus, I envisioned this type of piping to enable lazy evaluation. The piping is to build a sequence of actions desired; each pipe will enable a build up for the sequence. At the very end of the pipe, the sequence will be executed.
I also think the pipe makes code much more readable than operators, but that's just my personal preference.
I agree with the statements about the temporary change (for 1 runtime) vs permanent change. Summarizing our discussion so far, we could have the following:
add_package()
to add a new package to dcf
load.project(...)
and reload.project(...)
have auto-complete arguments to change popular configs temporarily for 1 execution.update.config(..., config = 'global.dcf')
to allow change in config file from the consoleBut how would the last function in the chain know it is the last function? Put differently, if every function passes the full config, how would each function know whether to write the config to disk or not? The pipe operator is just a nice syntactic sugar for this:
loadcache_on(
munge_off(
loaddata_off(
reload_project(
loadcache_off(
munge_on(
loaddata_on()
)
)
)
)
)
)
There is nothing magical going on to pass knowledge about functions inside or outside the current function.
I think a better approach for your that example would be like this:
# Set config permanently to use cache only
update.config(
data.loading = FALSE,
munging = FALSE,
cache.loading = TRUE
)
# Reload once from data
reload.project(
data.loading = TRUE,
munging = TRUE,
cache.loading = FALSE
)
It's not to say I don't like the idea, but the added value of the piping construction is not clear to me. But if you have an elegant solution then I'm open for it :-) I realise I can be a bit stubborn and strongly opinionated at times ;-)
Also, we should add a template
argument to the update.config
function, which defaults to NULL
for the current project, so you can edit the config in one of your templates so all future projects from that template use the new settings.
@Hugovdberg , I always up for a debate over ideas; if my ideas can pass the muster of smarter folks like you two, then I know it's a good idea. Keep it coming!
how would each function know whether to write the config to disk or not?
I don't know, tbh. In my head, things are still at a conceptual level - I haven't put thought into the actual code structure yet. While I truly (deeply) love pipes, I can agree that your proposal of:
# Set config permanently to use cache only
update.config(
data.loading = FALSE,
munging = FALSE,
cache.loading = TRUE,
template = NULL,
...
)
# Reload once from data
reload.project(
data.loading = TRUE,
munging = TRUE,
cache.loading = FALSE,
...
)
achieves the intent of my needs. Let's go with this, along with anadd.package()
. (Instead of add_package()
, since we're using dots in this package.)
Report an Issue / Request a Feature
I'm submitting a (Check one with "x") :
Issue Severity Classification -
(Check one with "x") :
Expected Behavior
An easy console driven method to quickly add libraries to
global.dcf
, which would utilize autocomplete for the package names.Current Behavior
Today, as we work on the projects, adding a library to the
global.dcf
file is a manual process. When many.R
files are open, and the open folder is different, it's a cumbersome workflow to openglobal.dcf
and add the library name without any autocomplete option. (Results in spelling mistakes like I did today -rtsne
instead ofRtsne
.Possible Solution
Something along the lines of the
devtools::use_package()
which can be called from command line as well as will autocomplete package names while typing <read about it>Example usage: from the R console, call
ProjectTemplate::add_package(lattice)
changes theglobal.dcf
from this:to this:
If the package already exists in the list, then the user can be notified and no actions taken.