Created attachment 2448 [details]
Forwarding index tags
Please find attached four patches. The first three improve the
consistency of printing and dispatch to print methods. They fix bugs
and make printing and dispatch more predictable for end users.
The fourth patch proposes a new mechanism for customising printing at
the console. It does so in a way that print() calls and autoprint
are consistent. At the same time, packages are not affected by the
customisation. Patch 4 requires patches 1 and 3.
Patch 1: Forwarding index tags
==============================
This patch materialises the index tag buffer tagbuf into an
indexTag argument that is passed to the R level print(). The
default print method then passes it back to the C routine. This fixes
inconsistencies when printing recursive data structures:
This change makes the output more consistent and predictable with S3
objects that don't implement print methods. It has positive effects in
a few cases covered by make check. For instance the output of
postscriptFonts(), or the output of na.omit(NA):
### Before
logical(0)
attr(,"na.action")
[1] 1
attr(,"class")
[1] "omit"
### After
logical(0)
attr(,"na.action")
[1] 1
attr(,"na.action")attr(,"class")
[1] "omit"
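If the S3 list does have a print method but calls NextMethod() or
print.default() (in the latter case ... should be passed on), the
index tags are now properly written as well.
In addition to fixing inconsistencies, the indexTag argument may also
be useful to print a data structure with a custom index tag, or in
boxed or proxy classes to print actual data with the proper index
tags.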
Unfortunately, passing the indexTag on recursion has implications
for backward compatibility. Breakages should be limited because print
methods are required to add ... in their signature for consistency.
If they do, the indexTag argument passed on recursion should be
swallowed in the dots. However, it can cause issues when ... is
passed on to another function that doesn't take dots itself. A few
places needed changes to avoid this issue:
- print.Dlist() now explicitly takes the arguments it passes on to
  formatDL(), instead of passing dots.
- print.DLLInfo() now explicitly takes arguments passed on to
  write.dcf().
- print.Bibtex() and print.Latex() now explicitly take arguments
  passed on to writeLines().
In general, it seems that passing ... to functions other than
print() should be considered an anti-pattern. Since print() is a
recursive generic that dispatches to potentially heterogeneous
methods (including the default method), no assumptions can safely be
made about the contents of .... For instance, if x is a list that
contains both data frames and factors,
print(x, row.names = FALSE, max.levels = 3) will pass all arguments
to both print methods, which should ignore the ones they don't know
about. Given this, it would be a good idea to push for stricter uses
of ... in print methods.
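As a hedged illustration of the stricter style suggested above, the sketch below contrasts a print-like function that forwards ... to writeLines() with one that names the argument it forwards. The "rec" class and both function names are made up for this example, and indexTag is only used here as a stand-in for the extra argument proposed in patch 1.

```r
# Hypothetical record class used only for this illustration
rec <- structure(list(name = "a", value = "b"), class = "rec")

# Fragile: forwards dots to writeLines(), which has no `...` of its own,
# so any argument it does not know about is an error
print_fragile <- function(x, ...) {
  writeLines(paste(names(x), unlist(unclass(x)), sep = ": "), ...)
  invisible(x)
}

# Stricter: names the argument it forwards and ignores everything else
print_strict <- function(x, sep = "\n", ...) {
  writeLines(paste(names(x), unlist(unclass(x)), sep = ": "), sep = sep)
  invisible(x)
}

# Simulate an unexpected argument arriving through the dots
fragile_ok <- tryCatch(
  {
    print_fragile(rec, indexTag = "[[1]]")
    TRUE
  },
  error = function(e) FALSE
)
strict_out <- capture.output(print_strict(rec, indexTag = "[[1]]"))
```

The fragile variant fails with an "unused argument" error as soon as an unknown argument arrives, while the strict variant prints normally.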
Patch 2: Consistency of dispatch to base types
==============================================
This patch is a follow-up to r76565
(https://github.com/wch/r-source/commit/e06aa530). It enhances the
consistency between print and autoprint by calling print() whenever
a method is defined for an object, even when OBJECT is unset. With
this second change, dispatch consistently reaches S3 methods for base
types. Previously, it would only reach print.function() because of
special-casing in the dispatch code:
print.list <- function(x, ...) {
  str(x, max.level = 2)
}
### Before
list(list(1))
#> [[1]]
#> [[1]][[1]]
#> [1] 1
### After
list(list(1))
#> List of 1
#> $ :List of 1
#> ..$ : num 1
Since we only call the R level print() when a method is defined,
autoprint doesn't bump the namedness of objects unless it is
necessary.
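For readers less familiar with the OBJECT bit, here is a small sketch of the two facts this patch relies on: a bare list is not flagged as an object, yet an explicit print() call still dispatches on its implicit class. The print.list() method below is a made-up example defined in a local scope so that it does not leak into the session. The sketch only exercises explicit print() calls and says nothing about what autoprint does in a given R build.

```r
x <- list(list(1))

# A bare list has no class attribute, so is.object() reports FALSE
is_obj <- is.object(x)

out <- local({
  # Hypothetical user method for the implicit class "list"
  print.list <- function(x, ...) {
    cat("<list of length ", length(x), ">\n", sep = "")
    invisible(x)
  }
  # An explicit print() call dispatches on the implicit class
  capture.output(print(x))
})
```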
Patch 3: Consistency of dispatch context
========================================
This patch is another attempt at getting print() to forward the
original calling environment when recursing back to R. This patch was
already proposed in PR 17398, and rejected at the time. I will try to
better explain the motivation behind it.
Currently, autoprint passes the current environment (an execution
environment if called from the stack browser, or the global env if
called from the console) to the native print routine. However, the R
function print() passes its own execution environment rather than the
environment of its caller. Since this execution environment inherits
from the base namespace, this prevents the recursive case of print()
from dispatching to user-defined methods when the methods are also
defined in base:
print.function <- function(x, ...) {
  cat("<FN-DISPATCHED>\n")
  NextMethod()
}
### Correct before and after
identity
#> <FN-DISPATCHED>
#> function (x)
#> x
#> <bytecode: 0x7ffb006cfc00>
#> <environment: namespace:base>
print(identity)
#> <FN-DISPATCHED>
#> function (x)
#> x
#> <bytecode: 0x7ffb006cfc00>
#> <environment: namespace:base>
list(identity)
#> [[1]]
#> <FN-DISPATCHED>
#> function (x)
#> x
#> <bytecode: 0x7ffb006cfc00>
#> <environment: namespace:base>
### Before
print(list(identity))
#> [[1]]
#> function (x)
#> x
#> <bytecode: 0x7ffb006cfc00>
#> <environment: namespace:base>
### After
print(list(identity))
#> [[1]]
#> <FN-DISPATCHED>
#> function (x)
#> x
#> <bytecode: 0x7fdfa20e9e00>
#> <environment: namespace:base>
# Similarly for data frames:
print.data.frame <- function(x, ...) {
  cat("<DF-DISPATCHED>\n")
  base::print.data.frame(x, ...)
}
### Before
print(list(data.frame(x = 1)))
#> [[1]]
#> x
#> 1 1
### After
print(list(data.frame(x = 1)))
#> [[1]]
#> <DF-DISPATCHED>
#> x
#> 1 1
Note that if a print method is defined, and this method calls
print.default() again (possibly via NextMethod()), this forces a
recursion through print(). In this case, auto-printing lists used to
disrupt dispatch, which is also fixed by patch 3:
In essence, this patch ensures lexical scoping of method dispatch by
properly forwarding the dispatch context throughout recursion.
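The environment chain behind this problem can be inspected from R. The sketch below only checks three facts about that chain. It does not reproduce the C-level dispatch, and the variable names are illustrative.

```r
base_ns <- asNamespace("base")

# print() is a closure defined in the base namespace, so its execution
# environment is enclosed by that namespace
print_in_base <- identical(environment(print), base_ns)

# The base namespace is itself enclosed by the global environment
base_parent_is_global <- identical(parent.env(base_ns), globalenv())

# base provides its own print.function(), which a lookup starting from
# print()'s frame meets before anything defined in the global environment
base_has_method <- exists("print.function", envir = base_ns, inherits = FALSE)
```

Because the base namespace comes before the global environment in this chain, a method lookup that starts from print()'s own frame finds the base method first.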
Patch 4: User customisation for printing
========================================
Defining print() methods in the global environment is generally not
appropriate for foreign and base types because global methods have
global effect, including in package functions calling print() inside
larger print methods. It is common for print methods in packages to
arrange data as a base type like a matrix and call print() again on
the result (one example of this is in print.data.frame). Overriding
methods globally makes print output so unpredictable that it cannot
be considered a proper way of customising printing for day-to-day
analysis and development.
Yet customising how objects are printed at the console would be very
useful for R users. They could elect to print some objects to display
more information, for instance numeric vectors could be displayed with
statistical summaries, or environments could be printed with their
bindings and their parent. Printing objects in the console is the main
way to interact with R and making it easy to customise printing would
be handy for data analysis and for debugging.
Patch 4 proposes and implements a global option autoprint for
customising printing. This option should be set to a function taking
x and ..., like a print method. Because we allow arbitrary
autoprint functions, this global option is very flexible, and can use
either structured or unstructured dispatch. For instance it could be
set to a generic for which methods could be implemented, or to a
function that does ad hoc dispatch. It is also very easy to change the
custom autoprint function depending on the task at hand. The user can
maintain their own autoprint function for debugging, and then switch
to another autoprint function exported by a package for data analysis.
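As a sketch of the structured variant, the option could be set to an S3 generic with a fallback default method. The names console_print and its methods are hypothetical. Note also that the autoprint option exists only with patch 4 applied. In an unpatched R, setting it has no effect, so the sketch calls the function explicitly through getOption().

```r
# Hypothetical generic intended to be used as the autoprint function
console_print <- function(x, ...) UseMethod("console_print")

# Default: fall back to regular printing
console_print.default <- function(x, ...) print(x, ...)

# Numeric vectors get a one-line summary before the usual output
console_print.numeric <- function(x, ...) {
  cat("<numeric> n = ", length(x), ", mean = ", format(mean(x)), "\n", sep = "")
  print(x, ...)
  invisible(x)
}

# `autoprint` is the option proposed by patch 4 (assumption: not in released R)
old <- options(autoprint = console_print)

# Call the function explicitly to simulate what autoprint would do
num_out <- capture.output(getOption("autoprint")(c(1, 2, 3)))
chr_out <- capture.output(getOption("autoprint")("a"))

options(old)
```

Switching to a different autoprint function then amounts to another options() call.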
Here is an example of a custom autoprint function implementing ad hoc
dispatch. It is not meant to be state of the art or even usable, but
just illustrates the sort of things that can be accomplished:
myprint <- function(x, ...) {
  if (!is.object(x) && is.list(x)) {
    str(x, max.level = 3)
    return(invisible(x))
  }
  if (is.factor(x)) {
    cat("<factor>\n")
    print(table(x, dnn = NULL), ...)
    return(invisible(x))
  }
  switch(typeof(x),
    language = {
      print(lobstr::ast(!!x), ...)
      return(invisible(x))
    },
    environment = {
      rlang::env_print(x)
      return(invisible(x))
    }
  )
  if (is.data.frame(x))
    x <- data.table::as.data.table(x)
  # Fall back to print
  print(x, ...)
}
options(autoprint = myprint)
# Lists are now printed with str()
list(list(1, list(3)))
#> List of 1
#> $ :List of 2
#> ..$ : num 1
#> ..$ :List of 1
#> .. ..$ : num 3
# Factors are printed as counts
iris$Species
#> <factor>
#> setosa versicolor virginica
#> 50 50 50
# Language objects are displayed with their tree structure
foo ~ bar() + 1
#> █─`~`
#> ├─foo
#> └─█─`+`
#>   ├─█─bar
#>   └─1
# Environments are displayed with their contents
asNamespace("stats")
#> <environment: namespace:stats> [L]
#> Parent: <environment: imports:stats>
#> Bindings:
#> * princomp.default: <lazy> [L]
#> * formula.glm: <lazy> [L]
#> * drop1.glm: <lazy> [L]
#> * print.tskernel: <lazy> [L]
#> * labels.lm: <lazy> [L]
#> * binomial: <lazy> [L]
#> * contr.treatment: <lazy> [L]
#> * extractAIC: <fn> [L]
#> * anova.glm: <lazy> [L]
#> ... with 1122 more bindings
# Data frames are printed with the data.table method
iris
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1: 5.1 3.5 1.4 0.2 setosa
#> 2: 4.9 3.0 1.4 0.2 setosa
#> 3: 4.7 3.2 1.3 0.2 setosa
#> 4: 4.6 3.1 1.5 0.2 setosa
#> 5: 5.0 3.6 1.4 0.2 setosa
#> ---
#> 146: 6.7 3.0 5.2 2.3 virginica
#> 147: 6.3 2.5 5.0 1.9 virginica
#> 148: 6.5 3.0 5.2 2.0 virginica
#> 149: 6.2 3.4 5.4 2.3 virginica
#> 150: 5.9 3.0 5.1 1.8 virginica
# Other objects are printed normally via the print() fallback
letters
#> [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r"
#> [19] "s" "t" "u" "v" "w" "x" "y" "z"
Here are the main constraints at play for this customisation mechanism:
1. Autoprint and print() calls from user scripts should be consistent
   (as documented since S).
2. print() calls from packages should not use the user
   customisation. They should be shielded so their output is
   predictable.
3. To keep the interface simple, the customisation mechanism should be
   able to fall back to regular printing by simply calling print().
   The customisation function should be turned off for this invocation
   of print (to avoid infinite recursion), but turned on again in the
   recursive case, when printing elements of lists.
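The infinite-recursion concern in the last constraint can be illustrated with a plain R-level re-entrancy guard. This is only a sketch of the general idea under simplified assumptions: custom_print(), shout() and the guard environment are hypothetical names, and the sketch does not attempt to re-enable the customisation for list elements as the actual patch is described to do.

```r
# Shared state recording whether the custom function is currently running
guard <- new.env()
guard$active <- FALSE

# Hypothetical entry point standing in for a customisable print()
custom_print <- function(x, custom, ...) {
  if (guard$active) {
    # Called from within the custom function: use regular printing
    print(x, ...)
    return(invisible(x))
  }
  guard$active <- TRUE
  on.exit(guard$active <- FALSE)
  custom(x, ...)
  invisible(x)
}

# A custom function that falls back by calling the entry point again;
# without the guard this would recurse forever
shout <- function(x, ...) {
  cat("<custom>\n")
  custom_print(x, shout, ...)
}

out <- capture.output(custom_print(1:3, shout))
```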
The combination of constraints 1 and 2 is the main complicating
factor, but they are important to make printing consistent for the
user. Printing an object at the console or via print-debugging,
including inside functions, should produce the same output. Solving
all three constraints simultaneously was challenging but I think the
proposed patch provides a relatively simple solution that should work
robustly and predictably.
- Autoprint always uses the autoprint global option.
- When print() is called, the topenv of the caller is inspected. If it
  inherits from the global environment, the autoprint option is
  enabled. Otherwise, it is ignored. This way calling print() from the
  global environment or from functions defined in the global
  environment invokes getOption("autoprint") if it is set. On the
  other hand, the topenv of the caller is the package namespace when
  called from a package. This ensures the customisation is ignored in
  packages.
  Note that this heuristic requires patch 3 to keep the caller
  environment stable in the recursive case and get consistent output
  when printing list elements.
  One issue is that if a package print method calls NextMethod(),
  the caller might be artificially forwarded to print.default(), and
  we might incorrectly detect a global topenv. This is worked around
  by examining the .Class pronoun to detect NextMethod() invocations.
- Falling back to print() from the custom function should keep the
  customisation enabled in case we recurse into print() for list
  elements. This is a bit tricky because the caller might not inherit
  from the global environment if the customisation function is
  exported from a package. To work around this, we inspect the call
  stack from print.default() to determine whether we're being called
  from the very first print() call after the custom autoprint function
  was invoked. This way we reliably distinguish direct fallbacks from
  other cases.
- Falling back to print() should temporarily disable custom autoprint
  to prevent infinite recursion. Custom autoprint should still be
  invoked later on, when printing list elements. We achieve this by
  setting a global flag from print() which we reset in
  print.default(). The global flag is set to the frame number at the
  time the custom function is invoked, to help implement the
  heuristic described in the previous bullet point.
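The topenv check described above can be approximated at the R level with topenv(). The sketch below is an assumption about what such a check might look like, not the code in the patch. is_user_env() is a hypothetical helper, and the two environments merely imitate the frames of a console-defined function and of a package function.

```r
# Hypothetical helper: does `env` resolve to the global environment
# as its top-level environment?
is_user_env <- function(env) {
  identical(topenv(env), globalenv())
}

# An environment enclosed by the global environment, like the frame of
# a function defined at the console
user_frame <- new.env(parent = globalenv())

# An environment enclosed by a package namespace, like the frame of a
# function defined in that package
pkg_frame <- new.env(parent = asNamespace("stats"))

from_user <- is_user_env(user_frame)
from_pkg <- is_user_env(pkg_frame)
```

Here from_user is TRUE and from_pkg is FALSE, which matches the intended split between user code and package code.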
Setting the autoprint function does mean that we invoke an R function
for all objects printed, as well as for all elements of lists in the
fallback case. Consequently autoprinting always bumps the namedness of
objects when the autoprint option is set, which will cause
unnecessary duplications. Also, printing lists would behave very
erratically without patch 1 because the index tags wouldn't be
forwarded back to the native routine.