MichaelChirico / r-bugs

A ⚠️read-only⚠️mirror of https://bugs.r-project.org/

[BUGZILLA #17610] Consistency and customisation of printing and autoprinting #6784

Open MichaelChirico opened 4 years ago

MichaelChirico commented 4 years ago

Created attachment 2448 [details] Forwarding index tags

Please find attached four patches. The first three improve the consistency of printing and dispatch to print methods. They fix bugs and make printing and dispatch more predictable for end users.

The fourth patch proposes a new mechanism for customising printing at the console. It does so in a way that print() calls and autoprint are consistent. At the same time, packages are not affected by the customisation. Patch 4 requires patches 1 and 3.

Patch 1: Forwarding index tags
==============================

This patch materialises the index tag buffer tagbuf into an indexTag argument that is passed to the R level print(). The default print method then passes it back to the C routine. This fixes inconsistencies when printing recursive data structures:

x <- structure(list(list(1), 2), class = "foobar")

### Before:
list(list(x))
#> [[1]]
#> [[1]][[1]]
#> [[1]]
#> [[1]][[1]]
#> [1] 1
#>
#>
#> [[2]]
#> [1] 2
#>
#> attr(,"class")
#> [1] "foobar"

### After:
list(list(x))
#> [[1]]
#> [[1]][[1]]
#> [[1]][[1]][[1]]
#> [[1]][[1]][[1]][[1]]
#> [1] 1
#>
#>
#> [[1]][[1]][[2]]
#> [1] 2
#>
#> attr(,"class")
#> [1] "foobar"

x <- structure(NA, foo = structure(structure(list(1, 2), class = "foobar"), bar = 1))

### Before
list(list(x))
#> [[1]]
#> [[1]][[1]]
#> [1] NA
#> attr(,"foo")
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2
#>
#> attr(,"class")
#> [1] "foobar"
#> attr(,"bar")
#> [1] 1

### After
list(list(x))
#> [[1]]
#> [[1]][[1]]
#> [1] NA
#> attr(,"foo")
#> attr(,"foo")[[1]]
#> [1] 1
#>
#> attr(,"foo")[[2]]
#> [1] 2
#>
#> attr(,"foo")attr(,"class")
#> [1] "foobar"
#> attr(,"foo")attr(,"bar")
#> [1] 1

This change makes the output more consistent and predictable with S3 objects that don't implement print methods. It has positive effects in a few cases covered by make check. For instance the output of postscriptFonts():

### Before
$serif
$family
[1] "Times"

$metrics
[1] "Times-Roman.afm"      "Times-Bold.afm"       "Times-Italic.afm"    
[4] "Times-BoldItalic.afm" "Symbol.afm"          

$encoding
[1] "default"

attr(,"class")
[1] "Type1Font"

### After
$serif
$serif$family
[1] "Times"

$serif$metrics
[1] "Times-Roman.afm"      "Times-Bold.afm"       "Times-Italic.afm"    
[4] "Times-BoldItalic.afm" "Symbol.afm"          

$serif$encoding
[1] "default"

attr(,"class")
[1] "Type1Font"

Or the output of na.omit(NA):

### Before
logical(0)
attr(,"na.action")
[1] 1
attr(,"class")
[1] "omit"

### After
logical(0)
attr(,"na.action")
[1] 1
attr(,"na.action")attr(,"class")
[1] "omit"

If the S3 list does have a print method but calls NextMethod() or print.default() (in the latter case ... should be passed on), the index tags are now properly written as well:

print.foobar <- function(x, ...) {
    cat("<foobar>\n")
    attributes(x) <- NULL
    NextMethod()
}

x <- structure(list(1, b = 2), class = "foobar")

x
#> <foobar>
#> [[1]]
#> [1] 1
#>
#> $b
#> [1] 2

### Before

list(foo = list(x))
#> $foo
#> $foo[[1]]
#> <foobar>
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2

### After

list(foo = list(x))
#> $foo
#> $foo[[1]]
#> <foobar>
#> $foo[[1]][[1]]
#> [1] 1
#>
#> $foo[[1]][[2]]
#> [1] 2

In addition to fixing inconsistencies, the indexTag argument may also be useful to print a data structure with a custom index tag:

print(list(list(1, 2), 3), indexTag = "::foo")
#> ::foo[[1]]
#> ::foo[[1]][[1]]
#> [1] 1
#>
#> ::foo[[1]][[2]]
#> [1] 2
#>
#>
#> ::foo[[2]]
#> [1] 3

Or it could be used in boxed or proxy classes to print actual data with the proper index tags:

new_proxy <- function(...) {
    env <- new.env(parent = emptyenv())
    env$data <- list(...)
    structure(env, class = "proxy")
}
`$.proxy` <- function(x, i) {
    x[["data"]][[as.character(substitute(i))]]
}
print.proxy <- function(x, ...) {
    cat("<proxy>\n")
    print(x[["data"]], ...)
    invisible(x)
}

x <- new_proxy(x = 1:2, y = 3:4)

x$x
#> [1] 1 2

x
#> <proxy>
#> $x
#> [1] 1 2
#>
#> $y
#> [1] 3 4

### Before:

list(foo = list(x))
#> $foo
#> $foo[[1]]
#> <proxy>
#> $x
#> [1] 1 2
#>
#> $y
#> [1] 3 4

### After:

list(foo = list(x))
#> $foo
#> $foo[[1]]
#> <proxy>
#> $foo[[1]]$x
#> [1] 1 2
#>
#> $foo[[1]]$y
#> [1] 3 4
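
The forwarding idea behind patch 1 can be emulated in plain R. The sketch below is not the patch itself (which forwards tagbuf from C through the proposed indexTag argument); show_tagged() and its tag argument are names invented for this illustration. Each recursive call receives the tag accumulated so far and extends it, which is what keeps nested tags complete.

```r
# Hypothetical pure-R emulation of index tag forwarding.
# `show_tagged` and `tag` are illustrative names, not part of R or the patch.
show_tagged <- function(x, tag = "") {
    nms <- names(x)
    for (i in seq_along(x)) {
        named <- !is.null(nms) && nzchar(nms[i])
        # Extend the tag received from the caller instead of starting afresh
        child_tag <- if (named) {
            paste0(tag, "$", nms[i])
        } else {
            paste0(tag, "[[", i, "]]")
        }
        cat(child_tag, "\n", sep = "")
        if (is.list(x[[i]])) {
            # Forward the accumulated tag on recursion
            show_tagged(x[[i]], child_tag)
        } else {
            print(x[[i]])
        }
        cat("\n")
    }
    invisible(x)
}

show_tagged(list(list(1, 2), 3), tag = "::foo")
```

If the recursive call dropped child_tag, the inner elements would be labelled [[1]] and [[2]] with no prefix, which is the kind of inconsistency shown in the "Before" outputs above.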

Unfortunately, passing the indexTag on recursion has implications for backward compatibility. Breakages should be limited because print methods are required to add ... in their signature for consistency. If they do, the indexTag argument passed on recursion should be swallowed in the dots. However, it can cause issues when ... is passed on to another function that doesn't take dots itself. A few places needed changes to avoid this issue:

In general, it seems that passing ... to functions other than print() should be considered an anti-pattern. Since print() is a recursive generic that dispatches to potentially heterogeneous methods (including the default method), no assumptions can safely be made about the contents of .... For instance, if x is a list that contains both data frames and factors, print(x, row.names = FALSE, max.levels = 3) will pass all arguments to both print methods, which should ignore the ones they don't know about. Given this, it would be a good idea to push for stricter uses of ... in print methods.
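
As a minimal sketch of the difference, consider two print methods for made-up classes. The classes "fragile" and "robust" and the helper make_header() are hypothetical, and digits stands in for any argument a caller (or the proposed indexTag forwarding) might add to the dots.

```r
# Hypothetical helper that does not accept `...`
make_header <- function(x) {
    sprintf("<%s: %d elements>", class(x)[1], length(x))
}

# Fragile: forwards `...` to a function that cannot absorb unknown arguments
print.fragile <- function(x, ...) {
    cat(make_header(x, ...), "\n", sep = "")
    invisible(x)
}

# Robust: `...` is only ever forwarded to print()
print.robust <- function(x, ...) {
    cat(make_header(x), "\n", sep = "")
    print(unclass(x), ...)
    invisible(x)
}

# The fragile method fails as soon as an extra argument arrives in the dots
res <- tryCatch(
    print(structure(1:3, class = "fragile"), digits = 3),
    error = function(e) e
)
inherits(res, "error")

# The robust method passes the argument on to print(), which knows it
print(structure(c(1.23456, 2), class = "robust"), digits = 3)
```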

Patch 2: Consistency of dispatch to base types
==============================================

This patch is a follow-up to r76565 (https://github.com/wch/r-source/commit/e06aa530). It enhances the consistency between print and autoprint by calling print() whenever a method is defined for an object, even when OBJECT is unset. With this second change, dispatch consistently reaches S3 methods for base types. Previously, it would only reach print.function() because of special-casing in the dispatch code:

print.list <- function(x, ...) {
    str(x, max.level = 2)
}

### Before

list(list(1))
#> [[1]]
#> [[1]][[1]]
#> [1] 1

### After

list(list(1))
#> List of 1
#>  $ :List of 1
#>   ..$ : num 1

Since we only call the R level print() when a method is defined, autoprint doesn't bump the namedness of objects unless it is necessary.
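
The check "is a print method defined for this object?" can be approximated at the R level as follows. This is only a rough sketch: the patch performs the lookup in C, has_print_method() is a hypothetical name, and class(x) is used as a simplified stand-in for the full implicit class that S3 dispatch considers.

```r
# Hypothetical helper: TRUE if some print method is visible for the class of `x`.
# class(x) returns the implicit class when no class attribute is set; this is
# a simplification of the class vector actually used for dispatch.
has_print_method <- function(x, envir = parent.frame()) {
    for (cls in class(x)) {
        method <- utils::getS3method("print", cls, optional = TRUE, envir = envir)
        if (!is.null(method)) {
            return(TRUE)
        }
    }
    FALSE
}

# Functions and data frames have print methods in base R
has_print_method(identity)
has_print_method(data.frame(x = 1))
```

Under this sketch, a bare list would only take the slower path through the R-level print() once the user has defined print.list(), as in the example above.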

Patch 3: Consistency of dispatch context
========================================

This patch is another attempt at getting print() to forward the original calling environment when recursing back to R. This patch was already proposed in PR 17398, and rejected at the time. I will try to better explain the motivation behind it.

Currently, autoprint passes the current environment (an execution environment if called from the stack browser, or the global env if called from the console) to the native print routine. However, the R function print() passes its own execution environment rather than the environment of its caller. Since that environment inherits from the base namespace, this prevents the recursive case of print() from dispatching to user-defined methods when the methods are also defined in base:

print.function <- function(x, ...) {
    cat("<FN-DISPATCHED>\n")
    NextMethod()
}

### Correct before and after

identity
#> <FN-DISPATCHED>
#> function (x)
#> x
#> <bytecode: 0x7ffb006cfc00>
#> <environment: namespace:base>

print(identity)
#> <FN-DISPATCHED>
#> function (x)
#> x
#> <bytecode: 0x7ffb006cfc00>
#> <environment: namespace:base>

list(identity)
#> [[1]]
#> <FN-DISPATCHED>
#> function (x)
#> x
#> <bytecode: 0x7ffb006cfc00>
#> <environment: namespace:base>

### Before

print(list(identity))
#> [[1]]
#> function (x)
#> x
#> <bytecode: 0x7ffb006cfc00>
#> <environment: namespace:base>

### After

print(list(identity))
#> [[1]]
#> <FN-DISPATCHED>
#> function (x)
#> x
#> <bytecode: 0x7fdfa20e9e00>
#> <environment: namespace:base>

# Similarly for data frames:

print.data.frame <- function(x, ...) {
    cat("<DF-DISPATCHED>\n")
    base::print.data.frame(x, ...)
}

### Before

print(list(data.frame(x = 1)))
#> [[1]]
#>   x
#> 1 1

### After

print(list(data.frame(x = 1)))
#> [[1]]
#> <DF-DISPATCHED>
#>   x
#> 1 1

Note that if a print method is defined, and this method calls print.default() again (possibly via NextMethod()), this forces a recursion through print(). In this case, auto-printing lists used to disrupt dispatch, which is also fixed by patch 3:

print.list <- function(x, ...) {
    cat(sprintf("<list: %d elements>\n", length(x)))
    NextMethod()
}

### Before:

list(identity, 2)
#> <list: 2 elements>
#> [[1]]
#> function (x)
#> x
#> <bytecode: 0x7f9e310c9a00>
#> <environment: namespace:base>
#>
#> [[2]]
#> [1] 2

### After:

list(identity, 2)
#> <list: 2 elements>
#> [[1]]
#> <FN-DISPATCHED>
#> function (x)
#> x
#> <bytecode: 0x7f9e310c9a00>
#> <environment: namespace:base>
#>
#> [[2]]
#> [1] 2

In essence, this patch ensures lexical scoping of method dispatch by properly forwarding the dispatch context throughout recursion.
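
The role of the calling environment in S3 dispatch can be seen in isolation with a toy generic. The names describe(), call_with_local_method() and call_without_local_method() are made up for this sketch; the point is only that UseMethod() searches for methods starting from the environment in which the generic was called, so which environment gets forwarded decides which methods are found.

```r
# Toy generic with a default method (hypothetical names throughout)
describe <- function(x) UseMethod("describe")
describe.default <- function(x) "default method"

# The method is defined in the caller's environment, so dispatch finds it
call_with_local_method <- function(x) {
    describe.integer <- function(x) "method visible from the calling environment"
    describe(x)
}

# Same call from an environment where the method is not visible
call_without_local_method <- function(x) {
    describe(x)
}

call_with_local_method(1L)
call_without_local_method(1L)
```

By analogy, when the dispatch context handed to the native print routine is print()'s own execution environment, a print.function() defined in the global environment is not on the search path, and the base method is used instead.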

Patch 4: User customisation for printing
========================================

Defining print() methods in the global environment is generally not appropriate for foreign and base types because global methods have global effect, including in package functions calling print() inside larger print methods. It is common for print methods in packages to arrange data as a base type like a matrix and call print() again on the result (one example of this is in print.data.frame). Overriding methods globally causes such unpredictability in print outputs that it can't be considered a proper way of customising printing for day-to-day analysis and development.

Yet customising how objects are printed at the console would be very useful for R users. They could elect to print some objects with more information: for instance, numeric vectors could be displayed with statistical summaries, and environments could be printed with their bindings and their parent. Printing objects in the console is the main way to interact with R, and making it easy to customise printing would be handy for data analysis and for debugging.

Patch 4 proposes and implements a global option autoprint for customising printing. This option should be set to a function taking x and ..., like a print method. Because we allow arbitrary autoprint functions, this global option is very flexible, and can use either structured or unstructured dispatch. For instance it could be set to a generic for which methods could be implemented, or to a function that does ad hoc dispatch. It is also very easy to change the custom autoprint function depending on the task at hand. The user can maintain their own autoprint function for debugging, and then switch to another autoprint function exported by a package for data analysis.
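
For the structured variant, a minimal sketch could look like the following. The generic my_autoprint() and its methods are hypothetical names; the autoprint option only has an effect with patch 4 applied, and an unpatched R simply stores the value. The functions are called directly here so that the sketch runs either way.

```r
# Hypothetical generic intended to be installed as the autoprint function
my_autoprint <- function(x, ...) UseMethod("my_autoprint")

# Fallback: regular printing
my_autoprint.default <- function(x, ...) {
    print(x, ...)
}

# Numeric vectors get a one-line summary before the usual output
my_autoprint.numeric <- function(x, ...) {
    cat(sprintf("<numeric: n = %d, mean = %g>\n", length(x), mean(x)))
    print(x, ...)
    invisible(x)
}

# `autoprint` is the option proposed by patch 4 (no effect without the patch)
options(autoprint = my_autoprint)

# Direct calls, to show what the console would display under the patch
my_autoprint(c(1, 2, 3))
my_autoprint(letters[1:3])
```

New behaviours can then be added by defining further methods, without touching the function installed in the option.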

Here is an example of a custom autoprint function implementing ad hoc dispatch. It is not meant to be state of the art or even usable, but just to illustrate the sort of things that can be accomplished:

myprint <- function(x, ...) {
    if (!is.object(x) && is.list(x)) {
        str(x, max.level = 3)
        return(invisible(x))
    }

    if (is.factor(x)) {
        cat("<factor>\n")
        print(table(x, dnn = NULL), ...)
        return(invisible(x))
    }

    switch(typeof(x),
        language = {
            print(lobstr::ast(!!x), ...)
            return(invisible(x))
        },
        environment = {
            rlang::env_print(x)
            return(invisible(x))
        }
    )

    if (is.data.frame(x))
        x <- data.table::as.data.table(x)

    # Fall back to print
    print(x, ...)
}

options(autoprint = myprint)

# Lists are now printed with str()
list(list(1, list(3)))
#> List of 1
#>  $ :List of 2
#>   ..$ : num 1
#>   ..$ :List of 1
#>   .. ..$ : num 3

# Factors are printed as counts
iris$Species
#> <factor>
#>     setosa versicolor  virginica
#>         50         50         50

# Language objects are displayed with their tree structure
foo ~ bar() + 1
#> █─`~`
#> ├─foo
#> └─█─`+`
#>   ├─█─bar
#>   └─1

# Environments are displayed with their contents
asNamespace("stats")
#> <environment: namespace:stats> [L]
#> Parent: <environment: imports:stats>
#> Bindings:
#>  * princomp.default: <lazy> [L]
#>  * formula.glm: <lazy> [L]
#>  * drop1.glm: <lazy> [L]
#>  * print.tskernel: <lazy> [L]
#>  * labels.lm: <lazy> [L]
#>  * binomial: <lazy> [L]
#>  * contr.treatment: <lazy> [L]
#>  * extractAIC: <fn> [L]
#>  * anova.glm: <lazy> [L]
#>  ... with 1122 more bindings

# Data frames are printed with the data.table method
iris
#>      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#>   1:          5.1         3.5          1.4         0.2    setosa
#>   2:          4.9         3.0          1.4         0.2    setosa
#>   3:          4.7         3.2          1.3         0.2    setosa
#>   4:          4.6         3.1          1.5         0.2    setosa
#>   5:          5.0         3.6          1.4         0.2    setosa
#>  ---
#> 146:          6.7         3.0          5.2         2.3 virginica
#> 147:          6.3         2.5          5.0         1.9 virginica
#> 148:          6.5         3.0          5.2         2.0 virginica
#> 149:          6.2         3.4          5.4         2.3 virginica
#> 150:          5.9         3.0          5.1         1.8 virginica

# Other objects are printed normally via the print() fallback
letters
#>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r"
#> [19] "s" "t" "u" "v" "w" "x" "y" "z"

Here are the main constraints at play for this customisation mechanism:

  1. Autoprint and print() calls from user scripts should be consistent (as documented since S).

  2. print() calls from packages should not use the user customisation. They should be shielded so their output is predictable.

  3. To keep the interface simple, the customisation mechanism should be able to fall back to regular printing by simply calling print(). The customisation function should be turned off for this invocation of print (to avoid infinite recursion), but turned on again in the recursive case, when printing elements of lists.

The combination of constraints 1 and 2 is the main complicating factor, but they are important to make printing consistent for the user. Printing an object at the console or via print-debugging, including inside functions, should produce the same output. Solving all 3 constraints simultaneously was challenging, but I think the proposed patch provides a relatively simple solution that should work robustly and predictably.
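
The first half of constraint 3 (switching the customisation off for the fallback call) amounts to a re-entrancy guard. The sketch below emulates it in plain R and is not how the patch works internally: console_print() stands in for the console's print entry point, and the .guard environment with its active flag is an assumption of this sketch. Re-enabling the customisation for list elements, the second half of the constraint, is not modelled here.

```r
# Hypothetical re-entrancy guard; `.guard` and `console_print` are
# illustrative names, not part of R or of the patch.
.guard <- new.env()
.guard$active <- FALSE

console_print <- function(x, ...) {
    custom <- getOption("autoprint")
    if (is.function(custom) && !.guard$active) {
        # Switch the customisation off while the custom function runs, so
        # that its fallback call does not re-enter it
        .guard$active <- TRUE
        on.exit(.guard$active <- FALSE)
        custom(x, ...)
    } else {
        print(x, ...)
    }
    invisible(x)
}

# A custom function whose fallback goes back through the same entry point
options(autoprint = function(x, ...) {
    cat("<custom>\n")
    console_print(x, ...)
})

console_print(1:3)
```

Without the flag, the fallback call would invoke the custom function again and recurse without end.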

Setting the autoprint function does mean that we invoke an R function for all objects printed, as well as for all elements of lists in the fallback case. Consequently autoprinting always bumps the namedness of objects when the autoprint option is set, which will cause unnecessary duplications. Also, printing lists would behave very erratically without patch 1 because the index tags wouldn't be forwarded back to the native routine.



MichaelChirico commented 4 years ago

Created attachment 2449 [details] Consistency of dispatch to base types



MichaelChirico commented 4 years ago

Created attachment 2450 [details] Consistency of dispatching context



MichaelChirico commented 4 years ago

Created attachment 2451 [details] User customisation for printing

