Open HenrikBengtsson opened 1 year ago
It's actually the readr package that adds the problems attribute. From help("problems", package = "readr"):
"Readr functions will only throw an error if parsing fails in an unrecoverable way. However, there are lots of potential problems that you might want to know about - these are stored in the problems attribute of the output ..."
marshal() on a tbl object could simply drop the problems attribute.
Ah, the problems attribute may also contain non-pointer objects, so we don't always have to drop it. For example,
> x <- parse_integer(c("1X", "blah", "3"))
Warning: 2 parsing failures.
row col               expected actual
  1  -- no trailing characters 1X
  2  -- no trailing characters blah
> str(x)
 int [1:3] NA NA 3
 - attr(*, "problems")= tibble [2 × 4] (S3: tbl_df/tbl/data.frame)
  ..$ row     : int [1:2] 1 2
  ..$ col     : int [1:2] NA NA
  ..$ expected: chr [1:2] "no trailing characters" "no trailing characters"
  ..$ actual  : chr [1:2] "1X" "blah"
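A minimal sketch of that distinction: drop the problems attribute only when it is an external pointer, and keep it otherwise. The helper name drop_pointer_problems() is hypothetical and not part of marshal, readr, or vroom. The test objects are stand-ins built with base R and the methods package, assuming methods::new("externalptr") yields an object of type "externalptr".

```r
## Hypothetical helper (an assumption, not an existing API):
## drop the "problems" attribute only if it is an external pointer,
## since such a pointer is only meaningful in the current R process.
drop_pointer_problems <- function(x) {
  problems <- attr(x, "problems", exact = TRUE)
  if (typeof(problems) == "externalptr") {
    attr(x, "problems") <- NULL
  }
  x
}

## Stand-in objects: one with a pointer-backed attribute,
## one with an ordinary data frame like the readr example above
x_ptr <- structure(1:3, problems = methods::new("externalptr"))
x_df  <- structure(1:3, problems = data.frame(row = 1:2, actual = c("1X", "blah")))

y_ptr <- drop_pointer_problems(x_ptr)  # attribute is dropped
y_df  <- drop_pointer_problems(x_df)   # attribute is kept as-is
```

Objects without a problems attribute pass through unchanged, because typeof(NULL) is "NULL".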
More clues about alternatives can be found in:
readr:::problems
function (x = .Last.value)
{
    problems <- probs(x)
    if (is.null(problems)) {
        return(invisible(no_problems))
    }
    if (inherits(problems, "tbl_df")) {
        return(problems)
    }
    vroom::problems(x)
}
So, it looks like vroom might be involved too:
> vroom::problems
function (x = .Last.value, lazy = FALSE)
{
    if (!inherits(x, "tbl_df")) {
        cli::cli_abort(c("The {.arg x} argument of {.fun vroom::problems} must be a data frame created by vroom:",
            x = "{.arg x} has class {.cls {class(x)}}"))
    }
    if (!isTRUE(lazy)) {
        vroom_materialize(x, replace = FALSE)
    }
    probs <- attr(x, "problems")
    if (typeof(probs) != "externalptr") {
        cli::cli_abort(c("The {.arg x} argument of {.fun vroom::problems} must be a data frame created by vroom:",
            x = "{.arg x} seems to have been created with something else, maybe readr?"))
    }
    probs <- vroom_errors_(probs)
    probs <- probs[!duplicated(probs), ]
    probs <- probs[order(probs$file, probs$row, probs$col), ]
    tibble::as_tibble(probs)
}
<environment: namespace:vroom>
From the above, marshalling of tbl_df (sic!) could rely on the following "pruning" method:
prune.tbl_df <- function(x, ...) {
  problems <- attr(x, "problems", exact = TRUE)
  ## Materialize `problems` stored elsewhere in this process?
  if (typeof(problems) == "externalptr") {
    problems <- vroom::problems(x)
    attr(x, "problems") <- problems
  }
  x
}
Comment: We could use NextMethod("prune") at the end.
Comment 2: We've punted on the idea of having prune() methods thus far, but maybe this is an argument for having them. Maybe it should be named something other than "prune", because pruning could also mean "drop unnecessary content".
A tbl may contain an external pointer via attribute problems, e.g.