Open MichaelChirico opened 5 months ago
Should we remove Print()
from the already existing tests or this advice is just for the future use?
Honestly not sure. I think leaving it in existing tests is fine. This is about if we can avoid running print()
"under the hood" in test()
when output=
is provided and x=
is a call to print()
. Not sure it will actually improve the implementation at all, just a thought that it's odd to actually wind up running print()
twice.
IMO, it would be healthy to only print once -- I vaguely recall having trouble finding small differences when testing with print and output with large tables, as the test failure would "expect" a double print, which certainly looks worse than it really is... A small example I was able to make here:
test(2279.1, print(data.table(a = 1)), output=" a\n1: 2")
Test 2279.1 did not produce correct output:
Expected: << a\n1: 2>>
Observed: << a\n1: 1\n a\n1: 1>>
It makes things a little harder to read as content gets bigger, requiring the reader to scroll horizontally a bit more.
Anyway, I believe separating not calling print twice when x
already contains a call to print should be pretty easy, I'd be happy to write up a quick PR. (Also there may be a slight speed improvement for R CMD Check
by calling print
less? Although this is a constant and probably wouldn't matter in the big picture, might do some benchmarking)
Great motivating example. For now, I have #6266 in mind -- even if we fix the existing test scripts, it will be easy for them to backslide unless we have simple static checks in place. I also have #5830 and #5845 in mind -- while making this change does have the nice benefits you cite, it's not particularly urgent & will generate some git conflict churn in the meantime. So I would still prioritize burning down the PR queue first.
while making this change does have the nice benefits you cite, it's not particularly urgent & will generate some git conflict churn in the meantime. So I would still prioritize burning down the PR queue first.
Right, this makes sense to me. If someone (probably future me) ever decides to pursue this, here's what I did to evaluate x
without printing if it already contains a call to print:
# test.data.table.R
if (any(grepl("^print\\b", deparse(xsub)))) {
out = capture.output(x <- suppressMessages(withCallingHandlers(tryCatch(x, error=eHandler), warning=wHandler, message=mHandler)))
} else {
out = capture.output(print(x <- suppressMessages(withCallingHandlers(tryCatch(x, error=eHandler), warning=wHandler, message=mHandler))))
}
Whenever
output=
is used intest()
, we forceprint(x)
:https://github.com/Rdatatable/data.table/blob/f5a1e09328fef5f8a4c5e225bc5b75ca09700f76/R/test.data.table.R#L425
We can consider not doing so if
x
already contains a call toprint()
, e.g. these 27 cases:Note that cases like
print(DT), output=...
could be linted into justDT, output=...
, but cases likeprint(DT, ...)
with non-default options need explicitprint()
invocation.OTOH, maybe it's overkill / won't make the code any easier to understand to pursue this.