Closed dvg-p4 closed 1 year ago
AFAIK, that's an output from base R:
> min(c(NA, NA, NA), na.rm=TRUE)
[1] Inf
Warning message:
In min(c(NA, NA, NA), na.rm = TRUE) :
no non-missing arguments to min; returning Inf
I'm not sure how data.table can handle it. I reckon that if there are no rows, then no operation should be processed, but then columns that are created deliberately (say, for a later rbindlist operation) would not be created, which makes this seem inescapable. For the short term, you should be able to suppress those warnings with
> suppressWarnings(min(c(NA, NA, NA), na.rm=TRUE))
[1] Inf
or other methods described here.
Hi @dvg-p4,
I understand that the warning message may seem spurious, but I believe it is actually expected behavior from min() when there are no non-missing arguments.
As documented in ?min, min() returns Inf when applied to an empty set of numeric values to ensure transitivity, such as in the case of min(x1, min(x2)) == min(x1, x2).
In short, it is not an issue with {data.table}.
Here are some examples for clarification:
min()
#> [1] Inf
#> Warning message:
#> In min() : no non-missing arguments to min; returning Inf
min(numeric(0))
#> [1] Inf
#> Warning message:
#> In min(numeric(0)) : no non-missing arguments to min; returning Inf
min(NA, na.rm = TRUE)
#> [1] Inf
#> Warning message:
#> In min(NA, na.rm = TRUE) : no non-missing arguments to min; returning Inf
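To make the transitivity point from ?min concrete, here is a small sketch in base R. The vectors `x1` and `x2` are made-up values for illustration only; the idea is that `Inf` behaves as the identity element for `min()`, so an empty piece never changes the overall minimum.

```r
x1 <- c(3, 1, 2)   # made-up, non-empty input
x2 <- numeric(0)   # made-up, empty input

# min() of an empty numeric vector is Inf (warning silenced for this one call).
inner <- suppressWarnings(min(x2))

# Because Inf is the identity for min(), nesting the calls changes nothing.
nested <- min(x1, inner)
flat   <- min(x1, x2)
stopifnot(is.infinite(inner), nested == flat)
```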
If you really need to suppress the warnings during the "giant loop" you can use this suggestion:
old <- options(warn = -1) # ignore warnings, remembering the previous setting
min() # your code here
options(old) # restore the previous 'warn' value
I hope this explanation clarifies the behavior you are seeing.
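One caveat with options(warn = -1) is that it hides every warning raised while it is in effect, not just this one. A narrower alternative, sketched below in base R, is to muffle only the "no non-missing arguments" warning with withCallingHandlers() and let everything else through. The names `inputs`, `results` and `n_muffled` are made up for illustration, and matching on the message text assumes an English locale (the message may be translated elsewhere).

```r
# Made-up stand-in for the "giant loop": some inputs are empty or all-NA.
inputs <- list(a = c(1, 2), b = numeric(0), c = c(NA_real_, NA_real_))

n_muffled <- 0L  # how many of the targeted warnings were silenced

results <- withCallingHandlers(
  vapply(inputs, function(x) min(x, na.rm = TRUE), numeric(1)),
  warning = function(w) {
    # Only silence the specific min()/max() warning; anything else propagates.
    if (grepl("no non-missing arguments", conditionMessage(w), fixed = TRUE)) {
      n_muffled <<- n_muffled + 1L
      invokeRestart("muffleWarning")
    }
  }
)
```

Counting the muffled warnings keeps a record of how many empty aggregates occurred without printing a warning for each one.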
Thanks for the explanations! Looking into the source code I think I'm understanding a bit better what's going on--data.table intentionally runs the function once even on an empty table/subset, in order to create the correct output column structure: https://github.com/Rdatatable/data.table/blob/bbe41642a23d34b1cc491e3ff64d124c0b3ea3bd/src/dogroups.c#L173 So, if I'm getting the gist of this, always running the function at least once (which will necessarily produce a warning message like this for aggregation functions that warn on an empty input) is a feature, not a bug, which allows correctly-typed empty columns to be returned by something like this:
> mydt[, .(avg = mean(foo), min = min(foo), n = length(foo), char = paste(foo, collapse = ",")), by = bar] |> str()
Classes ‘data.table’ and 'data.frame': 0 obs. of 5 variables:
$ bar : num
$ avg : num
$ min : num
$ n : int
$ char: chr
- attr(*, ".internal.selfref")=<externalptr>
Warning message:
In min(foo) : no non-missing arguments to min; returning Inf
This seems reasonable and better than any alternatives I can think of, so I'll close this ticket.
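If the warning needs to be avoided rather than suppressed, one option is to guard the aggregate so that min() is never called on an empty input. The sketch below uses a hypothetical helper, `safe_min`, which is not part of data.table or base R; it returns a typed NA for empty input so the output column should keep the same type for numeric and integer data. The empty `mydt` here is a guess at a table with the same columns as the one above, not the original data.

```r
# Hypothetical helper (not part of data.table or base R): only call min()
# when there is something to aggregate; otherwise return a typed NA.
safe_min <- function(x, ...) {
  if (length(x) == 0L) return(x[NA_integer_])  # length-1 NA of the same type as x
  min(x, ...)
}

safe_min(c(3, 1, 2))
safe_min(numeric(0))  # NA_real_, and min() is never reached

# Usage inside a grouped query, only if data.table is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  mydt <- data.table(foo = numeric(0), bar = numeric(0))  # assumed empty table
  res <- mydt[, .(min = safe_min(foo)), by = bar]
  str(res)  # should still be a zero-row table with a numeric 'min' column
}
```

Two caveats: a custom function in j is probably not eligible for data.table's internal optimization of plain min(), so this may be slower on large tables, and an all-NA group with na.rm = TRUE would still trigger the warning since its length is not zero.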
Note to future searchers:
This behavior is the result of data.table running the aggregation function at least once even on an empty table, which is intentional and good--see the discussion in this thread. The warning message is from base R, when min is run on an empty vector (as it must be to consistently generate correctly-typed empty columns). It can be suppressed with suppressWarnings([dt call]) or options(warn = -1) if needed.
Description
If a data.table query uses by, and an aggregate expression that uses min or max, and returns zero rows, there will be a warning message printed along the lines of "no non-missing arguments to min; returning Inf". This is spurious, IMO, since Inf is not actually being returned--the query is returning an empty table, as expected.
The actual return values are consistent with behavior for non-empty results, but the warning is annoying--I have code that runs in a giant loop (calculating values for ~thousands of columns), and about a hundred of those have an empty aggregate table in an intermediate part of the calculation, so I get spammed with ~ a hundred warnings when my code runs successfully.
Minimal reproducible example
More realistic case
As expected, only entries for values of foo that have at least one case of bar > 0
Consistent with the above behavior, an empty table (no rows match the criterion, so there are no values of foo to aggregate over). However, there is a warning message displayed.
Simpler, less-realistic example
Warning message is definitely appropriate here--Inf was actually returned.
However it is spurious here--the degenerate case of an empty table is returned, which does not include any Inf values.
Output of sessionInfo()
On my laptop
On our linux box