Handle missing, NaN, Inf values in pd() and other summary functions

bwiernik commented 1 month ago

Currently, posterior summary functions such as pd() do not handle missing or infinite values. These values are retained (e.g., as missing) when the draws are compared against the null value, producing erroneous results such as pd = 1 for a vector of all missing values.

bayestestR::pd(NA_real_)

x <- c(1, 2, NA, Inf, NaN, -Inf)
bayestestR::pd(x)

We should add a remove_na argument to these functions, defaulting to TRUE. If remove_na = FALSE and there are missing values, the functions should return NA_real_. If remove_na = TRUE, missing values should be removed before computing the result.
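A minimal sketch of the proposed behaviour, to make the two branches concrete. pd_sketch() is a hypothetical stand-in, not the bayestestR implementation: it approximates pd as the larger share of draws on either side of the null, and returning NA_real_ when nothing is left after removal is an assumption of this sketch.

```r
# Hypothetical helper for illustration only; not bayestestR's pd().
pd_sketch <- function(x, null = 0, remove_na = TRUE) {
  if (anyNA(x)) {              # anyNA() is TRUE for both NA and NaN
    if (!remove_na) {
      return(NA_real_)         # proposed behaviour for remove_na = FALSE
    }
    x <- x[!is.na(x)]          # is.na() also flags NaN; Inf/-Inf are kept
  }
  if (length(x) == 0) {
    return(NA_real_)           # assumption: nothing left to summarise
  }
  # Larger share of draws strictly above or strictly below the null
  max(mean(x > null), mean(x < null))
}

x <- c(1, 2, NA, Inf, NaN, -Inf)
pd_sketch(x)                     # NA and NaN dropped, Inf and -Inf retained
pd_sketch(x, remove_na = FALSE)  # NA_real_
pd_sketch(NA_real_)              # NA_real_ rather than 1
```

Inf and -Inf are deliberately left in here, since how to treat them is a separate question.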

For Inf and -Inf values, arguably these are larger/smaller than 0, but they would usually mean some sort of convergence problem, so I'm thinking it might be better to remove them or to have an argument remove_inf defaulting to TRUE to control removing them.

mattansb commented 1 month ago

I think Inf/-Inf should remain - these can represent overflows (e.g., exp(big number)).
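A small illustration of the overflow case, using made-up draws. The values are assumptions chosen only to trigger overflow in double precision.

```r
# exp() of a large number overflows to Inf in double precision
big <- exp(1000)
is.infinite(big)

# Made-up draws: the overflowed value still has a well-defined sign
draws <- c(-1, 0.5, big)
share_positive <- mean(draws > 0)  # Inf counts as larger than 0
share_positive
```

Dropping the infinite draw here would change the share of positive draws from 2/3 to 1/2, even though its direction is known.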

Implementation could be as easy as adding this to the underlying .p_direction() function, since all methods pass ... down to it.

Should the argument be named na_rm = TRUE for general consistency with R/tidy principles?

strengejacke commented 1 month ago

I think we have remove_na throughout our packages.

strengejacke commented 1 month ago

Maybe we can just give a message/warning for Inf values?
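One way this could look, as a rough sketch. The helper name warn_if_infinite() and the message wording are assumptions for illustration, not existing easystats code.

```r
# Hypothetical helper; name and message text are illustrative assumptions.
warn_if_infinite <- function(x) {
  n_inf <- sum(is.infinite(x))  # counts Inf and -Inf, but not NA or NaN
  if (n_inf > 0) {
    warning(
      sprintf(
        "%d infinite value(s) detected; check for overflow or convergence problems.",
        n_inf
      ),
      call. = FALSE
    )
  }
  invisible(x)                  # return the input unchanged
}
```

Called on c(1, 2, NA, Inf, NaN, -Inf), this would warn about 2 infinite values and leave the vector untouched, so the values themselves stay in the computation.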

easystats / bayestestR

Handle missing, NaN, Inf values in pd() and other summary functions #664