Open roaldarbol opened 7 months ago
Thanks for the feedback. The reason the checks you mention aren't implemented is that it is less straightforward to check that the observations are in the right domain for distributions over discrete sets (e.g., Poisson, binomial). That being said, this should still be feasible (possibly ignoring the case of discrete observations at first). I think the steps could be:
1. Add `support_` as a field to the `Dist` class, which takes a string of the form "(-Inf, Inf)", "(0, Inf)", or "[0, Inf)" (the distinction between open and closed intervals is important):
   - `support_` field
   - `Dist$field()` accessor
   - `support` argument to `Dist$initialize()`, initialised in the constructor (possibly with a default value)
2. Add the support to existing distributions in `R/dist_def.R`.
3. Add a method `Dist$check_support(x)`, which unpacks the character string from step 1, extracts the relevant inequalities, and checks whether those are satisfied by the input `x` (vector of observations).
4. Add a check in `Observation$initialize()` that calls `Dist$check_support(x)`.
Note that this won't be a priority because the error message is already reasonably informative ("Check that the data are within the domain of definition of the observation distributions").
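As a rough sketch of the parsing step described for `Dist$check_support(x)`, here is a standalone function rather than a method of the `Dist` class. The function name, its two-argument signature, and the decision to skip `NA` values are assumptions made for illustration, not the package's implementation, and it only handles interval supports (not discrete sets).

```r
# Hypothetical standalone helper, not part of the package.
# `support` is a string such as "(0, Inf)" or "[0, Inf)".
check_support <- function(x, support) {
  # Drop whitespace so "(0, Inf)" and "(0,Inf)" are treated alike
  support <- gsub("\\s", "", support)
  n <- nchar(support)

  # Square brackets mean the bound is included, round brackets excluded
  left_closed <- substr(support, 1, 1) == "["
  right_closed <- substr(support, n, n) == "]"

  # Text between the brackets holds the two bounds, e.g. "0,Inf"
  bounds <- as.numeric(strsplit(substr(support, 2, n - 1), ",")[[1]])

  # Assumption: missing observations are ignored by the check
  x <- x[!is.na(x)]

  lower_ok <- if (left_closed) x >= bounds[1] else x > bounds[1]
  upper_ok <- if (right_closed) x <= bounds[2] else x < bounds[2]
  all(lower_ok & upper_ok)
}

check_support(c(0, 1.5), "(0, Inf)")   # zero is outside the open interval
check_support(c(0, 1.5), "[0, Inf)")   # zero is inside the half-closed interval
```

Keeping the open/closed distinction in the string is what lets the same code reject zeros for one distribution and accept them for another.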
Wow, that's a super thorough fix - I like it! And no worries, there's no rush, I just thought it was worth filing the issue. :-)
I must admit I couldn't find a source that specified that data for gamma distributions have to be >0, but it might have been hidden in formulaic language somewhere. So even just having a tiny write-up about what the "domain of definition of the observation distribution" is for the different distributions would be of great help.
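For illustration, base R's `dgamma()` shows why a zero is a problem here. This is only a sketch: the shape and rate values are arbitrary and it does not go through the package at all.

```r
# Arbitrary gamma parameters, chosen only for illustration
y <- c(0, 1)
ll <- dgamma(y, shape = 2, rate = 1, log = TRUE)

# With shape = 2 the density at 0 is 0, so its log-density is -Inf;
# a single such term makes the whole log-likelihood -Inf
ll
is.finite(ll)
```

So a single zero in the data is enough to make the log-likelihood non-finite for these parameters.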
First of all, sorry about all the issues raised - just hoping to help as much as possible, as I appreciate all the work put into building this brilliant package, so it's all in a spirit of appreciation! :-)
When trying to fit data that includes zeros to a `gamma` model, the error message is somewhat uninformative. Maybe it would be possible to add a test of the model's assumptions about the data, so that the error message reflects which of the underlying assumptions are violated. So e.g. when running a `gamma` distribution, the check could be a simple `any(data[["y"]] == 0)`. But it could maybe be implemented as a general feature to check assumptions and report errors first.
Here's a reprex which produces the error only when 0 is in the data set.
Created on 2024-04-28 with reprex v2.1.0