Open jameslefevre opened 4 years ago
the problem runs deeper. apparently Statistics.jl assumes vector spaces, but most things work in affine spaces as well. in particular mean is the same concept as barycenter in affine spaces. DateTime forms an affine space, and thus mean should be defined for it, but it isn't as it is implemented now.
adding a specialization for DateTime to mean (and middle) seems like a patchwork. the true solution would be an implementation of mean which is affine space aware. consider:
mean t = t_base + sum (t_i - t_base) / n
it raises a number of issues, namely the choice of t_base, which is arbitrary in theory, but might introduce practical differences. we also don't want to work with numbers in this way for performance reasons, so some kind of Holy traits might come to the rescue.
This issue is about median
so that's totally independent from mean
.
@quinnj Do you think the definition proposed in the OP is correct? Or would it make sense to define middle
in terms of x + (y-x)/2
?
OP definition seems.....fine. It's just treating Date
/DateTime
as raw integer values, then constructing directly from results. I think you can just do round(Int, ...)
though instead of Int(round(...))
.
But yeah, seems fine to me.
Cool. Feel free to make a PR @jameslefevre.
To clarify what this addition would entail: it would mean that middle
on Date
s chooses the "middle" date allowing for some rounding. This would differ from date arithmetic, where e.g. x + (y-x)/2
fails if the result isn't an integer instead of returning a rounded result.
This may be OK though and would kind of justify the existence of middle
. We would also need a more general version of middle
which would be used by quantile
, taking an argument γ
and returning a rounded x + γ*(y-x)
as in
https://github.com/JuliaLang/Statistics.jl/blob/74897fed33700dba92578aa0fefef5b99ba16086/src/Statistics.jl#L1006
Or would it make sense to define middle in terms of
x + (y-x)/2
?
That makes most sense to me. AFAICT it's equivalent to the current definition (x + y) / 2
except it generalizes to affine spaces like dates and Fahrenheits.
I'm also not really convinced rounding is the right thing. How do we know whether to round up or down? InexactError
would be safer so the user can be intentional about the decision when it's unclear what they meant, and that's what will happen automatically using the x + (y-x)/2
implementation. I'd propose middle(today, tomorrow)
error and middle(yesterday, tomorrow) == today
.
This will also apply for middle('a', 'b')
, which is ambiguous.
x + (y-x)/2
probably makes sense in general, but for Date
and DateTime
without rounding an error will be thrown most of the time so I'm not sure it would be useful for that use case. Maybe it would be useful for other cases though.
I have this working code:
But I wanted to quickly check that this is an appropriate approach before putting in a pull request. I saw open pull request #28 (relax type definition of middle), but I don't think helps - Date and DateTime should not subtype either Real or Number.