Open NAThompson opened 2 years ago
I am wondering the same.
Replacing some Real
s by Number
s, and 0
s by zero(x)
s, already makes a surprising number of things work:
> using Distributions
> using Unitful: mV, Hz, s
> n = Normal(0mV, 1mV)
Normal{Unitful.Quantity{…}}(μ=0 mV, σ=1 mV)
> pdf(n, 3mV)
0.0044318484119380075 mV^-1
> λ = 4Hz;
> e = Exponential(1/λ)
Exponential{Unitful.Quantity{Float64, 𝐓, Unitful.FreeUnits{(Hz^-1,), 𝐓, nothing}}} (θ=0.25 Hz^-1)
> rand(e) |> s
0.6119753615525725 s
How fun!
Of course more patches are needed.
And some support from Unitful's side (randn(::AbstractRNG, ::Type{Quantity{T,D,U})
eg).
And some distributions will need an extra "scale" or "units" parameter: μ and σ of LogNormal's default parametrisation are unitless, and do not give enough information to implement cdf
and rand
etc with units.
Just replacing Real
with Number
is not sufficient for Normal
as the example above shows - pdf
, logpdf
, cdf
, logcdf
, etc. should all return unitless numbers. Thus unfortunately I assume one would have to modify each distribution very carefully and hence probably it's easier to change each one separately.
pdf
does have units, namely the reciprocal of the independent variable's.
(As the pdf is a density, we have eg 0.24 probability mass per 1 mV, at some voltage).
You can see this by doing dimensional analysis on pdf expressions.
Eg for Exponential: pdf(x) = 1/θ exp(–x/θ)
. The scale parameter θ
has the same units as x
, so the pdf has one over these units.
Or for Normal: pdf(x) =
where both μ
and σ
are in data units.
cdf
is unitless (it's a probability, and also the integral of pdf(x) * dx
, where dx
has data units; another way to see what the units of pdf
should be).
logcdf
is then of course also unitless.
logpdf
is unitless as well (as is every logarithm), but a conversion factor is needed (to get eg log(0.24/mV * mV)
).
Samples (rand
) and quantiles of course have units, as do various summary statistics.
Indeed many patches besides Real
-> Number
are needed. (Although this simple change already goes a long way, thanks to Julia's generic algorithm culture. But Distributions still has a bunch of non generic code, like 0
instead of zero
, or eltype(::Sampleable)
being hardcoded as Int
/Float64
).
And of course an extensive test suite is a must.
I also agree that each distribution needs to be manually examined.
pdf does have units, namely the reciprocal of the independent variable's.
Not in general but only for distributions without atoms - ie. eg. not for DiscreteDistribution
, censored
distributions (open PR), or the Normal(0, 0)
limit case (which is supported).
Thus it seems supporting the limit case Normal(0, 0)
or censored
distributions created from ContinuousDistribution
s would lead to type instabilities or inconsistencies.
(Clearly, logpdf
is always unitless but one has to deal with a possibly arbitrary unit constant.)
It's unfortunate we use pdf
to refer to the PDF and the PMF. For a discrete distribution, it's returning probabilities so it should be unit less, but I think for mixtures of continuous and discrete distributions, where it's a density function, the PMF of the atomic component should be implicitly multiplied by a Dirac delta function so that expectations could be taken via integrals. And by it's definition of integrating to 1, Dirac delta has units of the inverse of the variable.
Another - but probably non-standard - approach would be to choose different base measures (they're all implicit anyway currently) that ensure that pdf
is always unitless. That shifts the unit constant choice from logpdf
to pdf
and would work with atoms and Diracs without unit. This might seem a bit strange but in the case of Normal
it would be similar to a Lebesgue measure that is scaled by 1/sqrt(2pi*sigma^2)
as base measure.
It's unfortunate we use pdf to refer to the PDF and the PMF. For a discrete distribution, it's returning probabilities so it should be unit less
It's a density in the discrete case as well, just with respect to the counting measure. So the name pdf
makes sense. Nevertheless it would be helpful for eg. Truncated
and Censored
if one could query the (possibly zero) probability of a specific value.
+1 that unitful support would be cool ^_^
I gave this a go a while back: https://github.com/JuliaStats/Distributions.jl/compare/master...tfiers:Distributions.jl:master
I implemented unit support for Exponential
and LogNormal
, and wrote tests and documentation.
Changes done to make this work:
Replace many ::Real
types by ::RealQuantity
(besides in lognormal.jl
and exponential.jl
, also in univariates.jl
and genericrand.jl
).
That type is a union of Real
and Unitful.Quantity{<:Real}
::Real
typings could be dropped altogether.In exponential.jl
and lognormal.jl
, go through each function (zval, entropy, logpdf, ..) and add / unit(d)
, * unit(d)
, * unit(d)^2
..., in appropriate places. Also replace some one(x)
s by oneunit(x)
, and some 0
s by zero(x)
.
I added units.jl
, which does nothing more than define the RealQuantity
alias,
and the unit(d)
mentioned above: Unitful.unit(::D) where {D <: Distribution} = unit(eltype(D))
.
(Note that oneunit(x)
is provided by Base Julia).
unit(d) = unit(eltype(D))
definition works nicely for Exponential
, but not for LogNormal
: in the (μ,σ) parametrization, both parameters are in the unitless logarithmic domain. So I added a type parameter to store the unit: struct LogNormal{T<:Real,U<:Units}
. It can be supplied in a constructor, and is NoUnits
by default.I made sure that there was no change in current usage of Distributions.jl. Most unit operations are done in the type-domain, so the runtime impact should be minimal or zero (though I have not benchmarked that).
I started a docs page with a list of "Supported distributions" (just the two), and a guide for how to add unit support to a new distribution.
supporting the limit case Normal(0, 0)
pdf(Normal(0mV, σ))
for σ → 0mV
should be ∞ / mV no? I.e. same units (mV⁻¹) as for other values, so type stable.
using Unitful
currently takes 1 to 3 seconds (1.0 seconds on replit, 1.6 to 2.6 seconds on my laptop).
Therefore, to minimize the impact of unit support in Distributions.jl on the 99% of users that don't need it, there first needs to be created a lightweight package defining just the interface/API for working with units (UnitfulCore
or UnitsInterface
or sth like that). StaticArrays
with StaticArraysCore
is an example of this principle.
Instead of importing the full Unitful package, Distributions will then instead import just the interface package, with minimal impact on load time.
An update, from the discussion at
with Julia 1.9's 'extension packages' (https://pkgdocs.julialang.org/dev/creating-packages/#Conditional-loading-of-code-in-packages-(Extensions)), a unitful core or interface package is no longer needed, and the full Unitful
would be added as a [weakdeps]
, and there would presumably be a UnitfulExt
module and entry under [extensions]
in Distributions.jl's Project.toml
.
Though, given that is 1.9+ only, I'm not sure how this interacts with Distributions.jl's current support for Julias down to 1.0
UPDATE: See this helpful summary by mbauman and chrisr that anwers these questions: https://discourse.julialang.org/t/package-extensions-for-julia-1-9/93397/5?u=tfiers
Using
Unitful v1.9.1
andDistributions v0.25.23
, it appearsUnitful
numbers are not supported:Could this be supported without undue pain?