Open dmbates opened 6 years ago
I'll have to think a little more but I'm wondering if it would be easier just to move all of StatsFuns
to Distributions
in one go (except for the few pieces that need to go into StatsBase
). Then we could just ask all users to use Distributions
right away instead of having a long period where only parts of the functionality have been migrated.
@andreasnoack I don't think you want to move everything in StatsFuns
to Distributions
until the dependencies on Rmath
are removed. I thought the reason for having calls from Distributions
to functions in libRmath
go through intermediate functions like normlogpdf
in StatsFuns
was to be able to transition to native Julia implementations in stages. As it stands you can't remove a call to the @_delegate_statsfuns
macro unless you have a complete set of what are called in R d-p-q-r
functions for the distribution.
At present there is an indirect dependence of Distributions
on Rmath
but the hope is to remove that. I don't think you want to reintroduce a direct dependence of Distributions
on Rmath
.
In my vision StatsFuns
eventually shrinks down to constants and convenience functions like xlogx
and log1pexp
. I think there would still be a case for having such functions in a separate package. Distributions
is a heavyweight dependency to just get an implementation of xlogx
.
I think it would be easier to transition to native implementations if all of the functionality in StatsFuns
was inside Distributions. It could be a submodule in Distributions
if necessary but I'm not sure it is. I also don't think there is a substantial difference between having Rmath as an indirect dependency compared to having it as a direct dependency. Why would the direct dependency be a problem? My comment "...except the few pieces that need to go into StatsBase
" was exactly about xlogx
and log1pexp
.
I checked and there are some packages that require StatsFuns
but not either Distributions
or "StatsBase`.
julia> setdiff(Pkg.dependents("StatsFuns"), union(Pkg.dependents("Distributions"), Pkg.dependents("StatsBase")))
6-element Array{AbstractString,1}:
"RiemannTheta"
"DynamicHMC"
"Erdos"
"Divergences"
"HilbertSchmidtIndependenceCriterion"
"LogProbs"
Yes. We'd have to open some PRs against those packages to fix them but I don't think it should too much work.
I agree that it is a good idea to move the functions related to distributions back into the Distributions
package. I don't really recall why they were moved into StatsFuns
in the first place. They seem to have been part of StatsFuns
from the earliest commits by Dahua Lin (July 25, 2015).
My recollection is that the functions linking to libRmath were initially a separate package then became part of the Distributions package. Do you remember why Dahua decided to move them here? I am asking because I don't want to transfer these functions back to Distributions
only to have someone come forward and say "we moved them to StatsFuns
for a good reason and you should move them back".
Dahua factored these out without much discussion and announced it in https://groups.google.com/forum/#!topic/julia-stats/nOObNwKM1b8. As you can see from the thread, I also thought it was a good idea but less than a year after, I'd changed my mind, https://groups.google.com/forum/#!searchin/julia-stats/statsfuns%7Csort:date/julia-stats/SRBpQnhDw54/FzE1O9PBJgAJ. It might be that a few people like the C-ish API here in StatsFuns but I think it is better to focus only on the Distributions API.
As far as I can see the collection of "d-p-q-r" functions (to use the R terminology) in
src/distrs/norm.jl
is complete. I propose that the implementations (not the functions themselves) be moved toDistributions/src/univariate/continuous/normal.jl
and the functions here be deprecated.The deprecation might be a bit tricky because the
Distributions
package requires theStatsFuns
package, not the other way around. The situation here would be that to deprecate functions inStatsFuns
you need to ensure that a certain version ofDistributions
or later is available. Given that user code and other packages "should" not call these functions directly that may not be too much of a problem. But it is a tricky situation.Where should a function like
_norminvlogcdf_impl
and its dependencies go? I think it should migrate with the other code toDistributions
but then does it need a new name to avoid a conflict with the function here during the deprecation period?I hope what I am asking makes sense. Ping me if this is too confusing. I have only had one cup of coffee so this might be complete nonsense.