JuliaStats / StatsFuns.jl

Mathematical functions related to statistics.
Other
235 stars 40 forks source link

Move implementations in src/distrs/norm.jl to Distributions #42

Open dmbates opened 6 years ago

dmbates commented 6 years ago

As far as I can see the collection of "d-p-q-r" functions (to use the R terminology) in src/distrs/norm.jl is complete. I propose that the implementations (not the functions themselves) be moved to Distributions/src/univariate/continuous/normal.jl and the functions here be deprecated.

The deprecation might be a bit tricky because the Distributions package requires the StatsFuns package, not the other way around. The situation here would be that to deprecate functions in StatsFuns you need to ensure that a certain version of Distributions or later is available. Given that user code and other packages "should" not call these functions directly that may not be too much of a problem. But it is a tricky situation.

Where should a function like _norminvlogcdf_impl and its dependencies go? I think it should migrate with the other code to Distributions but then does it need a new name to avoid a conflict with the function here during the deprecation period?

I hope what I am asking makes sense. Ping me if this is too confusing. I have only had one cup of coffee so this might be complete nonsense.

andreasnoack commented 6 years ago

I'll have to think a little more but I'm wondering if it would be easier just to move all of StatsFuns to Distributions in one go (except for the few pieces that need to go into StatsBase). Then we could just ask all users to use Distributions right away instead of having a long period where only parts of the functionality have been migrated.

dmbates commented 6 years ago

@andreasnoack I don't think you want to move everything in StatsFuns to Distributions until the dependencies on Rmath are removed. I thought the reason for having calls from Distributions to functions in libRmath go through intermediate functions like normlogpdf in StatsFuns was to be able to transition to native Julia implementations in stages. As it stands you can't remove a call to the @_delegate_statsfuns macro unless you have a complete set of what are called in R d-p-q-r functions for the distribution.

At present there is an indirect dependence of Distributions on Rmath but the hope is to remove that. I don't think you want to reintroduce a direct dependence of Distributions on Rmath.

In my vision StatsFuns eventually shrinks down to constants and convenience functions like xlogx and log1pexp. I think there would still be a case for having such functions in a separate package. Distributions is a heavyweight dependency to just get an implementation of xlogx.

andreasnoack commented 6 years ago

I think it would be easier to transition to native implementations if all of the functionality in StatsFuns was inside Distributions. It could be a submodule in Distributions if necessary but I'm not sure it is. I also don't think there is a substantial difference between having Rmath as an indirect dependency compared to having it as a direct dependency. Why would the direct dependency be a problem? My comment "...except the few pieces that need to go into StatsBase" was exactly about xlogx and log1pexp.

dmbates commented 6 years ago

I checked and there are some packages that require StatsFuns but not either Distributions or "StatsBase`.

julia> setdiff(Pkg.dependents("StatsFuns"), union(Pkg.dependents("Distributions"), Pkg.dependents("StatsBase")))
6-element Array{AbstractString,1}:
 "RiemannTheta"                       
 "DynamicHMC"                         
 "Erdos"                              
 "Divergences"                        
 "HilbertSchmidtIndependenceCriterion"
 "LogProbs"                    
andreasnoack commented 6 years ago

Yes. We'd have to open some PRs against those packages to fix them but I don't think it should too much work.

dmbates commented 6 years ago

I agree that it is a good idea to move the functions related to distributions back into the Distributions package. I don't really recall why they were moved into StatsFuns in the first place. They seem to have been part of StatsFuns from the earliest commits by Dahua Lin (July 25, 2015).

My recollection is that the functions linking to libRmath were initially a separate package then became part of the Distributions package. Do you remember why Dahua decided to move them here? I am asking because I don't want to transfer these functions back to Distributions only to have someone come forward and say "we moved them to StatsFuns for a good reason and you should move them back".

andreasnoack commented 6 years ago

Dahua factored these out without much discussion and announced it in https://groups.google.com/forum/#!topic/julia-stats/nOObNwKM1b8. As you can see from the thread, I also thought it was a good idea but less than a year after, I'd changed my mind, https://groups.google.com/forum/#!searchin/julia-stats/statsfuns%7Csort:date/julia-stats/SRBpQnhDw54/FzE1O9PBJgAJ. It might be that a few people like the C-ish API here in StatsFuns but I think it is better to focus only on the Distributions API.