JuliaStats / Distributions.jl

A Julia package for probability distributions and associated functions.

What is the breaking change in 0.25? #1317

Open davidanthoff opened 3 years ago

davidanthoff commented 3 years ago

I'm looking at the release notes, and I can't figure out what was breaking in this release?

devmotion commented 3 years ago

The types of MixtureModel and LocationScale were changed.

davidanthoff commented 3 years ago

Could some language maybe be added to the release notes that explains what one has to do to make downstream packages compatible? There are probably literally hundreds of packages that now get a CompatHelper notification that they should mark their package as compatible with v0.25, and at the moment it is really difficult to figure out what I have to do as a downstream package maintainer.

Also, just out of curiosity: is there a chance that we could bunch breaking changes together, and ideally just have one 1.0 release at some point? Distributions.jl is so deep in the dependency tree and affects so many packages that the cost in terms of contributor time of downstream packages is enormous when a breaking version is released here...
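For context on why a 0.24 to 0.25 bump triggers those notifications: Julia's package manager treats a change in the leading nonzero version component as breaking, so a compat entry of `0.24` does not admit `0.25.0`. Below is a minimal sketch of that caret rule in plain Julia. `caret_compatible` is a hypothetical helper written for illustration, not a Pkg function.

```julia
# Hypothetical helper (not part of Pkg): does version `v` satisfy the
# caret-style compat entry `spec` under Julia's semver conventions?
function caret_compatible(spec::VersionNumber, v::VersionNumber)
    v >= spec || return false
    if spec.major > 0
        # ^1.2.3 allows [1.2.3, 2.0.0)
        return v.major == spec.major
    elseif spec.minor > 0
        # ^0.2.3 allows [0.2.3, 0.3.0): a minor bump is breaking before 1.0
        return v.major == 0 && v.minor == spec.minor
    else
        # ^0.0.3 allows [0.0.3, 0.0.4)
        return v.major == 0 && v.minor == 0 && v.patch == spec.patch
    end
end

println(caret_compatible(v"0.24.0", v"0.24.15"))  # true
println(caret_compatible(v"0.24.0", v"0.25.0"))   # false
println(caret_compatible(v"1.0.0", v"1.7.2"))     # true
```

This is why every package with a `0.24` compat entry needs a manual update for 0.25, whereas after a 1.0 release only a bump to 2.0 would have the same effect.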

devmotion commented 3 years ago

I agree, probably it would be good to not only rely on the information about merged PRs and closed issues but add some more detailed release notes. And maybe one should add this information also to a persistent changelog such as a NEWS.md file?

We bundled two completely different breaking PRs in 0.25.0 to avoid having too many breaking releases. I am not opposed to a 1.0 release (there's also an issue that was started almost two years ago: https://github.com/JuliaStats/Distributions.jl/issues/880) but I don't think it will solve the general problem of breaking releases. I anticipate that even after 1.0 (regardless of how many breaking changes one pushes into it), there will be breaking changes quite regularly - Distributions is just too big and there are too many possible improvements. Also IMO the two breaking changes in 0.25.0 are two major improvements (one fixes many type inference problems with MixtureModel since now all fields are concretely typed, the other generalizes LocationScale to discrete distributions) and it is good to release them in a timely manner instead of waiting for other changes that are desired for 1.0 but not worked on yet.
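A rough way for a downstream maintainer to check whether the MixtureModel change affects them is to inspect the type on the installed version and see whether their code names its type parameters explicitly. The sketch below assumes Distributions.jl is installed. `describe_mixture` is a hypothetical downstream function, and the printed type depends on the installed version.

```julia
using Distributions

# A two-component Gaussian mixture.
m = MixtureModel([Normal(0.0, 1.0), Normal(3.0, 1.0)], [0.4, 0.6])

# The full parametric type; the number of parameters shown depends on
# which Distributions.jl version is installed.
println(typeof(m))

# Report whether each field has a concrete type on this version.
for name in fieldnames(typeof(m))
    T = fieldtype(typeof(m), name)
    println(name, " :: ", T, " (concrete: ", isconcretetype(T), ")")
end

# Hypothetical downstream function: it dispatches on the unparameterized
# type name, so it does not depend on how many type parameters exist.
describe_mixture(d::MixtureModel) = (ncomponents(d), probs(d))

println(describe_mixture(m))
```

Code written like `describe_mixture` should be unaffected by a change in the type parameters. Code that spells out the parameters, for example in a method signature or a struct field annotation, is the kind that would need updating.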

DilumAluthge commented 3 years ago

> I anticipate that even after 1.0 (regardless of how many breaking changes one pushes into it), there will be breaking changes quite regularly - Distributions is just too big and there are too many possible improvements.

In that case, perhaps it would be a good idea to break Distributions.jl into multiple smaller packages. Then people could only depend on the individual packages that they need.

@stefankarpinski Any thoughts?

StefanKarpinski commented 3 years ago

I dunno, but the breaking changes here are getting quite disruptive so it seems like something has to be done.

davidanthoff commented 3 years ago

So I also don't know the package well enough to have a super constructive suggestion, but it seems to me that the core basic functionality of sampling from a range of univariate distributions really should be in a stable package that doesn't have breaking updates anymore at this stage of Julia's development. Maybe more, not sure.

So if there is a way to move things that are expected to break more going forward into their own package, that might be one solution?

devmotion commented 3 years ago

> it seems to me that the core basic functionality of sampling from a range of univariate distributions really should be in a stable package that doesn't have breaking updates anymore at this stage of Julia's development.

Actually, there might be breaking changes in this API in upcoming releases: https://github.com/JuliaStats/Distributions.jl/issues/1316

mschauer commented 3 years ago

> it seems to me that the core basic functionality of sampling from a range of univariate distributions really should be in a stable package that doesn't have breaking updates anymore at this stage of Julia's development.

Yeah, this is not how it is. Getting Distributions.jl right is very difficult, and we have considerable pain from early design decisions (VariateForm, hardcoded Float64, cough cough...) when people were still experimenting with how to do things in Julia (this is a very old package, the license file says 2012...).

Also we bear some of the shortcomings of early Julia's design more strongly than others, e.g. a row iterator for arrays was missing for a long time, as was support for vectors of vectors. On the other hand, the requirements for such a package are constantly changing, e.g. we make changes to accommodate the needs of automatic differentiation (https://github.com/JuliaStats/Distributions.jl/pull/1263), and too often people need to roll their own design because the design of Distributions gets in the way. Finally, a huge number of 1-dimensional distributions have been added (keeping up with R), so any change causes a lot of work and we cannot move fast.

That said, splitting it up and making it more modular would be great, but then that is another disruption and the opposite of freezing the design until 2.0.

StefanKarpinski commented 3 years ago

Would it make sense to have some kind of roadmap towards reaching 1.0 like DataFrames had? Doesn't have to be as in depth as that but it would be great to have something to get a sense of what has to be done.

azev77 commented 3 years ago

They have 1.0 milestones, which could be a place for a roadmap to 1.0. I think Miles did a great job w/ JuMP roadmap to 1.0.

Pramodh-G commented 3 years ago

#1139 by @tpapp might also be relevant here

tpapp commented 3 years ago

@Pramodh-G: thanks for the ping. Having a lightweight DistributionsBase could mitigate this issue in two ways:

  1. It could be released as 1.0 immediately, as the generic API (type hierarchy, pdf, cdf, etc.) has been stable for ages. Then packages which just need this API (e.g. because they accept distributions as arguments) would just depend on this.

  2. Experimentation with distributions could happen in third-party packages, with smaller packages providing special-case or niche use distributions, possibly not merged into this package even in the long run. This is already possible, but would become easier.
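
As a sketch of the first point, a downstream function that only accepts a distribution as an argument needs nothing beyond the generic API. The example below loads Distributions.jl itself, because DistributionsBase is only a proposal in this thread. `interval_probability` is a hypothetical name.

```julia
using Distributions

# Hypothetical downstream function: P(a < X <= b) for any univariate
# distribution, using only the abstract type and the generic `cdf`.
function interval_probability(d::UnivariateDistribution, a::Real, b::Real)
    a <= b || throw(ArgumentError("need a <= b"))
    return cdf(d, b) - cdf(d, a)
end

# Works unchanged for continuous and discrete distributions.
println(interval_probability(Normal(), -1.96, 1.96))
println(interval_probability(Poisson(3), 1, 4))
```

Nothing in this function refers to a concrete distribution type, so under the proposal a package containing it could depend on the lightweight API package alone.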