JuliaStats / StatsBase.jl

Basic statistics for Julia

RFC: Histograms with errors #137

Open jpata opened 9 years ago

jpata commented 9 years ago

I wrote a package [1] that builds on StatsBase and implements histograms with errors on the bin contents. Such histograms are heavily used in high-energy physics at the Large Hadron Collider. This was previously discussed in https://github.com/JuliaStats/StatsBase.jl/issues/104.

In particular, we push values to the histogram with non-uniform weights and later model the bin counts using Poisson or Gaussian distributions.

Do you think this can be a part of StatsBase? If yes, I will prepare a PR soon. If not, I'll keep it as a separate package, but I would like to register it in Julia's METADATA under a name that does not cause confusion; I'm looking for suggestions.

[1] https://github.com/jpata/Histograms.jl/blob/master/src/Histograms.jl

andreasnoack commented 9 years ago

I don't know this kind of histogram. Do you have an easy reference for it? Preferably with a picture. My feeling is that StatsBase.jl might not be the right place for this. The package is meant for things that many other statistical packages would like to use, and I'm not sure this functionality falls into that category.

jpata commented 9 years ago

Here's a paper where the idea is described: http://arxiv.org/pdf/0712.4250.pdf

Here's the Higgs boson discovery plot where this is used to draw the error bars and also to infer the statistical uncertainty (not shown) on the background.

[image]

The error bars are derived by storing the squared weights (for weight != 1 in push!(h::ErrorHistogram, values, weight)) in a separate N-dimensional array. This also allows us to do error propagation on the bins.
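To make the squared-weight bookkeeping concrete, here is a minimal 1-D sketch in plain Julia. It is not taken from Histograms.jl: the type ToyErrorHistogram and the helper binerrors are hypothetical names chosen for illustration, and the sketch assumes left-closed bins, independent entries, and that out-of-range values are simply dropped. Each bin keeps the sum of weights and the sum of squared weights; the square root of the latter is the usual estimate of the bin's uncertainty, and it reduces to sqrt(N) when all weights are 1.

```julia
# Hypothetical, simplified 1-D stand-in for the ErrorHistogram mentioned above.
struct ToyErrorHistogram
    edges::Vector{Float64}   # sorted bin edges, length nbins + 1
    sumw::Vector{Float64}    # per-bin sum of weights (bin contents)
    sumw2::Vector{Float64}   # per-bin sum of squared weights (variance estimate)
end

# Build an empty histogram from a set of bin edges.
ToyErrorHistogram(edges::AbstractVector{<:Real}) =
    ToyErrorHistogram(collect(Float64, edges),
                      zeros(length(edges) - 1),
                      zeros(length(edges) - 1))

# Fill one value with an optional weight (assumed default: 1.0).
function Base.push!(h::ToyErrorHistogram, x::Real, w::Real = 1.0)
    i = searchsortedlast(h.edges, x)   # bin i covers [edges[i], edges[i+1])
    if 1 <= i < length(h.edges)        # values outside the edges are dropped
        h.sumw[i] += w
        h.sumw2[i] += w^2
    end
    return h
end

# Per-bin uncertainty: square root of the summed squared weights.
binerrors(h::ToyErrorHistogram) = sqrt.(h.sumw2)

# Error propagation for a sum of independent histograms: variances add.
function Base.:+(a::ToyErrorHistogram, b::ToyErrorHistogram)
    a.edges == b.edges || throw(ArgumentError("histograms must share bin edges"))
    return ToyErrorHistogram(copy(a.edges), a.sumw .+ b.sumw, a.sumw2 .+ b.sumw2)
end

# Usage: three bins on [0, 3), two weighted entries and one unweighted entry.
h = ToyErrorHistogram(0.0:1.0:3.0)
push!(h, 0.5, 2.0)
push!(h, 0.7, 0.5)
push!(h, 1.5)
total = h + h
```

Keeping sumw2 next to sumw is what makes the propagation cheap: adding two histograms only requires adding both arrays bin by bin, and the error bars can be recomputed from the result at any time.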

I see your point about this package being a kind of base dependency for all things Stats. Do you think it's worth putting it under JuliaStats? In that case, I'll see if I can get Histograms.jl merged into METADATA under a reasonable name. Physicists are now starting to use these packages, and it is better to have them in METADATA with proper testing than in a private repo.

simonbyrne commented 9 years ago

I would lean toward making this a package for now. Maybe WeightedHistograms.jl?

andreasnoack commented 9 years ago

@jpata Placing a repo under an organization is mainly beneficial if the repo has several contributors. As long as it is mainly your package, it is just as fine to keep it under your personal profile.

nilshg commented 1 year ago

This issue can probably be closed?