JuliaStats / StatsKit.jl

Convenience meta-package to load essential packages for statistics

Rename package #5

Closed by andreasnoack 5 years ago

andreasnoack commented 5 years ago

As @ararslan pointed out recently, having Statistics, StatsBase, and Stats is going to be confusing. Maybe it would help a little to call this package something like StatisticalModeling or StatisticalModels.

ararslan commented 5 years ago

That name would then conflict with StatsModels...

andreasnoack commented 5 years ago

That is true. Any other ideas?

ararslan commented 5 years ago

Not really offhand. In keeping with that theme, StatisticalAnalysis would work I guess, but I don't love it.

kleinschmidt commented 5 years ago

StatsBats? StatisticalBatteries? StatAnal? StatsMeta?

kleinschmidt commented 5 years ago

We could also rename StatsModels, since that's more of a package for package authors. Maybe Formulae?

nalimilan commented 5 years ago

I'm not opposed to renaming StatsModels, but I think it's quite orthogonal to the present issue. StatisticalModels wouldn't really be a good name for this package since it covers much more than that.

The risk of confusion is rather with Statistics vs. StatsBase vs. Stats. Ideally we'll be able to get rid of StatsBase. That leaves Statistics vs. Stats. I'm not a fan of the proposed names so far, but I can't find a better one. StatisticsExtras? Statistics++? :-)

kleinschmidt commented 5 years ago

I think I'd prefer StatisticsBatteries over StatisticsExtras since it's still basic functionality (the "batteries" you need for a "batteries included" statistics experience).

ararslan commented 5 years ago

"Extras" seems like the wrong word to me, and "batteries" I think is a little too cute. (While "batteries included" is a common expression in the US, I wouldn't be surprised if many people coming to Julia hadn't heard it.)

andreasnoack commented 5 years ago

I like Formulae regardless of the conclusion here.

Using the Meta postfix could possibly become a general way of indicating that a package is a meta package and might not contain much source code.

nalimilan commented 5 years ago

As a non-native speaker, I agree "batteries" would sound weird to many people. I'd say it's more standard to talk about "environments".

> I like Formulae regardless of the conclusion here.

Could you explain why? I don't understand why you think formulas describe what this package does (AFAICT only StatsModels uses formulas among the re-exported packages).

> Using the Meta postfix could possibly become a general way of indicating that a package is a meta package and might not contain much source code.

Yeah, Meta isn't too bad. StatisticsMeta?

andreasnoack commented 5 years ago

@nalimilan Should we wait with a release until we have resolved this?

nalimilan commented 5 years ago

Yes, I had just realized it would be silly to tag a release if we want to change the name.

andreasnoack commented 5 years ago

> Could you explain why?

Formulae was not suggested as the new name for this package but as a new name for the current StatsModels.

nalimilan commented 5 years ago

Another package to take into account is StatPlots, which is the graphical complement to this package. I think we should use the same prefix for all packages, be it Statistics (as for the stdlib), Stat or Stats (currently it's very inconsistent).

ararslan commented 5 years ago

+1 for meta and for spelling out statistics explicitly. Formulae is a fine name for StatsModels but personally I find it a bit too general and would prefer something like StatisticalModels if we're going to spell things out.

nalimilan commented 5 years ago

@piever @mkborregaard What are your opinions about using a consistent prefix for the statistics packages?

piever commented 5 years ago

I was thinking about this recently (trying to decide a name for the Makie equivalent of StatPlots).

  1. My first impression is that the distinction between StatsBase and Statistics is not very clear and it'd be nicer if it were just one package. I'm not sure whether there are technical reasons to keep them separate.

  2. I also like the Statistical prefix better than Stats (StatisticalFunctions is IMO much better than StatsFuns).

So overall I think the following could be an option:

StatsBase, Statistics => Statistics
StatsModels => StatisticalModels
StatsFuns => StatisticalFunctions
StatPlots => StatisticalPlots
StatMakie => StatisticalMakie? MakieStatistics?

  3. Concerning the new name here, I'm curious whether JuliaDB can be an example: could one consider JuliaStatistics? It would stand for "tools to do statistical things in Julia", that is to say, this package should provide you with all the statistical functionality that other packages give you in other languages (like, say, SciPy, which also has the programming language in its name). Alternatively, StatisticsMeta is also a fine name.

mkborregaard commented 5 years ago

So, I'm very much in favour of keeping a consistent naming interface, such as the informal convention of signalling higher-level package organisation with a FooBase package that defines abstract interfaces and shared types. I guess there is precedent for FooMeta to signal end-user-facing batteries-included packages that you just load and get the functionality you want right at the REPL. Note though that there is a potential for ambiguity, in that DataFramesMeta and JuliaDBMeta mean "metaprogramming constructs for working with DataFrames/JuliaDB".

Another convention could be to use the name of the organization, e.g. JuliaStats or just, as it is now, Stats with the Julia being implicit. Meaning "this package gives you the core packages in the organisation". That is the approach taken by Plots and almost DifferentialEquations.

That would argue against changing from Stats to Statistical, or at least then the organisation might change name as well. I also think, as @piever's example shows, that though Statistical is very good when packages follow the convention of being named after the plural of the core type, it is not so good in other cases. I actually really like the names of the packages in this org: they are clear, descriptive and consistent. (Think about "StatsPlots", "PlotRecipes" and "RecipesBase" as a good example of an inconsistent naming scheme that always causes confusion.) StatisticalGLM is a bit horrible too.

In summary, I think the best solution would be to call this package Stats, merge StatsBase with Statistics, and name that stdlib "StatsBase". That would be clear for everyone, and anyway "Statistics" is too big a name for the stdlib IMHO.

mkborregaard commented 5 years ago

Oh just two last things: 1) Actually the best name for the stdlib IMO is "BasicStatistics", I think most new users to Julia would find it strange to do using Statistics and not even have a t-test/linear model. @andreasnoack is it unheard of to suggest renaming stdlibs (will it be considered breaking)?

And 2) on the other hand I would support renaming StatsPlots to StatisticalPlots if that is what you decide on here more generally.

andreasnoack commented 5 years ago

I'm not sure if anybody currently knows the consequences of renaming a stdlib but I think it should be possible at some point. It's my impression that they, eventually, should mainly be identified by their uuid such that renaming should be possible but we are not there yet.

quinnj commented 5 years ago

using Stats just seems too nice to give up as the batteries-included package for high-quality stats functionality. I agree that StatsBase & Statistics could be merged, with the basic stuff moving to Statistics and the "other" stuff being farmed out to other stats packages (or a new StatsUtils.jl package). I actually don't mind the Stats vs. Statistics distinction; I think in general, users will see/hear/know about Stats.jl and maybe sometimes hear about/become aware of the Statistics stdlib. At that point, it's really easy to explain the difference: there are core statistics functions that live in the stdlib, while Stats includes a much broader set of statistical functions.

nalimilan commented 5 years ago

Should we keep Stats.jl then?

@piever @mkborregaard In that case, would you be open to renaming StatPlots to StatsPlots for consistency?

Cc: @StefanKarpinski

mkborregaard commented 5 years ago

Absolutely

mkborregaard commented 5 years ago

A lot of users will not appreciate that though :-O

piever commented 5 years ago

I'm also OK with renaming StatPlots and StatMakie to StatsPlots and StatsMakie, as long as we are sure that the decision is final.

brenhinkeller commented 5 years ago

A vote for keeping it. This package isn't the problem, and as a meta-package is kinda helping. The significant naming problem IMO is StatsBase vs. Statistics -- there's no need for both of them to exist, and for Statistics to be more "basic" than StatsBase is just confusing.

JeffBezanson commented 5 years ago

How about AllStats?

nalimilan commented 5 years ago

Yeah, I had the same idea today. Why not. The main drawback is that we would lose the common Stats prefix used by other packages, and therefore autocompletion.

JeffBezanson commented 5 years ago

StatsAll also seems ok.

brenhinkeller commented 5 years ago

Would this include things like SpecialFunctions.jl? If not, there might be a case for an even broader metapackage to give us a coherent basic scientific/technical computing environment.

ararslan commented 5 years ago

My only problem with using "all" in the name is that it necessarily will not include all stats packages in existence, and there will undoubtedly be questions like "this is called all but why isn't package X included?"

StefanKarpinski commented 5 years ago

Maybe StatsLab or StatsKit?

mkborregaard commented 5 years ago

It'd be nice to have/follow a naming convention for packages such as this, DifferentialEquations.jl, Learn.jl etc.

StefanKarpinski commented 5 years ago

Convention conshmention.

ararslan commented 5 years ago

> Maybe StatsLab

Perhaps the matrix packages in JuliaMatrices should be bundled together as MatLab. :trollface:

mkborregaard commented 5 years ago

[image: fun-01]

brenhinkeller commented 5 years ago

If as @ararslan suggested over in https://github.com/JuliaLang/julia/issues/29751#issuecomment-431642437

> StatsFuns should be merged into Distributions (see JuliaStats/StatsFuns.jl#20), and the functionality in StatsBase should be redistributed across Statistics, Random, StatsModels, Distributions, and Distances. Currently StatsBase is kind of a grab bag of abstractions that belong in StatsModels and random functionality that doesn't have another home. I expect that once we can develop and version stdlibs separately from Julia itself, there will be more motivation to consolidate.

then StatsBase and StatsFuns would be out of the picture, the namespace confusion wouldn't be nearly as bad, and Stats.jl could stay as a perfectly cromulent (and short, easy to type) metapackage name

musm commented 5 years ago

Since no one has mentioned it yet, I'll throw out StatisticalTools; this in combination with

> then StatsBase and StatsFuns would be out of the picture, the namespace confusion wouldn't be nearly as bad, and Stats.jl could stay as a perfectly cromulent (and short, easy to type) metapackage name

would indeed clarify the namespace confusion.

Using Stats for a meta package seems a bit strange since the name is shorter than Statistics (psychologically, this is what my brain tells me when looking at the names)

nalimilan commented 5 years ago

"Tools" really doesn't evoke the main package for statistics to me: it rather sounds like small utilities which can be convenient, but not essential.

nalimilan commented 5 years ago

After discussing this a lot, we've decided to go with StatsKit. See #15.

mkborregaard commented 5 years ago

I like that.