JuliaStats / StatsBase.jl

Basic statistics for Julia

Design of the frequency/contingency tables #32

Closed lindahua closed 5 years ago

lindahua commented 10 years ago

Efficient implementations of frequency and contingency table functions on generic data types (e.g. strings) can be built on pooled data arrays. Therefore, I think it makes sense to implement them in DataArrays.jl and take advantage of its pooled-array machinery.

For the Stats.jl package, we will still maintain the counts function, which computes such tables from integer variables, for example:

counts([1, 1, 1, 2, 2, 2, 3, 3, 3, 3], 1:3)  # ==> [3, 3, 4]

We may also continue to keep the countmap function (or whatever we finally decide to call it).
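For readers unfamiliar with the idea, here is a minimal sketch of the kind of tally a countmap-style function produces: a dictionary from each distinct value to its number of occurrences. The name `tally` is hypothetical and uses only Base Julia; it is an illustration, not the StatsBase implementation.

```julia
# Hypothetical helper for illustration only; not the StatsBase implementation.
# Returns a Dict mapping each distinct value to how many times it occurs.
function tally(values::AbstractVector)
    freq = Dict{eltype(values), Int}()
    for v in values
        # get returns the current count, or 0 if v has not been seen yet
        freq[v] = get(freq, v, 0) + 1
    end
    return freq
end

freq = tally(["a", "b", "a", "c", "a"])
```

Unlike counts, this needs no predefined integer range, at the cost of a hash lookup per element.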

What I suggest is to implement two-way or multi-way contingency tables in DataArrays.jl.
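To make the idea concrete, here is a rough sketch of a two-way table in plain Julia. It is not the interface proposed in this issue: the function name `crosstab` and its return shape are assumptions for illustration. Each level is mapped to an integer index (roughly what a pooled array stores), and the cells are then counted.

```julia
# Hypothetical helper for illustration; not part of StatsBase or DataArrays.
# Builds a two-way contingency table from two equal-length vectors.
function crosstab(x::AbstractVector, y::AbstractVector)
    length(x) == length(y) ||
        throw(DimensionMismatch("x and y must have the same length"))
    rowlevels = sort(unique(x))
    collevels = sort(unique(y))
    # Map each level to an integer index, similar in spirit to a pooled array.
    rowindex = Dict(level => i for (i, level) in enumerate(rowlevels))
    colindex = Dict(level => j for (j, level) in enumerate(collevels))
    table = zeros(Int, length(rowlevels), length(collevels))
    for (xi, yi) in zip(x, y)
        table[rowindex[xi], colindex[yi]] += 1
    end
    return rowlevels, collevels, table
end

sex    = ["f", "m", "f", "f", "m"]
smoker = ["no", "no", "yes", "no", "yes"]
rows, cols, table = crosstab(sex, smoker)
```

Here `table[i, j]` counts the observations with `x == rows[i]` and `y == cols[j]`. Returning the level vectors alongside the matrix is a stand-in for the labelled axes that a dedicated array type would provide.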


Please see below for my detailed considerations on the interface design.

nalimilan commented 8 years ago

Have a look at my FreqTables.jl package mentioned above. I think this approach works and could be extended if needed.

papamarkou commented 8 years ago

Thanks, @nalimilan, yes, your FreqTables package seems useful. I will let you know in the next few days if I have any questions about its interface, as I will make use of it. Have there been any discussions about integrating your work into StatsBase given how fundamental cross-tabulation is for descriptive stats?

nalimilan commented 8 years ago

> Have there been any discussions about integrating your work into StatsBase given how fundamental cross-tabulation is for descriptive stats?

Not that I know of. But first we would need to agree on the design of NamedArrays, which would become a dependency of StatsBase.

papamarkou commented 8 years ago

All right, I can use your package for now, as it sounds like making a decision about NamedArrays may take a long time.

mkborregaard commented 8 years ago

Trying again in May 2016 - are basic contingency tables really not yet a part of base Julia?

nalimilan commented 8 years ago

@mkborregaard As I noted above, see https://github.com/nalimilan/FreqTables.jl.

mkborregaard commented 8 years ago

OK, thanks, I have downloaded it and it does what it should. Sorry for sounding annoyed, I was just looking desperately for this functionality while stuck in a plane for 9 hours :-)

Nosferican commented 5 years ago

Should this be closed?

nalimilan commented 5 years ago

I guess so. It's too bad we don't have freqtable in a fundamental package like StatsBase or Statistics (due to the dependency on NamedArrays), but we could add FreqTables to StatsKit.