StatisticalRethinkingJulia / StatisticalRethinking.jl

Julia package with selected functions in the R package `rethinking`. Used in the SR2... projects.
MIT License

Add Unicode Histogram functionality #93

Closed karajan9 closed 4 years ago

karajan9 commented 4 years ago

Example usage:

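using CSV, DataFrames, DrWatson  # DrWatson provides datadir()

# Recent CSV.jl versions require the sink type (DataFrame) as the second argument.
d = CSV.read(datadir("exp_raw/Howell_1.csv"), DataFrame; copycols = true)
for col in eachcol(d)
    println(unicode_histogram(col))  # unicode_histogram is the function proposed here
end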

gives

▁▁▁▂▂▂▂▂▂██▆▁
▁▃▄▄▃▂▃▆██▅▃▁
█▆▆▆▆▃▃▁▁
█▁▁▁▁▁▁▁▁▁█

I'm not sure how to incorporate this into precis, since it would probably turn the array type into a Union. It also relies on Unicode and font coverage, but that seems to be pretty good in the Julia community (heavy Unicode symbol usage, strings are UTF-8...).
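To illustrate the element-type concern, here is a minimal sketch in plain Julia. It is not how precis stores its results; the column names and the numbers are made-up placeholders. It only shows that mixing numbers and strings in one array widens the element type, while keeping the histogram strings in their own column leaves every column concretely typed.

```julia
# Toy values, for illustration only (not real summary statistics).
mixed = [1.5, 2.5, "▁▃▄▄▃"]                            # numbers and a string in one array
tagged = Union{Float64, String}[1.5, 2.5, "▁▃▄▄▃"]     # explicit Union element type

# Alternative layout (hypothetical): one typed vector per summary column.
summary_cols = (
    mean = [1.5, 2.5],
    histogram = ["▁▃▄▄▃", "█▆▆▃▁"],
)

println(eltype(mixed))                    # Any
println(eltype(tagged))                   # Union{Float64, String}
println(eltype(summary_cols.mean))        # Float64
println(eltype(summary_cols.histogram))   # String
```

With a column-per-statistic layout the histogram is just one more String column, so the numeric columns are unaffected.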

karajan9 commented 4 years ago

To show where the histogram comes from, since almost all of the detail is lost in the Unicode version:

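using StatsBase, StatsPlots  # StatsPlots re-exports plot and hline! from Plots

f = fit(Histogram, d.weight, nbins = 12)
plot(f)
# 9 lines from 0 to the tallest bin: the boundaries of the 8 block heights
hline!(range(0, maximum(f.weights), length = 9))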

The plotted histogram then turns into ▁▃▄▄▃▂▃▆██▅▃▁

I'm not quite happy with how the Unicode is displayed in my browser. Apparently that font isn't as nice, so results may vary.
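The mapping from bin counts to block characters can be sketched roughly as follows. This is an assumption about how such a function might work, not the code from this change: `blocks_from_counts` is a hypothetical helper that splits the range from 0 to the tallest bin into eight equal bands (matching the nine horizontal lines above) and picks one of the eight block characters per bin.

```julia
# The eight Unicode block elements U+2581 (▁) through U+2588 (█).
const BLOCKS = collect("▁▂▃▄▅▆▇█")

# Hypothetical helper: turn a vector of bin counts into a one-line histogram.
function blocks_from_counts(counts::AbstractVector{<:Real})
    isempty(counts) && return ""
    top = maximum(counts)
    # If every bin is empty, show the lowest block everywhere.
    top == 0 && return join(fill(BLOCKS[1], length(counts)))
    # Band k covers counts in ((k-1)*top/8, k*top/8]; empty bins still get ▁.
    levels = [clamp(ceil(Int, 8 * c / top), 1, 8) for c in counts]
    return join(BLOCKS[levels])
end

println(blocks_from_counts([0, 4, 8]))
```

With StatsBase loaded, the counts could come from `StatsBase.fit(Histogram, x; nbins = 12).weights`. Note that `nbins` is only a hint to StatsBase, which is presumably why the example above has 13 characters rather than 12.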

goedman commented 4 years ago

This looks great! It would definitely be great if we can integrate it into precis(). I'll have a look.

Precis, in my mind, is kind of a "quick peek" at the results, so maybe the quality of the display is slightly less important. In many of the examples in the book I didn't always find the histograms convincing.

When experimenting with UnicodePlots I did run into the issue of dispatch conflicts with StatsPlots. We will have to decide if we always want users to fully qualify method names. I might have to clean that up for plotcoef, plotbounds, etc. when Unicode histograms become part of precis.

karajan9 commented 4 years ago

> maybe quality of display is slightly less important

Right, I see it as only differentiating between maybe uniform, symmetric, asymmetric, (categorical?), and "weird". That's fine for the very first overview, but anything beyond that needs a proper plot.

> When experimenting with UnicodePlots

I'm not sure if we are misunderstanding each other here: the histogram uses Unicode symbols to show the bars but isn't built on UnicodePlots. The method as it is right now shouldn't cause any conflicts (maybe with fit? But since that's not in user code, it's easy for us to qualify it as StatsBase.fit).
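For reference, this is the kind of name clash that qualification avoids. The two modules below are stand-ins invented for the example (they are not StatsBase or StatsPlots); the point is only that when two loaded modules export the same name, a qualified call stays unambiguous.

```julia
# Two hypothetical modules that both export a function called `fit`.
module StatsLike
export fit
fit(x) = "StatsLike.fit"
end

module PlotsLike
export fit
fit(x) = "PlotsLike.fit"
end

using .StatsLike, .PlotsLike

# An unqualified `fit(1)` would fail here, because both modules export `fit`.
# Qualifying the call picks one explicitly, as with StatsBase.fit:
println(StatsLike.fit(1))
println(PlotsLike.fit(1))
```

Since the call to fit lives inside the package rather than in user code, qualifying it there should not require users to change anything.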

goedman commented 4 years ago

Aah, I see, very neat! Will try to incorporate it in the display of precis().