MakieOrg / AlgebraOfGraphics.jl

An algebraic spin on grammar-of-graphics data visualization in Julia. Powered by the Makie.jl plotting ecosystem.
https://aog.makie.org
MIT License
443 stars 45 forks source link

Confidence Interval Band #366

Open kir0ul opened 2 years ago

kir0ul commented 2 years ago

Problem description

I would like to be able to plot a line with a confidence interval band, something like the plot below (which I believe is not possible at the moment using only AlgebraOfGraphics.jl alone).

image

Proposed solution

I guess the implementation could use the band() function from Makie, and borrow some ideas for the API from VegaLite.jl or ggplot2.

greimel commented 2 years ago

linefill() does that http://juliaplots.org/AlgebraOfGraphics.jl/stable/API/recipes/#AlgebraOfGraphics.linesfill

Unfortunately it is not covered in the docs yet.

kir0ul commented 2 years ago

Thanks! I've been able to use it on a single line plot but I'm struggling to use it on a plot with multiple lines and keep the same colors. Would you have any hint or example on how to use linesfill() with multiple line using AlgebraOfGraphics syntax? Or maybe I should switch to plain Makie syntax?

I need to do something like the figure below. image

And for now my code is looking like the following:

using DataFrames
using Random
using HypothesisTests
using Statistics
using AlgebraOfGraphics, CairoMakie
set_aog_theme!()

param_categories = [1, 2, 3, 4]

function get_data()
    N = 100
    n_runs = 10
    f(x, p) = exp.(x .+ p) ./ sum(exp.(x .+ p)) .+ rand(1) # some example function
    df_full = DataFrame(x = Int[], y = Float64[], param = Int[], run_nb = Int[])
    for (ip, param) in enumerate(param_categories)
        for idx = 1:n_runs
            x = collect(1:N)
            y = f(x, param)
            params = fill(param_categories[ip], N)
            run_nb = fill(idx, N)
            for jdx = 1:N
                push!(df_full, (x[jdx], y[jdx], params[jdx], run_nb[jdx]))
            end
        end
    end

    df_avg = DataFrame(x = Int[], y_avg = Float64[], param = Int[])
    lower, upper = zeros(length(param_categories), N), zeros(length(param_categories), N)
    for (ip, param) in enumerate(param_categories)
        y = zeros(N)
        for jdx = 1:N
            y[jdx] = mean(df_full.y[df_full.x.==jdx.&&df_full.param.==param, :])
            lower[ip, jdx], upper[ip, jdx] = confint(
                OneSampleTTest(vec(df_full.y[df_full.x.==jdx.&&df_full.param.==param, :])),
            )
        end
        x = collect(1:N)
        params = fill(param_categories[ip], N)
        for jdx = 1:N
            push!(df_avg, (x[jdx], y[jdx], params[jdx]))
        end
    end
    return df_full, df_avg, lower, upper
end

function plotting()
    df_full, df_avg, lower, upper = get_data()
    plt =
        data(df_avg) *
        mapping(:x, :y_avg => "y", color = :param => nonnumeric) *
        visual(Lines)
    fg = draw(plt)

    # Confidence interval band
    for (ip, param) in enumerate(param_categories)
        linesfill!(
            vec(df_avg.x[df_avg.param.==param, :]),
            vec(df_avg.y_avg[df_avg.param.==param, :]);
            lower = lower[ip, :],
            upper = upper[ip, :],
        )
    end
    return fg
end

fn = "My-awesome-figure.png"
save("$fn", plotting())
cmd = `xdg-open "$fn"`
run(cmd)
greimel commented 2 years ago

Can you add some fake data to make your example runnable? Then it will be easier to help.

kir0ul commented 2 years ago

Can you add some fake data to make your example runnable? Then it will be easier to help.

Yes sorry, I updated the code above which you should now be able to run fully.

Nosferican commented 2 years ago

I am not sure of the difference between linesfill and band but would was able to get what I needed in Makie using band. Would be amazing to have that feature available in AoG. It's a killer feature in VegaLite and I use a lot so would love to have that available in Makie/AoG as well. This is an example,

using DataFrames, CairoMakie, AlgebraOfGraphics

df = DataFrame(x = 0:5, y = 0:5, lb = .√(0:5), ub = (0:5).^2)

f = Figure()
Axis(f[1, 1])

band!(
    df[!,:x],
    df[!,:lb],
    df[!,:ub],
    color = RGBAf(0.1, 0.1, 0.1, 0.05))
lines!(
    df[!,:x],
    df[!,:y],
    color = "black")
f

plot_51

greimel commented 2 years ago

@Nosferican, @kir0ul here's how you would do it with AoG:

using AlgebraOfGraphics, CairoMakie
using DataFrames: DataFrame
using Chain: @chain
using DataFrameMacros: @transform

df = @chain begin
    DataFrame(x = [0:5; 0:5], sample = [fill(1, 6); fill(2, 6)])
    @transform(:y = :x + :sample)
    @transform(:lb = √(:y), :ub = :y^2)
end

data(df) * visual(LinesFill) * mapping(
    :x, :y, 
    lower = :lb, upper = :ub,
    color = :sample => nonnumeric,
    layout = :sample => nonnumeric
) |> draw
image

(@Nosferican, thanks for providing a more minimal example.)

Nosferican commented 2 years ago

It may make sense to allow for a method w/o having to specify the y and just the lower and upper.

greimel commented 2 years ago

That would be visual(Band).

ParadaCarleton commented 2 years ago

@Nosferican, @kir0ul here's how you would do it with AoG:

Is there a tutorial on how to do this with multiple bands? Looking to get a fan chart, e.g. like this: image

greimel commented 2 years ago

I wanted to implement this at some point but I didn't finish. I've just put the code into a draft PR https://github.com/greimel/AoGExtensions.jl/pull/16 so that you have a look.

Here's an example that works already.

image
[Data] ```julia raw_df = mapreduce(vcat, 1:500) do i T = 100 drift = rand([-0.2, 0, 0.2]) x = 1:T y = [randn()] for t ∈ 1:T-1 push!(y, drift + y[end] + randn()) end DataFrame(; x, y, i, drift) end ```

What I still need to add is customization of breakpoints (these are currently hardcoded)

ParadaCarleton commented 2 years ago

This is great! I actually managed to get my own version almost working:

fan_chart = visual(LinesFill; color=:red) * (
    mapping(house_pop...; lower="5.0%", upper="95.0%") +
    mapping(house_pop...; lower="12.0%", upper="88.0%") +
    mapping(house_pop...; lower="27.0%", upper="73.0%")
) + visual(Lines; color=:black) * mapping(house_pop...)

The reason I say "Almost" is because the red lines (not the band) are visible from behind the black line. I'd also like to add a legend saying which interval is which. I can't get Band to work; it seems to use a different syntax from LinesFill.

ParadaCarleton commented 2 years ago

Ahh, I figured it out!

fan_chart = visual(Band; color=(:red, .3)) * (
    mapping(:year, "5.0%", "95.0%") +
    mapping(:year, "12.0%", "88.0%") +
    mapping(:year, "27.0%", "73.0%")
) + visual(Lines) * mapping(:year, "50.0%")
ParadaCarleton commented 2 years ago

I wanted to implement this at some point but I didn't finish. I've just put the code into a draft PR greimel/AoGExtensions.jl#16 so that you have a look.

Here's an example that works already.

[Data] What I still need to add is customization of breakpoints (these are currently hardcoded)

Actually, do you know how I'd add a legend to this that tells users what each band means? (e.g. light yellow=68%, dark yellow = 95%, very dark = 99%)

ParadaCarleton commented 2 years ago

Also, would a continuous fan chart like this one be possible? image

ParadaCarleton commented 2 years ago

Opened a new issue for fan charts here.

greimel commented 2 years ago

Actually, do you know how I'd add a legend to this that tells users what each band means? (e.g. light yellow=68%, dark yellow = 95%, very dark = 99%)

The goal is to make this syntax work (note the alpha keyword).

visual(Band) * mapping(:year, :low, :high, alpha = :quantile)

At some point I was working a PR to achieve that, but I didn't finish it. I hope to be able to get back to this at some point.