JuliaPlots / StatsPlots.jl

Statistical plotting recipes for Plots.jl

Support missing variables when plotting boxplot #471

Open JinraeKim opened 2 years ago

JinraeKim commented 2 years ago

This issue originates from a post on the Julia Discourse.

I found that boxplot does not support missing values.

For example,

using StatsPlots

labels = ["hi", "hello"]
data = [1.0 missing 3.0; 4.0 5.0 6.0]
boxplot(labels, data)

results in the following error:

julia> boxplot(labels, data)
ERROR: ArgumentError: quantiles are undefined in presence of NaNs or missing values
Stacktrace:
  [1] _quantilesort!(v::Vector{Union{Missing, Float64}}, sorted::Bool, minp::Float64, maxp::Float64)
    @ Statistics /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/Statistics/src/Statistics.jl:980
  [2] #quantile!#49
    @ /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/Statistics/src/Statistics.jl:957 [inlined]
  [3] quantile(v::Vector{Union{Missing, Float64}}, p::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}; sorted::Bool, alpha::Float64, beta::Float64)
    @ Statistics /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/Statistics/src/Statistics.jl:1073
  [4] quantile(v::Vector{Union{Missing, Float64}}, p::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64})
    @ Statistics /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/Statistics/src/Statistics.jl:1073
  [5] macro expansion
    @ ~/.julia/packages/StatsPlots/dFNaF/src/boxplot.jl:41 [inlined]
  [6] apply_recipe(plotattributes::AbstractDict{Symbol, Any}, #unused#::Type{Val{:boxplot}}, x::Any, y::Any, z::Any)
    @ StatsPlots ~/.julia/packages/RecipesBase/qpxEX/src/RecipesBase.jl:289
  [7] _process_seriesrecipe(plt::Any, plotattributes::Any)
    @ RecipesPipeline ~/.julia/packages/RecipesPipeline/Bxu2O/src/series_recipe.jl:50
  [8] _process_seriesrecipes!(plt::Any, kw_list::Any)
    @ RecipesPipeline ~/.julia/packages/RecipesPipeline/Bxu2O/src/series_recipe.jl:27
  [9] recipe_pipeline!(plt::Any, plotattributes::Any, args::Any)
    @ RecipesPipeline ~/.julia/packages/RecipesPipeline/Bxu2O/src/RecipesPipeline.jl:97
 [10] _plot!(plt::Plots.Plot, plotattributes::Any, args::Any)
    @ Plots ~/.julia/packages/Plots/qbc7U/src/plot.jl:208
 [11] plot(::Any, ::Vararg{Any}; kw::Base.Pairs{Symbol, V, Tuple{Vararg{Symbol, N}}, NamedTuple{names, T}} where {V, N, names, T<:Tuple{Vararg{Any, N}}})
    @ Plots ~/.julia/packages/Plots/qbc7U/src/plot.jl:91
 [12] boxplot(::Any, ::Vararg{Any}; kw::Base.Pairs{Symbol, V, Tuple{Vararg{Symbol, N}}, NamedTuple{names, T}} where {V, N, names, T<:Tuple{Vararg{Any, N}}})
    @ Plots ~/.julia/packages/RecipesBase/qpxEX/src/RecipesBase.jl:410
 [13] boxplot(::Any, ::Vararg{Any})
    @ Plots ~/.julia/packages/RecipesBase/qpxEX/src/RecipesBase.jl:410
 [14] top-level scope
    @ REPL[18]:1

As a workaround, one can call boxplot! once per label, dropping the missing values each time:

julia> fig = plot()

julia> for (i, label) in enumerate(labels)
           boxplot!(fig, [label], hcat((skipmissing(data[i, :]) |> collect)...))
       end

julia> display(fig)

Why don't you provide this functionality for boxplot?

sethaxen commented 2 years ago

I don't know if any plotting functions in Plots or StatsPlots support missing. Ideally we would, but it takes a lot more special-casing to do it correctly. On the other hand, it's pretty easy for users to drop missing values if they know they have them.

I wonder though if this is something we want to consider doing.

mkborregaard commented 2 years ago

Yes, Plots supports missing values out-of-the-box, but by converting them to NaNs. Given that boxplots don't support NaNs, we get this issue. It would make sense to filter(isfinite, data) here after the grouping in boxplot (and violin) IMHO.
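
To illustrate the per-group filtering suggested above, here is a minimal sketch that uses only the Statistics standard library. The helper name finite_quartiles is hypothetical and is not part of StatsPlots; it only shows that once missing and non-finite values are dropped from a group, quantile no longer throws. How the recipe itself would apply this after grouping is an assumption, not the actual implementation.

```julia
using Statistics

# Hypothetical helper (not a StatsPlots function): drop missing and
# non-finite values from one group, then compute the five-number summary.
function finite_quartiles(values)
    # `!ismissing(v) &&` short-circuits, so isfinite never sees a missing.
    clean = Float64[v for v in values if !ismissing(v) && isfinite(v)]
    isempty(clean) && return nothing   # nothing left to summarize
    return quantile(clean, 0:0.25:1)
end

# Same data as in the report: one row per label.
labels = ["hi", "hello"]
data = [1.0 missing 3.0; 4.0 5.0 6.0]

for (i, label) in enumerate(labels)
    println(label, " => ", finite_quartiles(data[i, :]))
end
```

Inside the recipe, plain filter(isfinite, y) would be enough if, as described above, missing values have already been converted to NaN by the time the recipe runs; the explicit ismissing check here just makes the sketch work on the raw data as well.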