JuliaPlots / StatsPlots.jl

Statistical plotting recipes for Plots.jl

group= plots don't work with DataFrames master #33

Closed tpapp closed 7 years ago

tpapp commented 7 years ago

using Plots
using StatPlots
using DataFrames
N = 100
data = DataFrame(a = rand(1:20, N), b = rand(1:20, N), c = rand(1:3, N))
scatter(data, :a, :b, group=:c)

gives

ERROR: LoadError: TypeError: typeassert: expected AbstractArray{Bool,N}, got Array{Nullable{Bool},1}
 in copy!(::Array{Array{Int64,1},1}, ::Base.Generator{Array{Nullable{Int64},1},Plots.##129#131{Array{Nullable{Int64},1}}}) at ./abstractarray.jl:477
 in _collect(::Type{Array{Int64,1}}, ::Base.Generator{Array{Nullable{Int64},1},Plots.##129#131{Array{Nullable{Int64},1}}}, ::Base.HasShape) at ./array.jl:251
 in extractGroupArgs(::Array{Nullable{Int64},1}) at /home/tamas/.julia/v0.5/Plots/src/args.jl:765
 in extractGroupArgs(::Symbol, ::DataFrames.DataFrame, ::Symbol, ::Vararg{Symbol,N}) at /home/tamas/.julia/v0.5/StatPlots/src/dataframes.jl:21
 in _preprocess_args(::Dict{Symbol,Any}, ::Tuple{DataFrames.DataFrame,Symbol,Symbol}, ::Array{RecipesBase.RecipeData,1}) at /home/tamas/.julia/v0.5/Plots/src/pipeline.jl:29
 in _process_userrecipes(::Plots.Plot{Plots.PyPlotBackend}, ::Dict{Symbol,Any}, ::Tuple{DataFrames.DataFrame,Symbol,Symbol}) at /home/tamas/.julia/v0.5/Plots/src/pipeline.jl:60
 in _plot!(::Plots.Plot{Plots.PyPlotBackend}, ::Dict{Symbol,Any}, ::Tuple{DataFrames.DataFrame,Symbol,Symbol}) at /home/tamas/.julia/v0.5/Plots/src/plot.jl:171
 in #plot#261(::Array{Any,1}, ::Function, ::DataFrames.DataFrame, ::Vararg{Any,N}) at /home/tamas/.julia/v0.5/Plots/src/plot.jl:52
 in (::Plots.#kw##plot)(::Array{Any,1}, ::Plots.#plot, ::DataFrames.DataFrame, ::Symbol, ::Vararg{Symbol,N}) at ./<missing>:0
 in #scatter#354(::Array{Any,1}, ::Function, ::DataFrames.DataFrame, ::Vararg{Any,N}) at /home/tamas/.julia/v0.5/Plots/src/Plots.jl:139
 in (::Plots.#kw##scatter)(::Array{Any,1}, ::Plots.#scatter, ::DataFrames.DataFrame, ::Vararg{Any,N}) at ./<missing>:0
 in include_from_node1(::String) at ./loading.jl:488
while loading /tmp/foo.jl, in expression starting on line 6

My Pkg.status() is

53 required packages:
 - Atom                          0.5.8
 - BenchmarkTools                0.0.6
 - CSV                           0.1.2+             master
 - Cairo                         0.2.35
 - Colors                        0.6.9
 - DataFrames                    0.8.3+             master
 - DataFramesMeta                0.1.3
 - Distributions                 0.11.1
 - Documenter                    0.8.0+             master
 - DualNumbers                   0.2.3
 - Fontconfig                    0.1.1
 - ForwardDiff                   0.3.3+             master
 - GLM                           0.6.1
 - GR                            0.18.0
 - GZip                          0.2.20
 - Gadfly                        0.5.2
 - Gallium                       0.0.4
 - IntervalSets                  0.0.2+             master
 - JuliaParser                   0.7.4
 - Klara                         0.7.1+             master
 - Lexicon                       0.1.18
 - Libz                          0.2.2
 - MacroTools                    0.3.4
 - Mamba                         0.10.0
 - Match                         0.3.0
 - MultivariateStats             0.3.1
 - NLopt                         0.3.3
 - NamedArrays                   0.5.3
 - NamedTuples                   1.0.0
 - ObjFileBase                   0.0.4
 - OffsetArrays                  0.2.12
 - Optim                         0.7.4+             master
 - PGFPlots                      1.4.3
 - Parameters                    0.6.0
 - PkgDev                        0.1.3
 - Plotly                        0.1.1
 - Plots                         0.10.3+            master
 - Primes                        0.1.2
 - ProfileView                   0.1.5
 - ProgressMeter                 0.3.3+             progress-count
 - PyPlot                        2.2.4
 - QuantEcon                     0.8.0
 - Query                         0.3.0
 - RDatasets                     0.2.0              master
 - ReverseDiff                   0.0.2
 - Rsvg                          0.0.2
 - Showoff                       0.0.7
 - Sobol                         0.2.0
 - StatPlots                     0.2.1+             master
 - StatsBase                     0.12.0
 - StatsFuns                     0.3.1+             master
 - SuiteSparse                   0.0.1
 - UnicodePlots                  0.2.2
126 additional packages:
 - AMDB                          0.0.0-             master (unregistered, dirty)
 - ASTInterpreter                0.0.4
 - AbstractTrees                 0.0.4
 - ArgParse                      0.4.0
 - AutoAligns                    0.0.0-             master (unregistered, dirty)
 - AutoHashEquals                0.0.10
 - AxisAlgorithms                0.1.5
 - BaseTestNext                  0.2.2
 - BinDeps                       0.4.5
 - Blink                         0.5.0
 - Blosc                         0.1.7
 - BufferedStreams               0.2.3
 - COFF                          0.0.2
 - CRC                           1.2.0
 - Calculus                      0.1.15
 - CategoricalArrays             0.1.0
 - CodeTools                     0.4.3
 - Codecs                        0.2.0
 - ColorBrewer                   0.3.0
 - ColorTypes                    0.2.12
 - ColorVectorSpace              0.1.12
 - Combinatorics                 0.3.2
 - Compat                        0.12.0
 - Compose                       0.4.4
 - Conda                         0.4.0
 - ContinuousTransformations     0.0.0-             master (unregistered)
 - Contour                       0.2.0
 - DSP                           0.1.1
 - DWARF                         0.1.0
 - DataArrays                    0.3.11
 - DataStreams                   0.1.2
 - DataStructures                0.5.1
 - Dates                         0.4.4
 - DebuggingUtilities            0.0.0-             master (unregistered)
 - DiffBase                      0.0.2
 - Discretizers                  0.3.1
 - Distances                     0.3.2
 - DocStringExtensions           0.3.1
 - Docile                        0.5.23
 - ELF                           0.1.0
 - FileIO                        0.2.1
 - FixedPointNumbers             0.2.1
 - FixedSizeArrays               0.2.5
 - Formatting                    0.2.0
 - FunctionWrappers              0.0.1
 - Glob                          1.1.0
 - Graphics                      0.1.3
 - Graphs                        0.7.1
 - Gtk                           0.10.4
 - GtkUtilities                  0.1.0
 - HDF5                          0.7.2
 - Hexagons                      0.0.4
 - Hiccup                        0.1.1
 - HiddenMarkovChains            0.0.0-             master (unregistered)
 - HttpCommon                    0.2.6
 - HttpParser                    0.2.0
 - HttpServer                    0.1.7
 - ImageMagick                   0.1.8
 - ImageView                     0.2.0
 - Images                        0.5.14
 - IndirectInference             0.0.0-             master (unregistered, dirty)
 - IniFile                       0.2.5
 - Interpolations                0.3.6
 - Iterators                     0.2.0
 - JLD                           0.6.8
 - JSON                          0.8.1
 - Juno                          0.2.5
 - KernelDensity                 0.3.0
 - KeyTuples                     0.0.0-             master (unregistered, dirty)
 - LMFlows                       0.0.0-             master (unregistered, dirty)
 - LNR                           0.0.2
 - LaTeXStrings                  0.2.0
 - Lazy                          0.11.5
 - LegacyStrings                 0.2.0
 - LightGraphs                   0.7.2
 - LightXML                      0.4.0
 - LineSearches                  0.1.4
 - Loess                         0.1.0
 - MachO                         0.0.4
 - MathProgBase                  0.5.10
 - MbedTLS                       0.4.2
 - Measures                      0.0.3
 - Media                         0.2.4
 - Mustache                      0.1.3
 - Mux                           0.2.2
 - NaNMath                       0.2.2
 - NativeExpm                    0.0.0-             master (unregistered, dirty)
 - NullableArrays                0.0.10
 - PDMats                        0.5.3
 - ParserCombinator              1.7.11
 - PlotThemes                    0.1.0
 - PlotUtils                     0.3.0
 - PlotlyJS                      0.5.2
 - Polynomials                   0.1.2
 - PositiveFactorizations        0.0.3
 - PosteriorAnalysis             0.0.0-             master (unregistered)
 - PyCall                        1.8.0
 - RData                         0.0.4
 - Ratios                        0.0.4
 - Reactive                      0.3.6
 - RecipesBase                   0.1.0
 - Reel                          0.2.1
 - Reexport                      0.0.3
 - Requests                      0.3.12
 - Requires                      0.3.0
 - ReverseDiffSource             0.3.0
 - Rmath                         0.1.6
 - SHA                           0.3.0
 - SIUnits                       0.1.0
 - SortingAlgorithms             0.1.0
 - StructIO                      0.0.2
 - TerminalUI                    0.0.2
 - TexExtensions                 0.0.3
 - TextDataParsing               0.0.0-             master (unregistered)
 - TextTableRows                 0.0.0-             master (unregistered, dirty)
 - TextWrap                      0.1.6
 - TikzPictures                  0.3.5
 - Tk                            0.4.0
 - URIParser                     0.1.7
 - VT100                         0.0.2
 - VideoIO                       0.1.0
 - WeakRefStrings                0.2.0
 - WebSockets                    0.2.1
 - Winston                       0.12.1
 - WoodburyMatrices              0.2.1
 - Zlib                          0.1.12

mkborregaard commented 7 years ago

Thanks, this is a bug, probably related to the reorg of DataFrames. Will look into it.

mkborregaard commented 7 years ago

The plan is to make a clean break, so the post-"Nullable DataFrames" release will support it.

mkborregaard commented 7 years ago

FYI https://github.com/JuliaPlots/StatPlots.jl/pull/34

mkborregaard commented 7 years ago

@piever do you think the toArray function in the merged PR #34 could be used to make groupapply compatible with the new DataFrames (on the Nullable_DataFrames branch, not master)?

piever commented 7 years ago

@mkborregaard Thanks for providing the functionality. I'm checking right now, but it seems that groupapply doesn't work with DataFrames master, so the issue arises before we even try to plot the outcome. I think my solution will be to change groupapply so that, at the beginning, it either discards data that is missing relevant entries or assigns it to a separate category (as in my comment in #34), then does the statistical analysis on a DataFrame of normal arrays, and finally outputs normal arrays that can be plotted as usual. Having it actually work on NullableArrays would require the statistical ecosystem to update as well (for example, ecdf(NullableArray(rand(100))) gives a method error because ecdf is not implemented for NullableArrays), so I'd rather postpone that.

EDIT: it seems to be working (though it relies on the special constructor for DataFrames). I'll do a bit more testing and open a PR against the Nullable_DataFrames branch, I guess?
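
A minimal sketch of the "discard incomplete rows, then work on normal arrays" idea described above, written for present-day Julia, where `missing` has replaced Nullable. The helper name `complete_cases` is hypothetical and this is not the groupapply code; it only uses Base, and it assumes ordinary 1-based vectors of equal length.

```julia
# Hypothetical helper (not part of StatPlots): keep only the rows in which
# every given column has a value, and return the columns as plain vectors.
function complete_cases(cols::AbstractVector...)
    n = length(first(cols))
    all(c -> length(c) == n, cols) ||
        throw(ArgumentError("all columns must have the same length"))
    # keep[i] is true when no column is missing at row i
    keep = [all(c -> !ismissing(c[i]), cols) for i in 1:n]
    # identity.(...) narrows Vector{Union{Missing,T}} to Vector{T}
    return map(c -> identity.(c[keep]), cols)
end

a = [1, missing, 3, 4]
b = [10, 20, 30, missing]
c = [1, 2, 1, 2]            # grouping column

a2, b2, c2 = complete_cases(a, b, c)
```

The returned vectors no longer carry a `Missing` element type, so they can be handed to functions that only accept ordinary numeric arrays. Assigning incomplete rows to a separate category instead of dropping them would need a different helper.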

mkborregaard commented 7 years ago

Thanks! After looking at it, I am not sure the code requires changing? Where does it rely on the special constructor? I am sure the rest of the statistical ecosystem will follow in time :-)

mkborregaard commented 7 years ago

I am leaving this issue open for others who come looking, but it has been solved on the Nullable_DataFrames branch, which will be merged into master once the big DataFrames change arrives.

mkborregaard commented 7 years ago

An update: all Nullable functionality has been moved to DataTables. So the Nullable_DataFrames branch should be changed to work with DataTables rather than DataFrames, and then it can be merged into master.

piever commented 7 years ago

Nullables are no longer used in DataFrames, but even in the data structures where they are (i.e. IndexedTables), the @df macro (see #82) takes care of them, so I guess this can be safely closed.
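
For anyone arriving here later, a minimal sketch of the original example rewritten with the @df macro. It assumes current releases of DataFrames and StatsPlots (the package's present name; it was StatPlots when this issue was filed) are installed; the column names are the ones from the report above.

```julia
using DataFrames
using StatsPlots   # re-exports Plots and provides the @df macro

N = 100
data = DataFrame(a = rand(1:20, N), b = rand(1:20, N), c = rand(1:3, N))

# Inside @df, symbols refer to columns of `data`;
# group = :c draws one series per distinct value of column c.
plt = @df data scatter(:a, :b, group = :c)
```

With this form the columns are extracted before the plotting call, so `scatter` receives ordinary vectors and no longer has to accept the data frame itself.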