fonsp / Pluto.jl

🎈 Simple reactive notebooks for Julia
https://plutojl.org/
MIT License

DataFrames.CategoricalValue not displaying nicely #807

Open gszep opened 3 years ago

gszep commented 3 years ago

[Image: Screenshot from 2020-12-25 11-19-12]

pankgeorg commented 3 years ago

Hey! Happy holidays! How would you like that to look? Any suggestions?

PaulToronto commented 3 years ago

I don't like that display either and googling for a solution brought me here. The image below shows what I did for a solution, but I am wondering if there is a better way. I'd also like to get rid of the double quotes.

[Image: df]

gszep commented 3 years ago

Happy holidays to you too! :smile: It would be nice to show only the string value, without the type information and without quotation marks.

rleyvasal commented 3 years ago

I agree the quotes should not be displayed on CategoricalValue, but the type should still be included in the output. The variable type is very useful, especially when merging datasets with different data types.

Julia displays the type under the variable name when listing a table in Jupyter notebooks (this would be the best solution).

[Image: JupyterCategoricaltype]

By comparison, RStudio also displays the type under the variable name ( for factor and for integer), without quotes.

[Image: DisplayCategoricalVariables]

greimel commented 3 years ago

Here is a fix:

Base.show(io::IO, ::MIME"text/html", x::CategoricalArrays.CategoricalValue) = print(io, get(x))

before:

[Image]

after:

[Image]

Note that this matches how CategoricalArrays are shown in the REPL.

[Image]

I wonder if Pluto somehow interferes with the show method defined in CategoricalArrays?
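
For readers unfamiliar with MIME-specific `show` methods, here is a minimal sketch of the mechanism the one-line fix above relies on. It uses only Base and a hypothetical `Label` type as a stand-in for `CategoricalValue`; how Pluto itself picks a MIME type is not covered here, and the bare value is printed without HTML escaping.

```julia
# Hypothetical stand-in for CategoricalValue, so the sketch needs no packages.
struct Label
    value::String
end

# Verbose default display that includes the type name.
Base.show(io::IO, x::Label) = print(io, "Label(", repr(x.value), ")")

# Before a text/html method is defined, the type is not HTML-showable.
html_before = showable(MIME("text/html"), Label("foo"))

# Analogue of the fix above: an HTML method that prints only the bare value.
# Assumption: no HTML escaping is done, which is only safe for trusted values.
Base.show(io::IO, ::MIME"text/html", x::Label) = print(io, x.value)

html_after = showable(MIME("text/html"), Label("foo"))
html = repr(MIME("text/html"), Label("foo"))

println((html_before, html_after, html))
```

Any frontend that asks for `text/html` output would pick up the new method once it is defined, which is why a single definition changes how every value of the type is rendered.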

cc @nalimilan

nalimilan commented 3 years ago

show(::CategoricalValue) gives exactly what is displayed above, i.e. CategoricalValue{Int64, UInt32} .... But this printing isn't supposed to be used when printing arrays or tables. DataFrames uses print to render the contents of its columns, and arrays use show(IOContext(io, :typeinfo=>eltype(column)), column[i]), which avoids repeating the type information. It would make sense for Pluto to print the eltype of the column under its name, and then pass typeinfo when printing like arrays.
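
The proposal above can be sketched with Base alone. The `Label` type, `render_cell`, and `render_header` below are hypothetical names used for illustration, not part of Pluto, DataFrames, or CategoricalArrays; the sketch only shows how `print`, `show`, and the `:typeinfo` context key interact.

```julia
# Hypothetical stand-in for CategoricalValue, so the sketch needs no packages.
struct Label
    value::String
end

# `print` emits the bare value, as DataFrames does for its cells.
Base.print(io::IO, x::Label) = print(io, x.value)

# `show` adds the type name unless the caller has already announced the
# element type through the `:typeinfo` IOContext key, as arrays do.
function Base.show(io::IO, x::Label)
    if get(io, :typeinfo, Any) !== Label
        print(io, "Label ")
    end
    show(io, x.value)
end

# Hypothetical cell renderer: pass the column eltype as `:typeinfo`.
render_cell(x, T) = sprint(show, x; context = :typeinfo => T)

# Hypothetical header renderer: column name, then the eltype below it.
render_header(name, column) = string(name, '\n', eltype(column))

col = [Label("foo"), Label("bar")]
println(render_header("a", col))
foreach(x -> println(render_cell(x, eltype(col))), col)
```

With this arrangement the type appears once, under the column name, and each cell omits it. Whether the remaining quotes should also be dropped is a separate choice: using `print` for cells instead of `show` would remove them.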

fonsp commented 3 years ago

~I proposed a fix: https://github.com/JuliaData/CategoricalArrays.jl/pull/318 .~

fonsp commented 3 years ago

@nalimilan Any suggestions?

nalimilan commented 3 years ago

I made a detailed proposal above.

jerlich commented 3 years ago

Btw, I accidentally found a workaround: if you interpolate a data frame into Markdown, it is displayed as in the REPL.

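begin
    using DataFrames, CategoricalArrays

    df = DataFrame(a = ["foo", "bar"], b = [1, 2])
    # Replace the column with a categorical copy
    df.a = categorical(df.a)
    # Interpolating the data frame into Markdown shows it as in the REPL
    md"""
    $(df)
    """
end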