JuliaData / DataFrames.jl

In-memory tabular data in Julia
https://dataframes.juliadata.org/stable/
Other
1.74k stars 367 forks source link

Provide entry point for color customization #2831

Open Arkoniak opened 3 years ago

Arkoniak commented 3 years ago

DataFrames.jl uses hardcoded colors for generating output and unfortunately color choice is not always suitable for all users for various reasons.

I propose to use the following approach instead of colors hardcoding:

# define somewhere at the beginning

const COLOR = Ref((; foo = :red, bar = :green))

# and later in the code, when color is needed
printstyled("what you need to print", COLOR[].foo)

This simple change does not provide full-fledged color theme switching by itself (this probably should be implemented JuliaLang/julia#41435), but it provides an entry point for all future color theme related stuff. I.e. all user level convenience functions would just modify this variable one way or another. And anyway this proposal allows color customization right here and now (maybe slightly cumbersome) unlike the current situation.

Implementing this feature looks like a good beginner PR, doesn't require large modifications of the code base and it solves color issues.

bkamins commented 3 years ago

@ronisbr should confirm for 100%, but what you ask for is already possible. Just pass appropriate arguments to

show(df, your_printing_options_for_colors)

what is hardcoded is the behavior of:

show(df)

and the related const https://github.com/JuliaData/DataFrames.jl/blob/2b9f6673547259bab9fb3bf3b5224eebc7b11ecd/src/abstractdataframe/prettytables.jl#L28 and recently we have been discussing if we can allow this default configuration to be changed.

But let us wait for @ronisbr to comment as he knows best this part of source code.

ronisbr commented 3 years ago

Hi @Arkoniak !

@bkamins is 100% correct. You can pretty much change any color by passing options to show. You can even change that "hard-coded" highlight inside DataFrames.jl but it is a little bit more complicated:

Captura de Tela 2021-08-05 às 09 39 12

For more information, see the docs of the function pretty_table. All those keyword can be passed to show(df, ...).

Arkoniak commented 3 years ago

But show is a function, which is being called internally, when DataFrame object is displayed, right?

So, currently the only way to make changes permanent is to pirate display function like

Base.display(df::AbstractDataFrame) = show(df, border_crayon = crayon"yellow bold", header_crayon = crayon"green bold", text_crayon = crayon"red")

Piracy is not very welcome around here, as far as I know...

But if it is official way, then it is fine by me :-)

bkamins commented 3 years ago

So, currently the only way to make changes permanent

So you want to change the default settings? I am asking, because in your original question you passed arguments with color to a function, so I thought you wanted a one-time change.

Piracy is not very welcome around here, as far as I know...

Indeed type piracy is not something that is recommended. However, for display you most likely can safely assume that there is no big risk of such change that you ask for (of course it needs to be done correctly). I assume you do not have any code that relies on the actual colors displayed.

Arkoniak commented 3 years ago

Yes, I meant default settings. Function in the snippet is an example of the internal library function which is doing actual printing at some time. It can be printstyled or any custom function that you use to print colored output. What I was trying to say is that people usually do printstyled("something to print", color = :red) and after that it is impossible to use any other color for that element, since it is hardcoded.

In case of DataFrames, you have

const _PRETTY_TABLES_HIGHLIGHTER = Highlighter(_pretty_tables_highlighter_func,
                                               Crayon(foreground = :dark_gray))

which can be changed to

const COLOR = Ref((; foreground = :dark_gray))

and in https://github.com/JuliaData/DataFrames.jl/blob/2b9f6673547259bab9fb3bf3b5224eebc7b11ecd/src/abstractdataframe/show.jl#L266

                 highlighters                = (Highlighter(_pretty_tables_highlighter_func,
                                               Crayon(foreground = Color[].foreground)),),

with something along these lines, you can provide customizable colors without the need for pirating display function.

Or maybe something similar should be done deeper in PrettyTables.jl, I do not know.

bkamins commented 3 years ago

with something along these lines, you can provide customizable colors without the need for pirating display function.

Yes - we are considering this, but what @ronisbr reports is that making this flexible hugely impacts time-to-first-plot.

ronisbr commented 3 years ago

@bkamins is right. In the very first attempt to integrate PrettyTables.jl with DataFrames.jl, I used a global state variable to handle those options. PrettyTables.jl has a system like this, but the time to print the first table was much higher. I heard that this was greatly enhanced in Julia 1.6. Maybe I need to revisit this.

@bkamins are you ok if I managed to create a global printing state inside of DataFrames.jl that can be changed by the user?

ronisbr commented 3 years ago

Using Julia 1.7, there is also a "huge" slowdown:

julia> using PrettyTables

julia> @time pretty_table(rand(2, 2))
┌──────────┬──────────┐
│   Col. 1 │   Col. 2 │
├──────────┼──────────┤
│ 0.956928 │ 0.269498 │
│ 0.585206 │ 0.524423 │
└──────────┴──────────┘
  0.775265 seconds (1.04 M allocations: 55.040 MiB, 1.08% gc time, 99.87% compilation time)
julia> using PrettyTables

julia> conf = set_pt_conf(border_crayon = crayon"yellow bold");

julia> @time pretty_table_with_conf(conf, rand(2, 2))
┌──────────┬──────────┐
│   Col. 1 │   Col. 2 │
├──────────┼──────────┤
│ 0.694764 │ 0.215185 │
│ 0.229285 │ 0.626746 │
└──────────┴──────────┘
  0.908953 seconds (1.17 M allocations: 63.057 MiB, 8.43% gc time, 99.89% compilation time)
bkamins commented 3 years ago

are you ok if I managed to create a global printing state inside of DataFrames.jl that can be changed by the user?

Let us wait a bit for @nalimilan. In general we avoid as much as possible mutable global state of DataFrames.jl, but maybe in this case we could go for it.

nalimilan commented 3 years ago

I'd say it would be OK to allow changing the default colors, that does seem a case where a mutable global state cannot cause severe issues. But would that increase the time needed to print the first table for all users, or only those who change the default colors?