JuliaPlots / Plots.jl

Powerful convenience for Julia visualizations and data analysis
https://docs.juliaplots.org

Show every value along axis #1018

holocronweaver closed this issue 5 years ago

holocronweaver commented 7 years ago

I am attempting to create a bar graph similar to the first plot shown here using Gadfly.

The problem is Plots does not show most of the horizontal axis data labels:

bar(counts[:Class], counts[:x1], legend=:none, size=(1000, 500), xlabel="Characters", ylabel="# of samples", title="Distribution of training data")

image

How can I show all the labels along the horizontal axis, rather than only every fifth one? I am guessing there is a way to specify the distance between ticks?

mkborregaard commented 7 years ago

Specify xticks with a tuple of (positions, labels). If you have trouble, post a link to the DataFrame.

holocronweaver commented 7 years ago

The DataFrame is from trainLabels.csv in this Kaggle project. Basically, the :Class column is single characters (some of which are shown in the above graph). I am passing that column as x in bar.

Thus the x data is characters. I want to show all of the data on the x-axis, not every five elements.

Is there a way to tell Plots I want to display every x-value, not just a subset, without passing a redundant tuple? Maybe something like xticks_spacing = 1 instead of the apparent xticks_spacing = 5 that Plots is defaulting to in the above plot.

mkborregaard commented 7 years ago

If you have a good idea for improving the interface you're very welcome to open a PR. What is x1?

holocronweaver commented 7 years ago

x1 is a frequency count for each character in Class. So x1 is a column of plain integers. The bar graph itself is meant to show the relative frequency of characters.

I will take a look at Plots source and see how easy it is to add.

mkborregaard commented 7 years ago

In this case the call is just bar(counts[:x1], xticks = (1:nrow(counts), counts[:Class])). (There is a bug that currently prevents it from being the more readable bar(counts, :x1, xticks = (1:nrow(counts), :Class)).)

The functionality was added here: https://github.com/JuliaPlots/Plots.jl/pull/793 as a response to https://github.com/JuliaPlots/Plots.jl/issues/792
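A minimal sketch of the (positions, labels) tuple described above, using plain vectors instead of the Kaggle DataFrame. The names labels, freqs and ticks are stand-ins invented for illustration; labels plays the role of counts[:Class] and freqs the role of counts[:x1]. As far as I know, recent DataFrames.jl versions no longer accept the counts[:Class] indexing used in this thread, and the columns would be written counts.Class and counts.x1 instead.

```julia
using Plots

# Stand-in data (assumption): one single-character label per bar,
# with a made-up frequency for each label.
labels = string.('a':'z')
freqs  = collect(1:26)

# One tick per bar: positions 1..n paired with the label shown at each position.
ticks = (1:length(labels), labels)

# Passing only the heights places the bars at x = 1, 2, ..., n,
# so the tick positions above line up with the bars.
p = bar(freqs, xticks = ticks, legend = :none,
        xlabel = "Characters", ylabel = "# of samples")
```

The tuple has to contain exactly one label per position, which is why it is built from the length of the label vector rather than a hard-coded range.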

holocronweaver commented 7 years ago

Thanks! Works as intended.

That API is not very intuitive. Would it be possible to mimic Gadfly, which simply 'does the right thing' when given a horizontal axis of strings? Perhaps this could be done by checking for a string type in x and, if found, implicitly converting it to xticks = (1:nrow(counts), counts[:Class]) or an equivalent.

Considering how common frequency plots are, this seems a case worth making as effortless as possible.

Edit: To be clear, I mean doing something like bar(counts[:Class], counts[:x1]) and getting the same result as what you posted above.

mkborregaard commented 7 years ago

The current behaviour is intentional - have a look at the issue and PR I posted.

holocronweaver commented 7 years ago

Ok, read the whole thread and PR.

I generally agree with your rationale, but I think things can be even better.

Most bar graphs I encounter require showing all string labels. They often aren't useful otherwise, so that is the default behavior I expect. This differs from how I think about numeric axes, where hiding by default usually makes sense.

However, I agree that if the graph will be completely illegible because the ratio of data length to graph length is large, data hiding should occur unless the user specifies otherwise (perhaps via xticks=:all or similar). So I guess the solution is to make a smarter check for illegibility so that hiding is only done when it's really needed (in my case, it was not).

Either that or make the default behavior for string data be to show all values and require the user to manually specify data hiding (maybe via xticks=:sparse).

Either way, I don't think manually specifying ticks is an intuitive or convenient API, and frequency plots are common enough that it is worth coming up with a better solution to push towards effortlessness.

I'll get off my soapbox now. X-D

mkborregaard commented 7 years ago

Maybe allowing xticks = :all for cases like this where the x labels are given by the x axis anyway?

holocronweaver commented 7 years ago

Yep. That works best if string data labels are shown in their entirety by default (i.e., xticks = :all by default for x = strings), and hiding only occurs when a check verifies that tick label overcrowding will likely occur.

Edit: I am willing to implement this. For deciding whether to hide labels I'd probably do a small random sample of strings to get a max length estimate, then use that to estimate the minimum space between labels based on font size and plot dimensions. If the space is narrower than some minimum margin, label hiding occurs. Sound reasonable?
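A rough sketch of the overcrowding check proposed above. Nothing here is Plots.jl code: labels_fit is a hypothetical helper, and the axis width, per-character width, minimum gap and sample size are illustrative assumptions rather than values taken from the library. A real implementation would have to derive the character width from the tick font and account for label orientation.

```julia
# Hypothetical helper, not part of Plots.jl.
# Returns true when every label is estimated to fit along the axis.
function labels_fit(labels; axis_width_px = 1000, char_width_px = 7,
                    min_gap_px = 4, sample_size = 20)
    isempty(labels) && return true

    # Random sample (with replacement) to estimate the longest label
    # without scanning the whole vector.
    sampled = rand(labels, min(sample_size, length(labels)))
    max_chars = maximum(length, sampled)

    # Horizontal space available per label, and what is left over
    # once the widest sampled label has been drawn in it.
    slot_px = axis_width_px / length(labels)
    gap_px = slot_px - max_chars * char_width_px

    return gap_px >= min_gap_px
end

labels_fit(string.('a':'z'))   # 26 one-character labels: true
labels_fit(string.(1:500))     # 500 labels in 1000 px: false
```

Because the estimate comes from a sample, a single unusually long label can be missed. That is the trade-off the proposal accepts in exchange for not measuring every string.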

mkborregaard commented 7 years ago

Allowing xticks to be overridden with something simpler than a tuple should be easy; it should be a small fix in get_ticks() in src/axes.jl. One option is to just pass the Array{String} to xticks and have that work. (I questioned whether you could do that here https://github.com/JuliaPlots/Plots.jl/pull/853 but I guess it should be possible to just use the x values as positions if they are of the same length.) If instead you'd want something like passing :all, it would require some updates to args.jl as well.

Your other idea of estimating the actual amount of space necessary would certainly be nicer than just checking for the number 30, as we do now here (https://github.com/JuliaPlots/Plots.jl/blob/master/src/axes.jl#L240-L247). As you can see, it's a longstanding TODO. It would require the tickfont, the length of the strings, the orientation, and some clever coding, but it would definitely be nice!

daschw commented 5 years ago

This is already possible with xticks = :all:

using Plots

x = string.('a':'z')
y = Int.('a':'z')

plot(x, y, xticks = :all)

image: allticks