Open davidanthoff opened 10 years ago
You can get the effect you want by using 1900 and 2400 in x_continuous, like:
plot(x=collect(1899:2401), y=collect(1899:2401), Geom.line,
     Scale.x_continuous(minvalue=1900, maxvalue=2400),
     Scale.y_continuous(minvalue=1900, maxvalue=2400))
I agree that the automatic range and tick marks aren't always great. Here it doesn't work well because it has to choose values that span the data, and (1900, 2400) doesn't span (1899, 2401). So slightly relaxing that constraint when choosing the range might be a good idea. I'll experiment with this.
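To make the span constraint concrete, here is a minimal sketch in plain Julia. It is not Gadfly's optimize_ticks; the helper names covering_ticks and relaxed_ticks and the slack parameter are hypothetical and exist only for illustration.

```julia
# Ticks at multiples of `step` that fully cover [lo, hi] (the strict span constraint).
covering_ticks(lo, hi, step) = (floor(lo / step) * step):step:(ceil(hi / step) * step)

# Relaxed variant: the data may stick out past the end ticks by up to `slack * step`.
function relaxed_ticks(lo, hi, step; slack=0.05)
    first_tick = ceil((lo - slack * step) / step) * step
    last_tick = floor((hi + slack * step) / step) * step
    return first_tick:step:last_tick
end

strict = collect(covering_ticks(1899, 2401, 200))   # forced out to 1800 and 2600
relaxed = collect(relaxed_ticks(1899, 2401, 100))   # may start at 1900 and end at 2400
```

With the strict rule, one unit of data on either side of 1900 and 2400 is enough to push the axis out by a whole step; a small tolerance avoids that.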
I had a look at optimize_ticks. That does not look like the original Wilkinson scoring, right? I now replaced that with a translation of an R package that implements the original Wilkinson scoring method and at least this example graph looks better. I'll clean up a bit and then you might have a look.
It's Wilkinson's method as described in his book, with a couple tweaks. The R package may have a better implementation. I'd be happy to use that if so.
I've spent some more time and I think to get this right requires more changes. Here are my current thoughts:
1) Right now the coord adds padding around the range of the scale. That makes a lot of graphs look weird and makes it impossible, for example, to have the first tick align with the crossing axis. There is a comment in the code that says this is a kludge but is needed to fit bar graphs on a discrete scale. I looked at the book again, and it seems to me the right approach is to solve the discrete case in the discrete scale construction (this is described on p. 94 of the book) and then introduce no padding in the coord.
2) I think for a continuous scale, if one passes in a min and max, that should be a hard constraint, i.e. the algorithm shouldn't try to find ticks outside that range and then have the scale cover the ticks that bracket the min/max combo.
My sense is that with those two changes things would already look a bit better and if one wanted one could control things better. I'm happy to have a go at this, but right now there is one thing I don't understand: why is the optimize_ticks function called in the statistics? I guess I just don't understand the structure of the code well enough at this point, but I somehow had assumed that all of that stuff should be in the scale code.
Thanks for taking the time to investigate this.
> right now the coord adds padding around the range of the scale that makes a lot of graphs look weird
There are two separate kludges here. For discrete scales, padding is added to prevent bar plots and boxplots from drawing outside the plot canvas. That should be done in a better way, like you say.
For continuous scales, padding is added so that the labels for the first tick on the x- and y-axis aren't crowded together (that's the 0.03 * (xmax - xmin) term). I don't think that padding is especially weird, but it could be handled better.
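As a rough illustration of what that term does to the visible range, here is a sketch in plain Julia. The function name padded_range is hypothetical, and applying the padding symmetrically at both ends is an assumption made for the example, not something confirmed above.

```julia
# Widen [xmin, xmax] by a fraction of its width on each side (symmetry is assumed).
function padded_range(xmin, xmax; frac=0.03)
    pad = frac * (xmax - xmin)
    return (xmin - pad, xmax + pad)
end

lo, hi = padded_range(1900, 2400)   # pad is 0.03 * 500, i.e. about 15 units per side
```

Because the visible range now starts below the first tick, that tick can no longer sit exactly on the crossing axis, which is the effect described in point 1.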
> I think for a continuous scale if one passes in a min and max that should be a hard constraint
I agree. That would be easier to interpret than how it works now.
> why is the optimize_ticks function called in the statistics?
Wilkinson includes choosing ticks as part of the scale. I felt like that muddied the concept a little, so I tried to structure it differently. Tick generation is a statistic, since like other statistics it's a function that computes aesthetics from some other aesthetics. In this case it computes xtick, ytick, etc. from x, y, etc. That's not definitely the right thing to do, but it has some advantages.
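The idea of a tick statistic as "a function from aesthetics to aesthetics" can be sketched as follows. This is not Gadfly's internal representation: the NamedTuple of aesthetics, the name tick_statistic, and the fixed step are all hypothetical simplifications.

```julia
# A statistic reads some aesthetics and returns the aesthetics with new fields added.
function tick_statistic(aes::NamedTuple; step=100)
    lo, hi = extrema(aes.x)
    # Multiples of `step` that cover the data; real tick selection is more involved.
    xtick = collect((floor(lo / step) * step):step:(ceil(hi / step) * step))
    return merge(aes, (xtick = xtick,))
end

aes = (x = 1899:2401, y = 1899:2401)
out = tick_statistic(aes)   # same x and y, plus a computed xtick field
```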
Any idea why this could be happening with my y-axis tick labels?
The plot is constructed using this code
pl = plot(plot_data,
    layer(x = "s", y = "int", Geom.point, Theme(default_color = color_dot)),
    layer(x = "s", y = "int_curve", Geom.smooth(), Theme(default_color = color_line)),
    Scale.y_continuous(minvalue = y_min, maxvalue = y_max),
    Scale.x_continuous(minvalue = 0, maxvalue = 36),
    Guide.xticks(ticks = collect(0:6:36)),
    plot_theme,
    Guide.title("Firm $firm_id; $date_string"),
    Guide.ylabel(y_lab, orientation = :vertical),
    Guide.xlabel("Prediction horizon (months)")
);
Thanks,
Felix
I'm not sure. I've not managed to reproduce this yet. What are y_min and y_max set to here?
I create a series of plots which are then displayed in an animation using the animate package in LaTeX. To force all plots to have the same y-axis, I determine the minimum and maximum y-values across all plots and set y_min and y_max accordingly for each plot (the plots are created in a loop).
UPDATE: I've checked the values for another example. The actual values are y_min = 1.539317798151983e-6 and y_max = 0.03169651300783998. Maybe I should round these to the second decimal? It looks to me like your automatic tick computation might be in trouble with these values?
UPDATE: Flooring/ceiling to the second decimal fixed the issue in the above example. Rounding to the fourth decimal did not and resulted in the same problem with tick labels.
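For reference, the flooring/ceiling workaround can be written like this in plain Julia. The per-plot ranges are hypothetical except for the pair quoted above; only Base functions are used.

```julia
# Per-plot (min, max) y-values; the second pair is the one quoted above,
# the first is made up for illustration.
ranges = [(0.002, 0.021), (1.539317798151983e-6, 0.03169651300783998)]

# Shared bounds across all plots, widened outward to two decimals.
y_min = floor(minimum(first.(ranges)); digits=2)
y_max = ceil(maximum(last.(ranges)); digits=2)
```

Rounding outward (floor for the minimum, ceil for the maximum) keeps every data point inside the shared range, which plain rounding to the nearest value would not guarantee.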
Unfortunately, my attempts to fix this for all my plots have failed so far. Not having looked at the Gadfly code yet, my understanding of the problem is the following:
1) I pass minvalue and maxvalue parameters to Scale.y_continuous().
2) For, say, minvalue = 0.01 and maxvalue = 0.41, your algorithm will determine the appropriate ticks as [0:0.1:0.5]. The highest tick value, however, is larger than my maxvalue. Gadfly will ignore this and simply print the labels for all tick values. It should, however, remove any tick values larger than my maxvalue parameter.
I'll have a look at the code now.
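The behaviour being asked for can be sketched in a few lines of plain Julia. The function name visible_ticks is hypothetical; this only illustrates the filtering step and is not how Gadfly decides tick visibility.

```julia
# Keep only the ticks that fall inside the requested [minvalue, maxvalue] window.
visible_ticks(ticks, minvalue, maxvalue) = filter(t -> minvalue <= t <= maxvalue, ticks)

ticks = collect(0:0.1:0.5)
shown = visible_ticks(ticks, 0.01, 0.41)   # drops the ticks at 0.0 and 0.5
```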
Man I wish I didn't find it so hard to understand the Gadfly codebase. It's so opaque to me :( I'm never quite sure which code is responsible for what and there are so many calls to functions that provide other functions in some anonymous way :(
Here are some questions:
1) Does the following code
    if scale.minvalue != nothing
        if scale.vars === x_vars
            aes.xviewmin = scale.trans.f(scale.minvalue)
        elseif scale.vars === y_vars
            aes.yviewmin = scale.trans.f(scale.minvalue)
        end
    end
    if scale.maxvalue != nothing
        if scale.vars === x_vars
            aes.xviewmax = scale.trans.f(scale.maxvalue)
        elseif scale.vars === y_vars
            aes.yviewmax = scale.trans.f(scale.maxvalue)
        end
    end
in scale.jl assign the minimum and maximum values used when determining how much of the plot is visible (i.e. the panel)? However, somehow some margins still need to be applied to these values.
2) Are the aes.yviewmin and aes.yviewmax properties used to determine the aes.ytickvisible property in guide.jl? Where does the "aesyticksvisible" get assigned?
3) Is the aes.ytickvisible property (containing all visible ticks?) responsible for which tick labels are drawn when using any of the static backends? I realise that you actually compute more ticks than those that are visible in the static backends. This allows you to zoom in/out in the JS backend.
Maybe you can point me in the right direction. I wish I had a better understanding of the code base. Sorry I can't be of more help.
Thanks for trying to debug this. Obviously I haven't documented the codebase at a high level, so it can be pretty intimidating to wade into.
I still haven't figured this out, and can't provoke it into reproducing this. (Is there any way you could post an example with data that causes this?) I can try to point you in the right direction:
1) [...] xviewmin/max aesthetics, which are in turn set from the minvalue and maxvalue arguments passed to the scale.
2) [...] xviewmin, xviewmax and finds a set of ticks that encompass that range with nice numbers.
While trying to explain the code to you, I potentially found the problem. Could you check out master and see if there's any difference?
Sorry, I was on vacation and am now busy with everything but research. I'll take a look soon. Thanks a lot!
Hi, unfortunately this did not fix the problem. I'll try to give you a minimal example soon. Thanks for the effort!
I've just found this issue, not sure why I didn't see it before posting to stackoverflow. I've written some code to generate what seem to be better axis ticks:
http://stackoverflow.com/questions/28943866/r-style-axis-ticks-with-gadfly-jl
Not sure if this is helpful, but providing some way of changing the tick method may be nice.
If I plot the following
I get a graph that has a range from 1800 to 2600 on the x axis and that essentially looks pretty ugly because that range is too large. A much prettier plot would have a range that only spans the actual data (so 1899:2401) and then it might have ticks at say 1900 to 2400 with step size 100 or so.
So, one suggestion and question: 1) it would be great if this could look better automatically and 2) is there a way to force the x min and max values on the plot? I tried Scale.x_continuous with minvalue and maxvalue, but if I set those to the actual range of the data it again displays this on an axis going from 1800 to 2600.