GiovineItalia / Gadfly.jl

Crafty statistical graphics for Julia.
http://gadflyjl.org/stable/

How to plot two lines with different colors? #526

Open ViralBShah opened 9 years ago

ViralBShah commented 9 years ago

On the Gadfly website, there are several examples with dataframes, where different data series are colored differently.

I was writing a simple tutorial, where I wanted to give multiple x and y vectors, and have each of them plotted with a different color. I just couldn't find a simple way to do that. In Matlab, this is easily accomplished by giving multiple inputs such as plot(x, y1, x, y2).

If I understand correctly, it should be possible to do this with layers, but is it possible to do it as easily as in MATLAB?

Cc: @shashi

aviks commented 9 years ago

Gadfly is actually significantly easier to use with dataframes. In particular, it is far easier to bind color to a column in the frame. Anything else is a bit awkward.

However, if you did want to plot only with arrays, one way to do what you want is as follows:

plot(layer(x=1:10, y=rand(10), Geom.point, Geom.line, Theme(default_color="orange")),
     layer(x=1:10, y=rand(10), Geom.point, Geom.line, Theme(default_color="purple")))

This produces something like this:

[Image: screen shot 2015-01-05 at 23 59 52]

ViralBShah commented 9 years ago

I wish there were a way to make that syntax a lot more compact, of course without special casing anything.

dcjones commented 9 years ago

It's true that things can get ugly if the data isn't a data frame, or at least tabular.

I was thinking about a syntax to make this thing easier a while ago: https://github.com/dcjones/Gadfly.jl/issues/89#issuecomment-29692630

I'm simultaneously impressed that I remember a comment I made a year ago and depressed that I never did anything about it. I hate to add special cases or alternative syntax (e.g. I think qplot in ggplot2 is a mistake), and generally prefer consistency to compactness, but this comes up pretty frequently, and my usual advice ("put your vectors in a data frame, then use melt to reshape it into the form Gadfly expects") isn't very satisfying.
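The "data frame, then reshape" advice can be sketched roughly as follows. This is only an illustration under some assumptions: it uses a recent DataFrames.jl, where (to my knowledge) `stack` fills the role that `melt` had at the time of this comment, and the column names `x`, `y1` and `y2` are made up for the example.

```julia
using DataFrames, Gadfly

# Hypothetical wide table: one x column and several y columns.
x = 1:10
wide = DataFrame(x = x, y1 = rand(10), y2 = rand(10))

# Reshape to long form: every column except :x is stacked into a
# :variable column (the original column name) and a :value column.
long = stack(wide, Not(:x))

# One line per original column, colored by the column name.
p = plot(long, x = :x, y = :value, color = :variable, Geom.line)
```

Because `Not(:x)` selects every remaining column, the same call should also cover a table with many y columns.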

timholy commented 9 years ago

There are also performance considerations stemming from needing to force everything into a DataFrame (https://github.com/dcjones/Compose.jl/issues/105#issuecomment-67963024).

But, I agree 100% that this is not an easy question to answer well. It's really hard to support many different APIs simultaneously, and I too would be quite cautious about trying.

ViralBShah commented 9 years ago

I think there are lots of users who do not need to use DataFrames, but would love to use Gadfly. I also agree that I don't want special casing.

johansigfrids commented 9 years ago

Even with DataFrames I find myself wishing I could just pass multiple columns to the y aesthetic and save me a lot of stacking and melting. And not just for lines, most of Gadfly's Geoms could take advantage of it.

kzapfe commented 9 years ago

I also think the DataFrame thing is quite awkward. I have a DataFrame in which the first column is the x value and the next 65 columns are y values. I cannot find an easy way to plot them all, just indexing the colors by column number. I have read both the Gadfly and DataFrames documentation in detail and there seems to be none.

lobingera commented 9 years ago

This is the question of how a DataFrame vs. other input gets organized as different lines in the same plot. Some time ago there was a discussion on julia-users about generalizing plot APIs. Maybe a "generalized input heuristic" (read as: some code that determines what can be plotted from the input material, e.g. vector(y) -> x: enumerate elements, y: y; complex(c) -> x: real(c), y: imag(c); matrix m [n x 2] -> x: m[:,1], y: m[:,2] and similar) could be the starting point. If more than one "set" is available, the plotting code would be asked to, e.g., cycle colors or markers...

tbreloff commented 9 years ago

I'm developing a plotting interface with Gadfly as the first guinea pig (not counting Qwt, which is my package). This issue is old, but I think still very relevant... take a look (https://github.com/tbreloff/Plots.jl) and especially check out the examples for Gadfly:

https://github.com/tbreloff/Plots.jl/blob/master/docs/gadfly_examples.md

I'm eagerly awaiting people's opinions on the API, and on where I should prioritize my time.

Abhdez commented 7 years ago

Use another column in your data, and use the color attribute. The plot will take that column and classify it into different colors. When you are manipulating data you ideally want a big table with variables as columns. In this example, each X in your table gives you a Y value, also in your table, and in a third column you write which function it corresponds to, e.g. X^2, 2X, e^-x, etc. "color" then acts as the legend. This is the easiest way, and it is a proper way to manipulate data. gl!

plot(df, x=:Xvalues, y=:Yvalues, color=:Functions, Geom.line)
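For reference, a table of that shape could be built along these lines. This is a minimal sketch, not part of the original comment: the column names follow the `plot` call above, while the x range and the three functions are picked purely for illustration.

```julia
using DataFrames, Gadfly

xs = collect(0:0.1:2)

# One block of rows per function; the Functions column labels each row
# and is what the color aesthetic groups by.
df = vcat(
    DataFrame(Xvalues = xs, Yvalues = xs .^ 2,   Functions = fill("x^2",  length(xs))),
    DataFrame(Xvalues = xs, Yvalues = 2 .* xs,   Functions = fill("2x",   length(xs))),
    DataFrame(Xvalues = xs, Yvalues = exp.(-xs), Functions = fill("e^-x", length(xs))),
)

p = plot(df, x = :Xvalues, y = :Yvalues, color = :Functions, Geom.line)
```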

bjarthur commented 7 years ago

Worth noting that in most places strings can be used as colorants, as they are automatically passed to parse(Colorant, ...), so Theme(default_color="red") should work. See https://github.com/GiovineItalia/Gadfly.jl/pull/998

miromarszal commented 4 years ago

I believe what I want to do is the same issue, but I can open a new one if necessary.

I often find myself fitting an analytical model to some data and plotting this data along with the model function. I don't mind storing the data in a DataFrame, but at the same time I want to avoid tabulating the fitted function. In Gadfly, I would plot it like this:

l1 = layer(df, x=:time, y=:vals, Geom.line)
l2 = layer(t->model(t, param), extrema(df[!,:time])..., Geom.line)
plot(l1, l2)

This plots the two lines with the same color. To have them in different colors, I thought I could do the following:

l1 = layer(df, x=:time, y=:vals, Geom.line, color=["data"])
l2 = layer(t->model(t, param), extrema(df[!,:time])..., Geom.line, color=["model"])
plot(l1, l2)

but this, instead of displaying the plot, prints (I'm using Jupyter):

Plot(...)

To my surprise, if I change the geometry in the first layer to points, it will plot everything just fine:

l1 = layer(df, x=:time, y=:vals, Geom.point, color=["data"])
l2 = layer(t->model(t, param), extrema(df[!,:time])..., Geom.line, color=["model"])
plot(l1, l2)

This could be a sort of a workaround, but often a line plot is the most natural way to show what we want, e.g. when data is dense and has some fine detail. Plotting 1e4 data points brings Jupyter nearly to a halt.

What is going on? Why doesn't it work with two Geom.lines and at the same time does work with Geom.points + Geom.line?

Mattriks commented 4 years ago

See #1459, #1463 and #1465. This has been fixed on Gadfly master (]add Gadfly#master). More improvements like this are coming soon! Note that with your example above (and in Jupyter) you can see which layer is causing the issue by doing e.g. draw(PNG(), plot(l1)) and draw(PNG(), plot(l2)).

miromarszal commented 4 years ago

That indeed works on master, great!

Mattriks commented 4 years ago

Also please look at #1430, and add any changes there about color syntax that you would like to see!