Open mokasin opened 10 years ago
It should be somewhat faster now, but still quite slow. I know roughly what needs to be done to improve the speed here so I'll try to get to it soon.
I'd be glad if you could elaborate on the issue a little. It would be instructive.
The calls to sort! you saw were from unnecessarily storing something as a PooledDataArray, rather than just a DataArray. That was easy to fix.
The slowness now is simply Gadfly (actually Compose) not being particularly fast at drawing very complex graphics. What you're plotting involves drawing 250000 or 1000000 rectangles, so it takes a while. I've not put much work into optimizing Gadfly, so that's something I need to improve in general.
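To make the rectangle counts above concrete, here is a minimal sketch in plain Julia (assuming a current Julia 1.x and no Gadfly at all). The helper `matrix_to_long` is hypothetical and not part of Gadfly or Compose; it only illustrates that flattening an n×n matrix into x/y/z columns yields one entry, and therefore one rectangle, per cell.

```julia
# Hypothetical helper (not a Gadfly function): flatten a matrix into the
# long-format x/y/z columns that a rectbin-style plot would consume.
function matrix_to_long(m::AbstractMatrix)
    nrows, ncols = size(m)
    xs = Int[]
    ys = Int[]
    zs = Vector{eltype(m)}()
    for j in 1:ncols, i in 1:nrows
        push!(xs, j)        # column index becomes x
        push!(ys, i)        # row index becomes y
        push!(zs, m[i, j])  # cell value becomes the color value
    end
    return xs, ys, zs
end

# One rectangle per matrix cell: 500x500 and 1000x1000 inputs.
for n in (500, 1000)
    xs, ys, zs = matrix_to_long(rand(n, n))
    println(n, "x", n, " matrix -> ", length(zs), " rectangles")
end
```

The count grows quadratically with the matrix side, which is why doubling the side from 500 to 1000 quadruples the drawing work.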
That said, the example here is essentially coloring individual pixels. It will always be pretty inefficient if rendered using the SVG or D3 backends. That makes me think there should be some sort of special handling for what is essentially raster graphics.
That makes me think there should be some sort of special handling for what is essentially raster graphics.
That sounds sensible. ggplot2 also defines a special geom for it: geom_raster.
At least D3 kind of supports this using the canvas element: http://bl.ocks.org/mbostock/3074470 http://bl.ocks.org/mbostock/3289530
Latest update: I've added the ability to rasterize parts of an SVG image and embed them as PNG. I still need to expose this in Gadfly. My first thought was to add an argument to rectbin, like Geom.rectbin(raster=true), but now I'm thinking this should just be an argument to plot, like plot(..., raster=true), that will cause all the geometry to be rasterized. Sound reasonable?
Sounds reasonable to me. I'd also attach the flag to plot instead of to a specific geometry. Maybe you want, for whatever reason, to plot a great many lines or points too. :thumbsup:
This sort of feature would be super useful for me as well. I used Winston for the figures in a project a few months ago, and I was hoping that I could switch to Gadfly.
In audio work it's extremely common to want to plot a spectrogram, which for the purposes of plotting is basically just a 2D matrix of floats:
It would be great to be able to do this sort of thing and generate beautiful Gadfly plots!
I've made changes to Compose and Gadfly to rasterize part of an SVG plot and embed it as an image. I think this solves the frontend slowness: you can now have zoomable plots with these sorts of dense heat maps by doing:
plot(df, x=:x, y=:y, color=:z, Geom.rectbin, Coord.cartesian(raster=true))
Just generating the plot is still super-slow though (@mokasin's example takes nearly 40 seconds to render for me). I'll look into optimizing that.
I did some work optimizing Compose and Gadfly today. As it stands @mokasin's example takes ~4.1 seconds (on the second call, the first takes 27 seconds). The optimizations mostly aren't specific to rectbin so Gadfly should overall be significantly faster.
Now I'm running up against the fundamental inefficiency of using a vector graphics system to work at the pixel level. To match the performance of imshow, it really needs to operate the same way: color a million pixels rather than draw a million rectangles. So I think the ultimate solution will be to implement direct support for bitmaps without going through Cairo. I'll leave this issue open until I get around to that.
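For illustration only, the "color a million pixels" idea can be sketched in plain Julia (assuming Julia 1.x). This is not how Gadfly or Compose implements or will implement bitmap support. The helpers `to_gray_pixels` and `write_pgm` are hypothetical, and the grayscale min-max scaling and the binary PGM output format are assumptions chosen to keep the example self-contained.

```julia
# Hypothetical helper: scale matrix values linearly to 8-bit gray levels.
function to_gray_pixels(m::AbstractMatrix{<:Real})
    lo, hi = extrema(m)
    span = hi > lo ? hi - lo : one(hi)  # avoid dividing by zero for constant input
    return [round(UInt8, 255 * (v - lo) / span) for v in m]
end

# Hypothetical helper: write the pixels as a binary PGM (P5) image,
# one byte per pixel, with no per-cell drawing calls.
function write_pgm(path::AbstractString, pixels::AbstractMatrix{UInt8})
    nrows, ncols = size(pixels)
    open(path, "w") do io
        print(io, "P5\n", ncols, " ", nrows, "\n255\n")
        # PGM stores rows contiguously; Julia arrays are column-major.
        write(io, permutedims(pixels))
    end
end

m = rand(1000, 1000)
pixels = to_gray_pixels(m)  # one byte per matrix cell
write_pgm(joinpath(tempdir(), "matrix.pgm"), pixels)
```

The work here is a single pass over the matrix plus one bulk write, whereas the vector route has to build and render a separate rectangle object for every cell.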
Awesome, thanks for the work on this!
For imagesc / imshow (drawing a matrix as an image), is rectbin the right approach or spy, or something else?
They ultimately do the same thing; spy is just shorthand to simplify plotting matrices and to be somewhat familiar to MATLAB users.
Just tried it with spy in an IJulia notebook:
m = rand(100, 100);
@time spy(m, Coord.cartesian(raster=true))
-----------
elapsed time: 0.000245426 seconds (253744 bytes allocated)
ctx not defined
in drawpart at /Users/srussell/.julia/Compose/src/container.jl:343
in draw at /Users/srussell/.julia/Compose/src/container.jl:278
in writemime at /Users/srussell/.julia/Gadfly/src/Gadfly.jl:801
in sprint at iostream.jl:229
in display_dict at /Users/srussell/.julia/IJulia/src/execute_request.jl:31
Whoops, I hadn't checked out the latest master of Compose, so I was getting the same error with both spy and @mokasin's approach. Working now after pulling Compose.
When using Geom.rectbin to draw grayscale-like images or matrices, the drawing operation does not scale well.
Executing the following code on my machine (i5 3570K, DDR3 1600) takes nearly half a minute:
Plotting a 1000x1000 matrix simply takes far too long. Routines like imshow from Python's Matplotlib need only about a second to plot this.
Profiling the draw() function in the code above shows many calls to sort! originating from scale.jl.
Further investigation of what is taking so long seems necessary.