GiovineItalia / Gadfly.jl

Crafty statistical graphics for Julia.
http://gadflyjl.org/stable/
Other
1.9k stars 250 forks source link

Performance #555

Open dcjones opened 9 years ago

dcjones commented 9 years ago

I've made some significant gains in performance this weekend. Tests now run in 180 seconds vs 250 seconds before, and a lot of that is compilation time. There are still a lot of improvements that can be made, but I think the best way forward is to focus on individual types of plots.

Please use this issue to post examples that you find unreasonably slow, and I'll see if anything can be done. Try master on Compose and Gadfly first, as it may already be faster.

If your example involves drawing a huge amount of geometry (e.g. trying to color individual pixels with a rectbin plot), cairo or your browser will probably be the bottleneck. In that case, not a lot can be done apart from disabling antialiasing and simplifying the geometry. We should be able to make other cases fast.

timholy commented 9 years ago

Very nice to hear! I'm linking https://github.com/dcjones/Compose.jl/issues/105#issuecomment-67963024 which describes what's probably my biggest overarching concern. Don't know whether it's still relevant, though!

dcjones commented 9 years ago

By the way, the simplest example:

draw(PNG("bench.png", 6inch, 6inch), plot(x=rand(1000000), y=rand(1000000), Geom.point))

This is about 30 seconds on julia-0.4 for me now. It really is bottlenecked by cairo at this point. Part of the reason is that Gadfly draws outlines around the circles in Geom.point (i.e. they are stroked and filled in different colors). It turns out stroking circles is pretty expensive. If that's disabled:

draw(PNG("bench.png", 6inch, 6inch),
     plot(x=x, y=y, Geom.point,
     Theme(discrete_highlight_color=c -> nothing)))

It draws in 13 seconds. Then if we further disable antialiasing (by manually sticking Cairo.set_antialias(ctx, 1) the cairo backend), we get down to 7.8 seconds. Still slower that I'd like, but not so bad.

dcjones commented 9 years ago

It's true that Gadfly assumes the data is flat columns, and having to flatten the data (e.g. concatenate matrix columns) can slow things down and eat a lot of memory. I think there are two issues, one is a usability issue: you're forced to manually concatenate the columns and create a big equal-sized array to group entries by color, the other problem is the performance hit you incur doing this.

I think that can be solved though. Since Gadfly only needs to be able to treat the data as if it were a flat array, we should be able to wrap the matrix in a type that let's us index into it linearly. And further, indicate colors with run-length encoded array. There's probably additional work in Gadfly to avoid unnecessary copying, but I don't see this as a hopeless design limitation.

jiahao commented 9 years ago

If your example involves drawing a huge amount of geometry... not a lot can be done apart from disabling antialiasing and simplifying the geometry.

Note that the "simplifying the geometry" step is an integral component of Bokeh; they call it abstract rendering. The papers linked there by Peter Wang and coworkers have lots of good ideas worth co-opting.

Over the process of modernizing the examples (dcjones/Compose.jl#108), I discovered the existence of deferred canvases/contexts. It looks like the lower level transformations involving overplotting, α-blending and pruning steps that Bokeh does can be inserted quite naturally as a postprocessing step of rendering a a deferred context, if it's not already done programmatically by transforming the input context tree. Higher-level transformations, like creating heat maps and contouring, are more suited toward Gadfly. Or maybe all of these transformations really belong in their own package.

timholy commented 9 years ago

I think that can be solved though...

Ah, that's really good to hear---much more optimistic than I frankly expected. For the flattening, the up-and-coming ReshapeArray will do nicely.

timholy commented 9 years ago

I can confirm that Cairo is a huge bottleneck. In 2015, plotting a million points should be, like, 0.1s or so. (In this regard, Matlab is blowing us out of the water.)

I am more than a little serious in thinking about a ComposeGL.jl based on @sdanisch's work--- The main obstacle is that my laptop doesn't support a recent-enough version of OpenGL. (Time for a new laptop, I think.) I'm not going to have time to work on this very soon, but longer term (summer?) it's a pretty major priority.

SimonDanisch commented 9 years ago

Main obstacle for now is to get anti-aliased line drawing running. If you're interested in contributing, there is quite some work involved, that doesn't need OpenGL. GLDraw is my take on this. It implements the algorithms from MapBox. This is the source I'm trying to translate to Julia: https://github.com/mapbox/mapbox-gl-native/tree/master/src/mbgl/ And this blog article is describing their approach: https://www.mapbox.com/blog/drawing-antialiased-lines/ Linebucket.cpp should be the first to be implement, giving us anti-aliased lines. I started porting it at Linebucket.jl. There is also the possibility to port their code in an more abstract way, which is a lot simpler, as their code is highly optimized. I started with this at line.jl. This is not very optimal, as I do most of the work in the shader for every frame.

This still leaves the question open, on how to get from Gadflys high level description, to a tightly packed array of points and drawing instructions, which is what we need for OpenGL in the end.

SimonDanisch commented 9 years ago

By the way, I think the pay off is pretty big for this. I'm not that well informed, but I would think that it would put us miles away from most other plotting packages, as MapBox is incredible fast, even when compared to highly optimized domain specific solutions. So it's faster than most OpenGL solutions, and way faster than any CPU based solution. (Haven't researched this thoroughly though)

dcjones commented 9 years ago

Here's a really quick and dirty experiment with simplifying geometry before rendering. I used a deferred context to round every point to the nearest 1mm then removed identical points before rendering.

In this benchmark:

using Gadfly, Distributions

n = 1000000
srand(1234)
xs = rand(Normal(), n)
ys = rand(Normal(), n)

draw(PNG("try.png", 6inch, 6inch), plot(x=xs, y=ys, Geom.point))
@time draw(PNG("try.png", 6inch, 6inch), plot(x=xs, y=ys, Geom.point))

Gadfly master was 33.08 seconds. This simplification trick reduced that to 4.09 seconds and only 6682 of the 1million points are actually drawn. The plots looks similar, though the rounding is apparent. I don't think it would be, if it weren't for the outline around points.

Master try-before

Simplified try

This seems like a promising approach, even if a smaller epsilon were used. Doing it right will take a little more work though. The way I implemented this only works for floats. I think it really needs to be implemented in Compose to be general purpose.

yeesian commented 9 years ago

Cool!

May I ask: what does it look like with the alpha values turned lower? Although I guess you should be doing contours/hexbins/heatmaps if you're interested in density instead.

I'd been doing a port of Turf.js as well, and just wanted to point out that they have a function for simplifying lines/polygons as well.

lobingera commented 9 years ago

I did a comparison of drawing a (lot) of points via different methods in the Cairo/test/test_speed.jl. In general, Cairo's speed is dependent on what you ask to draw and how that is rendered.

cairo version: 11301 Surface Size: 512 Paint Width: 3.0 ddots1 100 elapsed time: 0.012884 seconds ddots1 1000 elapsed time: 0.006679 seconds ddots1 10000 elapsed time: 0.072065 seconds ddots1 100000 elapsed time: 0.910546 seconds ddots1 300000 elapsed time: 3.450002 seconds ddots1 1000000 elapsed time: 32.875774 seconds

ddots2 100 elapsed time: 0.009423 seconds ddots2 1000 elapsed time: 0.006233 seconds ddots2 10000 elapsed time: 0.060443 seconds ddots2 100000 elapsed time: 0.618212 seconds ddots2 300000 elapsed time: 1.803035 seconds ddots2 1000000 elapsed time: 6.014527 seconds

ddots3 100 elapsed time: 0.008723 seconds ddots3 1000 elapsed time: 0.007290 seconds ddots3 10000 elapsed time: 0.074108 seconds ddots3 100000 elapsed time: 0.749144 seconds ddots3 300000 elapsed time: 2.186088 seconds ddots3 1000000 elapsed time: 7.577688 seconds

ddots4 100 elapsed time: 0.035139 seconds ddots4 1000 elapsed time: 0.205216 seconds ddots4 10000 elapsed time: 2.117925 seconds ddots4 100000 elapsed time: 21.272077 seconds ddots4 300000 elapsed time: 62.823363 seconds ddots4 1000000 elapsed time: 206.063235 seconds

ddots5 100 elapsed time: 0.011996 seconds ddots5 1000 elapsed time: 0.002694 seconds ddots5 10000 elapsed time: 0.026432 seconds ddots5 100000 elapsed time: 0.261412 seconds ddots5 300000 elapsed time: 0.807928 seconds ddots5 1000000 elapsed time: 2.642710 seconds

So 1e6 dots (discs without ring) are drawn and rendered on my box by rendering 1 dot into a CairoPattern and painting (copying) this to the surface within roughly 2.6 s.

The fastest method usually propsed for cairo: set line ending to round cap and draw a 0-length line is roughly 6s, so 2.5x slower.

And this is almost the slowest possible rendering on cairo, as it's drawing into a ImageSurface and use pixmap at lot. Drawing to the local display system (X11-> XCB) should be faster than that, because rendering will be delegated and maybe hardware supported.

What i wanted to say is: There should be tailored methods to paint large content. Maybe an own bitmap-renderer (written in julia) should chime in clouds of dots > 1e5?

SimonDanisch commented 9 years ago

We already have this, at least as a prototype ;) dcjones/Skia.jl#1 The approach I'm showing there will also work for bitmaps instead of shapes, which is what I'm using for text rendering. It can be extended to use distance fields, which will give us camera independant anti aliasing. Ref: https://github.com/JuliaGL/GLVisualize.jl

Best, Simon

lobingera commented 9 years ago

Interesting. I also looked at Skia some time ago, but still Cairo has the largest lists (by far) of backends and the drawing model (masking) is still the most versatile around (and a C-API). But you pay for that by speed. There are now more and more canvases around, that backend EGL/OpenGL only. Which means you can have fast to very fast rendering on screen, but support for output in vector format (.svg/.pdf) is limited. A hybrid approach: Render via EGL on screen to have fast reaction, paint cairo in the background for a quality picture and replace the window after some time, could be doable.

lobingera commented 9 years ago

Just a side note: I wrote a simple (very simple) filled-circle-to-bitmap (technically a Array{Uint32,2}) renderer and got:

julia> for n in [100,1000,10000,100000,1000000,10000000]
        @time Lpaint(x[1:n],y[1:n],m,5)
       end
elapsed time: 6.8255e-5 seconds (4552 bytes allocated)
elapsed time: 0.000711024 seconds (40408 bytes allocated)
elapsed time: 0.002672019 seconds (400328 bytes allocated)
elapsed time: 0.022933334 seconds (4000328 bytes allocated)
elapsed time: 0.229228193 seconds (40000328 bytes allocated)
elapsed time: 2.36043287 seconds (400000328 bytes allocated, 3.03% gc time)

So with 1e6 dots (discs without ring) ~220ms. But this is missing anti-aliasing, has int coordinates+radius and a lot of other deficiencies (i e.g. limit the coordinates in doing the random positions).

(btw: imagesc via Winston a5

SimonDanisch commented 9 years ago

Do you have the code, so I can try this benchmark on my pc? would be interesting, how far away it is from the OpenGL version. One has to note though, the OpenGL version has FloatingPoint coordinates, does a full perspective transformation and anti-aliasing... Rendering the ring should be nearly as fast.

dcjones commented 9 years ago

Another idea that occurred to me is for cases in that involve drawing the same shape many time is drawing it once to a separate buffer then bliting it into place repeatedly.

function draw_circles(filename, width, height, circle_radius)
    # draw a fancy circle
    pad = 2
    circle_size = 2*circle_radius + pad
    circle_surface = CairoImageSurface(circle_size, circle_size, Cairo.FORMAT_ARGB32)
    ctx = CairoContext(circle_surface)
    set_antialias(ctx, Cairo.ANTIALIAS_BEST)

    circle(ctx, circle_size/2, circle_size/2, circle_radius)
    set_source_rgb(ctx, 1.0, 0.38, 0.28)
    fill_preserve(ctx)
    set_source_rgb(ctx, 0.8, 0.36, 0.36)
    set_line_width(ctx, 1.0)
    stroke(ctx)

    # copy the circle a bunch of times
    surface = CairoImageSurface(width, height, Cairo.FORMAT_ARGB32)
    ctx = CairoContext(surface)
    set_antialias(ctx, Cairo.ANTIALIAS_NONE)

    set_source_rgb(ctx, 0.9, 0.9, 0.9)
    rectangle(ctx, 0, 0, width, height)
    fill(ctx)

    for _ in 1:1000000
        x = rand() * width - circle_size/2
        y = rand() * height - circle_size/2
        set_source_surface(ctx, circle_surface, x, y)
        rectangle(ctx, x, y, circle_size, circle_size)
        fill(ctx)
    end

    write_to_png(surface, filename)
end

@time draw_circles("circles.png", 500, 500, 4)
   3.075 seconds      (13036 allocations: 532 KB)

circles

So that't about 10x better than the naive method with Cairo.

lobingera commented 9 years ago

@SimonDanisch https://gist.github.com/4fd1532b4eca3158fd9e

lobingera commented 9 years ago

@dcjones

Actually that the rdots5 entry in Cairo/test/shape_functions.jl ... so it's inline with my measurements.

SimonDanisch commented 9 years ago

Okay so it's 373.0ms, with @inbounds 287 vs 125ms in OpenGL... Not too bad ;) But as I said, the OpenGL version does quite a bit more. If you could rewrite the while into a fixed length for loop, maybe SIMD will jump in and it will be even faster.

For more complex shapes, @dcjones last example is the way to go I think. Its pretty much what I do with OpenGL and a texture atlas, into which I can render shapes with FreeType and Cairo, and then I distribute them via a particle system.

dcjones commented 9 years ago

@lobingera Ah, I should have looked at that first! That sort of thing wouldn't be hard to implement in Compose, since the whole graphic is defined before rendering. Though it does have the pretty severe limitation of only working if each shape is identical.

SimonDanisch commented 9 years ago

With a texture atlas, you can have ids for each shape, which enables you to render millions of shapes. This is efficient if you have a smaller set of shapes than particles to render... Like glyphs for example ;)

lobingera commented 9 years ago

@dcjones

Well, you could sort them. And another thing that i didn't try: Paint to a mask (CAIRO_FORMAT_A8) and apply that to solid color. Should be fast, too.

lobingera commented 9 years ago

@SimonDanisch But you render OpenGL via dedicated HW?

SimonDanisch commented 9 years ago

You mean like the GPU? Yes sure, thats what OpenGL is about ;)