fyne-io / fyne

Cross platform GUI toolkit in Go inspired by Material Design
https://fyne.io/

Draw canvas primitives using OpenGL? #563

Open windexlight opened 4 years ago

windexlight commented 4 years ago

Looking at the gl driver code for drawing primitives, it appears everything is rasterized in software, and then OpenGL is only used to draw the resulting rasters to the screen. Since there is an OpenGL dependency anyway, why not use OpenGL to draw the primitives? It must have been considered. Is it something planned for the future? Are there strong reasons not to?

andydotxyz commented 4 years ago

I don't think there is any particular reason not to use APIs that are available. However, we do need to support back to OpenGL 2.0 and GLES 2.0, so the options may be limited.

Do you have experience in this area that you could use to help us perhaps?

windexlight commented 4 years ago

I don’t have experience with OpenGL, but this is something I might look into as I have time. My very high-level assumption is that vector objects would probably have to be tessellated to be rasterized by OpenGL (in other words, there’s no easy “draw circle” API or the like), but it seems like it should be possible. You’re right, it’s probably a minefield of compatibility.

Bluebugs commented 4 years ago

A possible strategy is to generate a span line texture on the CPU and use that in a shader to fill the shape (doing tessellation on the GPU is not necessarily the most efficient approach, and it puts a lot more constraints on the shader doing the tessellation, which would impact portability).

A span line texture would just contain a series of (alpha value, run length) tuples. The shader filling the shape would only have to loop over this series to do the rendering. To make it slightly more efficient on the GPU, the texture would only be RLE encoded horizontally, while rows can easily be jumped over vertically.

Technically, the same technique could be used on the CPU side to render the vector, as it consumes less memory bandwidth and gives faster results. The main constraint of vector graphics is still memory bandwidth, as it is very hard to generically reduce the over-rendered area.
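
A minimal Go sketch of what such horizontal RLE encoding could look like, just to make the idea concrete (the span layout and the names are assumptions for illustration, not an existing Fyne or Freetype format):

```go
package main

import "fmt"

// span is one (alpha, run length) tuple of a horizontally RLE-encoded
// coverage row: "run" consecutive pixels share the same alpha value.
type span struct {
	alpha uint8
	run   uint16
}

// encodeRow compresses one row of per-pixel coverage values into spans.
func encodeRow(coverage []uint8) []span {
	var spans []span
	for i := 0; i < len(coverage); {
		j := i
		for j < len(coverage) && coverage[j] == coverage[i] {
			j++
		}
		spans = append(spans, span{alpha: coverage[i], run: uint16(j - i)})
		i = j
	}
	return spans
}

func main() {
	// One row crossing the edge of an anti-aliased shape: fully transparent
	// pixels, an alpha ramp at the edge, then fully opaque pixels.
	row := []uint8{0, 0, 0, 64, 192, 255, 255, 255, 255, 128, 0, 0}
	fmt.Println(encodeRow(row)) // [{0 3} {64 1} {192 1} {255 4} {128 1} {0 2}]
}
```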

windexlight commented 4 years ago

Should this technique of generating the span line texture on the CPU be faster than the current technique of rasterizing the whole vector object on CPU? It sounds like an operation that’s similar in scope.

Bluebugs commented 4 years ago

There are two steps done on the CPU: first computing the shape, then multiplying it with the colour and blending (in one pass); finally the result is uploaded to the GPU, where additional blending is done.

The GPU is good at parallel work, which makes the rendering part (colour multiplication + blending) easy to do efficiently without too much duplicated computation. It is harder to do the tessellation stage efficiently, as a lot of duplicated computation would be required along with a complex shader that would be costly to run by itself.

Uploading a compressed texture also consumes much less memory bandwidth while assembling the UI. There might be a point where many shapes assembled into a small image would be better rendered completely on the CPU and then uploaded to the GPU. There are a lot of parameters to juggle when doing 2D rendering for a UI: memory bandwidth, CPU, GPU and energy consumption. In the end, a compromise has to be made for the main use case of the library.

If you move all the computation to the GPU, this will increase energy consumption and GPU requirements (hard for mobile and low-end devices). If we keep everything on the CPU, it is also hard on the lower end, as this consumes much of the capability available to the application. It also increases memory bandwidth use, which might leave the CPU waiting and slow the application down even further. The hybrid solution I propose should help balance things in Fyne's context.
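
As a concrete illustration of the CPU-side "multiply with colour and blend" step described above, here is a rough Go sketch; the function name and the use of Go's premultiplied color.RGBA representation are assumptions for illustration, not Fyne's actual renderer code:

```go
package render

import "image/color"

// blendMaskOver composites a solid colour, masked by per-pixel coverage,
// over one destination scanline using source-over blending.
// src and dst use Go's alpha-premultiplied color.RGBA representation.
func blendMaskOver(dst []color.RGBA, coverage []uint8, src color.RGBA) {
	for i, cov := range coverage {
		c := uint32(cov)
		// Scale the premultiplied source colour by the coverage value
		// (the "multiply with colour" step).
		sr := uint32(src.R) * c / 255
		sg := uint32(src.G) * c / 255
		sb := uint32(src.B) * c / 255
		sa := uint32(src.A) * c / 255
		d := dst[i]
		// Source-over: out = src + dst * (1 - srcAlpha)  (the "blend" step).
		dst[i] = color.RGBA{
			R: uint8(sr + uint32(d.R)*(255-sa)/255),
			G: uint8(sg + uint32(d.G)*(255-sa)/255),
			B: uint8(sb + uint32(d.B)*(255-sa)/255),
			A: uint8(sa + uint32(d.A)*(255-sa)/255),
		}
	}
}
```

In the hybrid approach, roughly this per-pixel work is what would move to a fragment shader on the GPU, leaving only the coverage computation on the CPU.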

windexlight commented 4 years ago

Do you expect that a 2D graphics library (preferably written in Go) should have APIs to generate such scan line textures, or would this need to be implemented from scratch? If I understand correctly, the only real savings are in transferring a lower volume of data to the GPU, and possibly in the CPU calculating only an alpha channel rather than all channels.

Bluebugs commented 4 years ago

Freetype works by producing scanlines line by line, but not in a compressed RLE form, if I remember correctly. We got some recent improvements to their logic to serve this use case better after discussing it with them. The interesting code in Freetype is https://git.savannah.gnu.org/cgit/freetype/freetype2.git/tree/src/raster/ftraster.c . It can be used standalone, is pretty fast, and beats Cairo for reference. I have not tried it against Skia, but it should be at worst on par, and I have not benchmarked it against native libraries for other languages. Ideally it would be a drop-in replacement for the rasterx code we currently use (or patches could be contributed upstream) and an improvement on the freetype Go port (maybe making one depend on the other would be fine and potentially improve everyone's life).

Still, this might not be the whole story: an additional improvement to this design is that each texture generator could run in its own goroutine (as they are CPU bound rather than memory bound, they are a very good candidate for parallelisation), which would play nicely with a Vulkan backend, where we could also upload textures in parallel. This also introduces the concept of multiple blitter backends (software, OpenGL and Vulkan).
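
A rough sketch of the goroutine-per-texture idea, assuming a hypothetical rasterise function that turns one vector shape into a span texture; none of these types or names exist in Fyne, they are placeholders:

```go
package render

import "sync"

// shape and spanTexture are hypothetical placeholders, not Fyne types.
type shape struct{ /* path data */ }
type spanTexture struct{ /* RLE-encoded coverage rows */ }

// rasterise stands in for the CPU-bound span texture generation of one shape.
func rasterise(s shape) spanTexture { return spanTexture{} }

// rasteriseAll runs one goroutine per shape and gathers the textures in
// order, ready to be uploaded (potentially in parallel with a Vulkan backend).
func rasteriseAll(shapes []shape) []spanTexture {
	textures := make([]spanTexture, len(shapes))
	var wg sync.WaitGroup
	for i, s := range shapes {
		wg.Add(1)
		go func(i int, s shape) {
			defer wg.Done()
			textures[i] = rasterise(s)
		}(i, s)
	}
	wg.Wait()
	return textures
}
```

The useful property is that each shape's work is independent, so the only synchronisation needed is the final wait before upload.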

Lowering the amount of data manipulated on the CPU also helps reduce the overall memory bandwidth used by CPU-side rendering, which is what limits application speed today. The GPU should also be good at maintaining cache locality, as it usually works tile by tile, which software rasterizers usually don't. So doing the colour multiplication and blending on the GPU should generally be more efficient than doing it on the CPU.

I am sorry that I currently have no time to tackle all these ideas, and I feel like I am dumping a lot of requirements, but this is not my first iteration working on fast vector rendering logic. Best would be to do things step by step: integrate ftraster, then compress its output while still rendering in software, and finally move it to the GPU.

windexlight commented 4 years ago

Looks pretty interesting. I can’t say I have a lot of time myself, but I will start looking at it as I can. Is it your thought that ftraster should be built standalone and used via CGo, or that it should be ported to and adapted for Go? Perhaps as an update to the existing Go freetype port?

Bluebugs commented 4 years ago

It is a good question. Updating the existing Go freetype port would potentially reduce long-term maintenance and not increase our dependencies. That's why I lean toward that solution, but without trying it I don't have a strong opinion on the topic.

windexlight commented 4 years ago

A couple of questions for clarity. I spent a little time building examples to run ftraster, and it seems ftraster is currently a monochrome-only rasterizer (it may once have included an AA mode, but that seems to have been removed). I found there is another rasterizer implemented in src/smooth/ftgrays.c, which does do anti-aliasing. I would think this is a better starting point, as we want the final results to be anti-aliased, and you mention span line textures using alpha values, implying that the image is anti-aliased.

You also mention having benchmarked against Cairo, and having pushed some changes to the FT logic recently. Was this specific to ftraster, or could it have been to ftgrays?

I’m also interested in any high-level reasons why the existing rasterizer in the FT Go port is inferior to what’s in FT now, and why it might be unsuitable as a starting point.

Bluebugs commented 4 years ago

Indeed you are right about where the interesting rasterizer is. Sorry, I should have looked at the code more closely (I haven't in more than a year).

The changes were not specific to ftraster; for the full discussion you can read this thread: https://lists.defectivebydesign.org/archive/html/freetype-devel/2019-05/msg00122.html . Before those changes, Cairo was already slower; these are just further improvements on top of the Freetype rasterizer.

I do not know, from looking at the FT Go port, when the code was taken from Freetype and how many upstream improvements are missing (the copyright says 2010, but there has been a continuous stream of patches on the FT Go code since then). It needs investigating, and I also do not think the rasterizer was planned as an interface for freetype in FT Go. So to keep it simple, I do not have an opinion on the exact state of FT Go as a starting point. I just know that upstream FT, with its improvements, was better.

windexlight commented 4 years ago

Thanks very much for the additional info. One more thing I forgot to ask. When you mention RLE encoding being in the horizontal only so rows could be easily jumped over on the GPU, is that to say that rows should be randomly accessible? Meaning the length of data in each row must be constant, so rows that are more compressed than others must be padded to the length of the least compressed row? Or that a length value should be stored at the start of each row so that the next row’s data can be jumped to easily? Is this meant for parallel processing of rows in the shader?

Bluebugs commented 4 years ago

Indeed, this is for parallel processing of rows in the shader. I do not have a strong opinion on which solution would be best. I have only played with the padding solution so far, which seemed more "natural" to me, but that is not really a good technical justification :-) So my answer would be: whatever works for you on this.
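
For reference, a small Go sketch of the padding layout, reusing the span tuple from the earlier sketch; the zero-length padding convention and the names are assumptions rather than an agreed format:

```go
package render

// span is the same (alpha value, run length) tuple as in the earlier sketch.
type span struct {
	alpha uint8
	run   uint16
}

// padRows flattens the RLE rows into one buffer, padding every row with
// zero-length spans up to the widest row, so that row r always starts at
// index r*stride and a shader can jump to any row directly.
func padRows(rows [][]span) (flat []span, stride int) {
	for _, r := range rows {
		if len(r) > stride {
			stride = len(r)
		}
	}
	for _, r := range rows {
		flat = append(flat, r...)
		for i := len(r); i < stride; i++ {
			flat = append(flat, span{}) // zero-length padding span
		}
	}
	return flat, stride
}
```

With this layout a fragment shader can fetch row r starting at a fixed offset without scanning the preceding rows, at the cost of padding every row to the width of the least compressible one.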

windexlight commented 4 years ago

Curious if anyone has looked at Pathfinder (https://github.com/servo/pathfinder)? I took a look at the demo, and the performance seems impressive. It requires OpenGL 3.0 or OpenGL ES 3.0 (or Metal), and I know you’re targeting 2.0. It doesn’t seem unreasonable to me to consider having multiple backend options, where newer hardware gets better performance, but there’s a fallback to CPU rendering when necessary. OpenGL 3.0 also isn’t exactly bleeding edge. Pathfinder is written in Rust, but has C bindings, so creating a CGo wrapper seems possible.

yxxyun commented 4 years ago

> Curious if anyone has looked at Pathfinder (https://github.com/servo/pathfinder)? I took a look at the demo, and the performance seems impressive. It requires OpenGL 3.0 or OpenGL ES 3.0 (or Metal), and I know you’re targeting 2.0. It doesn’t seem unreasonable to me to consider having multiple backend options, where newer hardware gets better performance, but there’s a fallback to CPU rendering when necessary. OpenGL 3.0 also isn’t exactly bleeding edge. Pathfinder is written in Rust, but has C bindings, so creating a CGo wrapper seems possible.

Gio (https://gioui.org/) seems to include an efficient vector renderer based on the Pathfinder project (https://github.com/servo/pathfinder).

Bluebugs commented 4 years ago

> Gio (https://gioui.org/) seems to include an efficient vector renderer based on the Pathfinder project (https://github.com/servo/pathfinder).

This is pretty interesting and requires more digging, but they do have WebAssembly portability, which implies something very close to OpenGL ES 2. It would be very interesting to understand what they did in porting Pathfinder. This would be a possible solution. I had previously dismissed Pathfinder because I thought it would limit us too much, but that might not be the case, and we could maybe have just one backend then.

windexlight commented 4 years ago

I’ve been digging through their source a bit and haven’t yet identified where the Pathfinder-related code is. I’m missing something and need more time to dig in. They do have open issues mentioning no compatibility with the RPi3, implying that OpenGL ES 2 is not supported.

windexlight commented 4 years ago

Found it, I just wasn’t looking hard enough. It does indeed appear to be a Go port of the Pathfinder core algorithms. The framework on top of it might need to be massaged to fit nicely into Fyne, but it certainly seems worth experimenting with. Whether it is itself limiting or not, I think the goal should be enabling better performance on hardware that supports it, while having a fallback for hardware that doesn’t.

Bluebugs commented 4 years ago

Oh, which file is it? That will save some time. An easy solution would be to extract that Pathfinder port into its own independent project and use it directly.

windexlight commented 4 years ago

It’s in: https://git.sr.ht/~eliasnaur/gio/tree/master/app/internal/gpu/path.go

Some other related things in the same module.

Jacalz commented 3 years ago

Which canvas primitives are left after line (fixed in https://github.com/fyne-io/fyne/pull/2217) and gradients have transitioned to using OpenGL? I think it would be good to have a list here where we can mark them as completed when support is added.

andydotxyz commented 3 years ago

How about the following:

- [ ] Rectangle
- [ ] Circle

I think that is all; Image and Raster can't avoid going through the pixel stage, and text is a work in progress for other reasons.

windexlight commented 3 years ago

Out of curiosity, does the new line drawing implementation include general quadratic or cubic Bezier paths (stroked or filled), or is it just for straight lines? If the latter, it seems like the list here should include Bezier paths.

andydotxyz commented 3 years ago

We don’t have curves or paths in the toolkit yet; I thought this ticket was just about moving to OpenGL. Other tickets track requests for more, like #366.