Open mphe opened 2 years ago
Can you reproduce this when using GLES2?
cc @lawnjelly
Just tested it with draw_line and GLES2 on Windows. With batching enabled it is slightly faster (2700 FPS), without slightly slower (125 FPS).
So, no real difference.
The circle is likely slower since 3.2.3 due to either changes to dynamic buffers (but you say you have tried the OpenGL settings) or bug fixes (fixing pre-existing state bugs for robustness can result in slowdown). It isn't batched, and I don't recommend using that primitive. I think this came up in another issue a while ago, circles or arcs, but drawing circles manually using primitives that batch should be much faster.
We could probably make the circle primitive more sensible (convert it to a series of polys on entering the VisualServer instead of a bespoke primitive, in a similar manner to the changes to polyline), but really there has been little demand - not many people seem to use it.
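As a rough illustration of "drawing circles manually", here is a minimal GDScript sketch for Godot 3.x that approximates a filled circle with a convex polygon instead of calling draw_circle. The helper name, the segment count, and the sample position are assumptions made for this example; whether the resulting polygons are actually batched depends on the Godot version and the batching settings in use.

```gdscript
extends Node2D

# Assumed segment count; more segments give a rounder circle at higher vertex cost.
const SEGMENTS := 24

# Hypothetical helper (not an engine function): builds the outline of a circle
# as a closed convex polygon.
func _circle_points(center: Vector2, radius: float) -> PoolVector2Array:
	var points := PoolVector2Array()
	for i in range(SEGMENTS):
		var angle := TAU * i / SEGMENTS
		points.append(center + Vector2(cos(angle), sin(angle)) * radius)
	return points

func _draw() -> void:
	# Draw the circle as a polygon rather than with the draw_circle primitive.
	draw_colored_polygon(_circle_points(Vector2(100, 100), 32.0), Color.white)
```

If the circles never change shape, the point array could be computed once and reused rather than rebuilt on every redraw.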
Lines may be similar - I would just advise using batching. Legacy (non-batching) is really only maintained to detect graphical regressions now. It is now usually much more productive to spend time fixing bugs that might prevent you from using batching, rather than spend time on the legacy canvas renderer.
With lines you are referencing some issues that are dealing with a specific case - thick lines. Lines of width 1 (default) and thick lines go through an entirely different pipeline. Also anti-aliasing or not can completely change the pipeline.
To quote from #54826 :
This strange behavior occurs only when drawing lines with width > 1 (tested with width=2). With width=1, 3.4. performance is extremely good.
I wrote the below on the assumption that these were thick lines you were addressing in this issue, but I now see that in your demo project you are using default thickness lines (1). I don't know what kind of lines you are using in your actual project, so this may still be relevant. The advice about not mixing primitives also applies:
Also see the note here regarding draw_lines with thick lines, the difference may be due to the difference between the old routine and drawing as polys: https://github.com/godotengine/godot/pull/54377#issuecomment-964878722
Essentially the older method of drawing seems to be faster for single lines, but does not scale.
You seem to be having a far greater difference on Windows than on Linux between the two drawing methods. I'm not sure of the reason for this; maybe the drivers are better on Linux, or the necessary API communication is more efficient there.
We could switch to the old routine with batching off as noted on the PR, but I don't think it would help your situation, because you presumably are using batching, and just testing with batching off.
Ideally we would just switch between the two during VisualServerCanvas::canvas_item_add_line, but there is a chicken and egg problem here: when calling this function we don't know yet whether it will be called once or multiple times, so we optimize for multiple times. There is a slight possibility we could do the switch between the two later in the pipeline (i.e. write them all as some kind of dummy primitive, and defer the conversion to the actual primitive till later), but it seems a bit involved and error prone for what might be a fringe case.
A more pragmatic solution in this case would be for you to change your code so that it draws lines together, that way they should be batched and draw way faster. We worked through a similar situation with eirexe and their game, with great success. This should give you the best performance.
Essentially batching will do its best to re-arrange things to work fast, but in some cases (particularly multiple commands within an item) the user can create a pathological situation (particularly with custom draw routines), and can benefit a lot by simply tweaking their drawing code, i.e.
Instead of drawing within an item:
Line
Circle
Line
Circle
Line
Circle
Draw:
Line
Line
Line
Circle
Circle
Circle
This way the lines will likely be batched.
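The reordering above could look roughly like this in a custom _draw function. This is only a sketch for Godot 3.x: the shapes array, its keys, and the sizes and colors are made up for illustration and do not come from the project discussed here.

```gdscript
extends Node2D

# Hypothetical data: each entry describes one line and one circle at its end point.
var shapes := [
	{"start": Vector2(0, 0), "end": Vector2(50, 0)},
	{"start": Vector2(0, 20), "end": Vector2(50, 20)},
	{"start": Vector2(0, 40), "end": Vector2(50, 40)},
]

func _draw() -> void:
	# First pass: issue all line commands back to back, so consecutive
	# commands are of the same kind.
	for shape in shapes:
		draw_line(shape["start"], shape["end"], Color.white)
	# Second pass: issue all circle commands afterwards.
	for shape in shapes:
		draw_circle(shape["end"], 4.0, Color.red)
```

Note that this changes the overdraw order: every circle is now drawn on top of every line, which only matters where the shapes overlap.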
The docs contain quite a bit of info on optimizing your drawing: https://docs.godotengine.org/en/stable/tutorials/optimization/batching.html
I will try and investigate more when I have some time - it is a long time since we added line batching, but it's likely the difference you are seeing is just due to dynamic buffering differences (the OpenGL settings in project settings) and changes to robustness.
In my actual project I use a lot of single-width, non-anti-aliased, 20-segment draw_polyline and draw_polyline_colors calls, with individual modulate values and a custom shader that accesses COLOR. According to the batching docs, the custom shader is probably the reason why batching doesn't work well in my project. I don't use any other draw_ functions, or at least not in the same amounts and mostly for debugging purposes.
I have about 6 draw_polyline calls per scene and try to fit as many of those scenes in the game as possible. Each draw_polyline call comes from an individual node in that scene. I could manually try to batch those 6 calls together using a MeshInstance. I'm not sure if that actually improves the performance, but that's a thing I could try. Manually batching all draw_polyline calls from all active instances would require a lot of work and fine-tuning and would break z-ordering.
It's very situation dependent, so it is difficult to say anything useful. Polylines are not the same as lines; they go through a different path again, and z ordering or a custom shader can affect things too. Without an MRP it is difficult to predict which paths are being used and whether there is indeed a problem on our side.
Also useful can be a "diagnose log" (rendering/batching/debug/diagnose_frame).
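Before collecting a diagnose log, it can help to confirm that the setting is actually active in the running project. A minimal sketch, assuming the setting is normally toggled in the Project Settings dialog and that this script is attached to some node in the scene:

```gdscript
extends Node

func _ready() -> void:
	# Prints the current value of the project setting named in this thread.
	# This only reads the value; it does not enable the diagnose log.
	print("diagnose_frame: ",
			ProjectSettings.get_setting("rendering/batching/debug/diagnose_frame"))
```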
I'll try to create a stripped down version of my project for testing. However, the issue still stands. Even without batching, 165 FPS for drawing a few lines is ridiculous, so the problem is not really project dependent.
I tested manually grouping my polylines by a) putting them all in the same _draw function, b) using a MeshInstance. Since they are not static and need to be regenerated frequently, b) actually decreased the performance. In a static scene it slightly improved the performance. a) yielded about the same performance as without grouping, maybe minimally better, but I noticed in the diagnose log that those 6 polylines are now batched.
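Option a) could be sketched as follows for Godot 3.x: one parent node keeps the data for all polylines and draws them in a single _draw function. The polylines array, its keys, and set_polylines are hypothetical names for this example and are not taken from the actual project.

```gdscript
extends Node2D

# Hypothetical container. Each entry is a Dictionary with:
#   "points": PoolVector2Array, "colors": PoolColorArray (one color per point).
var polylines := []

# Hypothetical setter: replaces the data and requests a redraw.
func set_polylines(new_polylines: Array) -> void:
	polylines = new_polylines
	update()

func _draw() -> void:
	# All polylines are issued from the same canvas item, one after another.
	for polyline in polylines:
		draw_polyline_colors(polyline["points"], polyline["colors"])
```

Because everything is now drawn by one node, per-node modulate values would have to be multiplied into the color arrays instead, and per-node z-ordering is lost.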
As a workaround, you can draw long, thin stretched Sprites to use as lines or rely on the Line2D node (slower than Sprite, but makes it easier to draw curves). By using a specially crafted texture, this also allows for antialiasing that works well even for translucent thick lines and even if HDR is enabled in GLES3.
Edit: I released an add-on that does this for you: https://github.com/godot-extended-libraries/godot-antialiased-line2d
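The stretched-Sprite idea can be sketched like this for Godot 3.x. The helper is an illustration only, not code from the linked add-on. It assumes the Sprite already has a texture assigned, that the line runs along the texture's X axis, and that both points are given in the coordinate space of the Sprite's parent.

```gdscript
extends Node2D

# Hypothetical helper: places and stretches a Sprite so that it covers the
# segment between two points with the given thickness in pixels.
func stretch_sprite_between(sprite: Sprite, from: Vector2, to: Vector2, thickness: float) -> void:
	var tex_size := sprite.texture.get_size()
	sprite.centered = true
	# Put the sprite's center on the midpoint of the segment.
	sprite.position = (from + to) * 0.5
	# Align the sprite's X axis with the segment direction.
	sprite.rotation = (to - from).angle()
	# Scale the texture to the segment length and the requested thickness.
	sprite.scale = Vector2(from.distance_to(to) / tex_size.x, thickness / tex_size.y)
```

A polyline would need one such Sprite per segment, which is part of why per-vertex coloring in the style of draw_polyline_colors is awkward with this approach.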
Line2D is too slow as it needs to be updated almost every frame.
Drawing stretched Sprites sounds interesting, but it will get ugly to make it colored like draw_polyline_colors.
For now, I'll just live with it until this either gets fixed or Godot 4.0 is stable enough to switch.
Might be worth checking how it fares in 3.5 RC 3+ now. I don't know if there were improvements for this specifically, but there have been rendering improvements all around, so it can be worth testing.
Unfortunately, it's still roughly the same.
Godot version
3.4.2
System information
Windows 10, PopOS 21.10, GLES3, Nvidia GTX 970
Issue description
I'm developing on Linux and recently tested my project on Windows and noticed a significant drop in performance. After some debugging I identified the rendering as the bottleneck. However, the performance on Linux (PopOS 21.10) is much better, even though it's the same machine, the same project, and the same Godot version. I continued testing earlier versions of Godot and found out that the performance drop started with Godot 3.2.4 beta 1 and is therefore, maybe, related to the introduction of batching.
I created a small reproduction project that simply renders 200 sprites/lines/rects/circles using Sprite nodes/draw_line/draw_rect/draw_circle. It does not run update() or perform any logic; it simply draws 200 shapes of a kind once and then idles. Then I tested the performance for each render kind, with batching on/off, on Windows 10 and PopOS 21.10, using Godot 3.2.3 (last version before the notable FPS drop) and the latest Godot 3.4.2.
I noticed that Linux reaches more than twice the FPS of Windows in almost all cases. Even without batching, Linux still performs better in Godot 3.4.2 than in 3.2.3. On Windows, however, Godot suffers extreme performance drops without batching enabled. E.g. 200 draw_line() calls in Godot 3.4.2 yield 165 FPS, while 3.2.3 still managed about 1000 FPS. In comparison, Linux reaches 1500 FPS in 3.4.2 and 1050 in 3.2.3. I could reproduce this kind of behavior on other machines as well.
So, in my original project, where I have quite a lot of draw_line calls that can't be batched, I get normal performance on Linux but really bad performance on Windows.
I found #54826 and #54377, which seem to be related, but since I was testing with 3.4.2, the PR apparently didn't make any difference. As suggested in #54826, I also tried different OpenGL options, but without success.
Below are tables with average FPS values for all the different test cases.
Godot 3.2.3 (No batching for GLES3)
Godot 3.4.2
Batching: on
Batching: off
Steps to reproduce
1. Open Node2D.tscn.
2. Use the draw_function_test or sprite_test node to test draw_ function performance or Sprite node performance, respectively.
3. With draw_function_test, use the "Draw Type" property to test draw_line, draw_rect, or draw_circle.
Minimal reproduction project
godot-perftest.zip