Closed lukasmican closed 4 years ago
How many FPS for this statement "iPhone 6 with iOS 11 it runs well"? Can you do some debugging, what seems to be slowing it down? Any guesses?
With that little information will be too hard to solve your issue (or know if is a bug at all), try to identify the source of the problem and add a minimal example so others can test.
Faced the same problem during migration from Godot 2 to 3. Instruments shown me very large delays on glBufferSubData calls. This is a known bug on iOS: https://forum.juce.com/t/optimizing-opengl-renderer-for-ios/13172 In Godot it has a simple workaround: replacing to glBufferData.
In my project it happens on 9-patch images rendering. This patch works:
diff --git a/drivers/gles2/rasterizer_canvas_gles2.cpp b/drivers/gles2/rasterizer_canvas_gles2.cpp
index 9eb159cb3..e760b4e43 100644
--- a/drivers/gles2/rasterizer_canvas_gles2.cpp
+++ b/drivers/gles2/rasterizer_canvas_gles2.cpp
@@ -832,7 +832,7 @@ void RasterizerCanvasGLES2::_canvas_item_render_commands(Item *p_item, Item *cur
}
glBindBuffer(GL_ARRAY_BUFFER, data.ninepatch_vertices);
- glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(float) * (16 + 16) * 2, buffer);
+ glBufferData(GL_ARRAY_BUFFER, sizeof(float) * (16 + 16) * 2, buffer, GL_DYNAMIC_DRAW);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, data.ninepatch_elements);
0001-Slow-glBufferSubData-workaround-on-9-patch-images-re.patch.txt
cc @clayjohn
@ivanarh The link you sent also says that a workaround is to add glBufferData (GL_ARRAY_BUFFER, (GLsizeiptr) sizeof (vertexData), NULL, GL_DYNAMIC_DRAW);
before the call to glBufferSubData
#ifdef IPHONE_ENABLED
glBufferData(GL_ARRAY_BUFFER, sizeof(float) * (16 + 16) * 2, NULL, GL_DYNAMIC_DRAW);
#endif
glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(float) * (16 + 16) * 2, buffer);
Or, if it is working well on your device just replace the calls to glBufferSubData
with
#ifdef IPHONE_ENABLED
glBufferData(GL_ARRAY_BUFFER, sizeof(float) * (16 + 16) * 2, buffer, GL_DYNAMIC_DRAW);
#else
glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(float) * (16 + 16) * 2, buffer);
#endif
Unfortunately, I don't have an apple device to test on. So I can't really say which option would be better. But if you could test it out and make a PR that would be great!!
@clayjohn Should this be done in all glBufferSubData
calls in that file, or just on one specific call? (I don't have an iOS device to test this either.)
is the fix going to find it's way into a future release of the engine?
@davidholiday We need someone to test the changes mentioned by @clayjohn (this requires compiling iOS export templates). If the changes are proven to improve performance on actual iOS devices, then we can consider merging it.
@davidholiday do you have an IOS device and an IOS compile chain so you can test? That's all we need to move forward with this fix.
@Calinou @clayjohn I can try compiling the iOS export template and see if it ameliorates the issue I'm seeing with my game on iOS devices. What branch should I pull down?
You'd have to pull from master, then make the changes described above.
I think that should pull from 3.1 branch if it is a current project, then make the changes, master may have some differences with 3.1.
I thought you guys had already made the changes and just needed someone to test them out. I'll try and get to this when I can but it won't be right away
I'm currently testing on an iPhone running into similar FPS problems. I'm running at the max 60 fps on a laptop with a basic intel integrated gpu, but when I test it on my iPhone I get 25 fps which makes the game look incredibly choppy. I would like to test out these gl buffer changes, but I'm not sure where I need to make the modifications. I'm running the latest 3.1 Godot release and using GLES3. When I switch to GLES2 (which disables some of the particle and shader effects), I get closer to 40... but I was definitely expecting at least 60 frames per second.
@blaeberry Look inside drivers/gles2 and drivers/gles3 and ctrl+f for "glBufferSubData" and then add the following code segment before it.
#ifdef IPHONE_ENABLED
glBufferData(GL_ARRAY_BUFFER, sizeof(float) * (16 + 16) * 2, NULL, GL_DYNAMIC_DRAW);
#endif
that sizeof(float) * (16 + 16) * 2
might need to change for each call it should match the corresponding call to glBufferSubData
below it like so:
#ifdef IPHONE_ENABLED
glBufferData(GL_ARRAY_BUFFER, sizeof(float) * (16 + 16) * 2, NULL, GL_DYNAMIC_DRAW);
#endif
glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(float) * (16 + 16) * 2, buffer);
Thanks for looking into this! We need all the help we can get for iOS as there are very few of us with iOS devices.
So I did my best to test it out... since I think the drivers directory only exists in the precompiled version on mac, I added all of the glBufferData code to the gles2 and gles3 cpp files for the 3.1.2 build, copied the export templates from my previous 3.1.1 build, then compiled it, exported the project using that .app version, and tested the export on my iPhone through xcode. Long story short, I did not notice any performance gains... the frames per second are just as bad. I wasn't able to test GLES2 unfortunately, but I doubt it would get me anywhere. Should I try building again using the #else #endif alternative?
You need to build new export templates as well as the code I gave you is only used when building for iOS. Likely the test you ran didn't even include that code. Instructions for compiling for IOS can be found in the docs. http://docs.godotengine.org/en/latest/development/compiling/compiling_for_ios.html
I suppose that makes sense considering there is an IPHONE_ENABLED flag in the C++ code.
I've been looking at the documentation for a while, and I can't figure out a way to actually produce an export template from the binary. The "Compiling for Android" shows how to produce the apk files, but the iPhone template file is actually a zip, whereas the produced binary in the "/bin" directory has the ".a" extension, meaning archive, I believe... I also looked at the "Exporting for iOS" documentation which just says to download the export templates...
I tried something a bit hacky and copied over the 3.1.1.stable/iphone.zip export template, unzipped it, and replaced the debug ".a" file with the one I compiled in the "/bin" directory, zipped it back up and exported everything to xcode the usual way, but the build failed... (I just started using xcode as well, so I'm not familiar with debugging failures)
Honestly I've only used the engine to work on my games, so I don't have much of an understanding as to how all of this backend stuff works. I am confused as to the relationship between these /bin files and the export templates... I was under the impression that the compiling was to produce a Godot Engine app you can run on your machine as an alternative to just downloading the precompiled application, but now I'm trying to use that binary as a part of an export template? Frankly, I'm not even sure if it's okay to export the project using my precompiled macOS version application assuming I use the iOS precompiled export template. Is it?
Anyway, I'm fine to keep working on this, but you'll have to excuse my naivety. Basically, I don't know what to do with this ".a" file...
See https://docs.godotengine.org/en/latest/development/compiling/compiling_for_ios.html
Edit: Oh right it lacks instructions on how to zip the template. I'll add that.
Any updates on how to zip the template?
Here's how it's done on the official build server:
# iOS
cp -r git/misc/dist/ios_xcode ios_xcode
cp out/ios/libgodot.iphone.opt.fat ios_xcode/libgodot.iphone.release.fat.a
cp out/ios/libgodot.iphone.opt.debug.fat ios_xcode/libgodot.iphone.debug.fat.a
chmod +x ios_xcode/libgodot.iphone.*
cd ios_xcode
zip -q -9 -r ../templates/iphone.zip *
cd ..
rm -rf ios_xcode
git
here is a clone of https://github.com/godotengine/godot
Okay I see what's going on here... you're copying these files into a new directory, zipping it all up, and placing that in the templates directory...
There are a few things here that I am unclear about and want to verify: what is the out/ directory, and where were these libgodot.iphone files produced?
In the Godot source directory where I ran scons p=iphone target=debug
, I have the /bin/libgodot.iphone.debug.arm64.a and /bin/libgodot.iphone.opt.arm64.a archive files, and I also have _misc/dist/iosxcode/libgodot.iphone.debug.fat.a and _misc/dist/iosxcode/libgodot.iphone.release.fat.a files.
Should I use any of these files in place of the out/ios/libgodot.iphone.opt.* files?
If you only want arm64 support, you can use the files you compiled in bin
. The ones in the template are empty placeholders that should be replaced.
So a big step forward... I managed to get it all compiled and working on my iPhone XS with the latest iOS using the lipo where I compile the arm64 and the armv7...
However, now I am getting a 502 error every frame (seems to be related to https://github.com/godotengine/godot/issues/8739) as well as a weird white triangle artifact... my performance doesn't seem to be better really, but I'd want to get the other issues fixed first to really tell. I'm running on the 3.1.2 devel build of Godot.
I'm pasting the code changes I made so you can take a look if you deem it relevant. Let me know if there's anything else you want me to particularly try... I really want to get the iPhone performance improved so I can hopefully make a successful iPhone app for the engine and Godot can get more attention and contributions. :)
Great news!
I got my game working at 60 fps (max) on my phone - up from about 24 fps. I went back to the 3.1.1 stable version of Godot and made sure to match the buffer type (first argument of the glBufferSubData function) as well as the size. I'm also no longer getting those 502 errors...
The gles2 and gles3 folders with changes: gles_changes.zip
Below is most of the coding I did to get the functioning iPhone template. You might also be able to use some of this to help you update the "Compiling for iOS" page.
^((\s*)glBufferSubData\((.+),.+,(.+),.+\);)
$2#ifdef IPHONE_ENABLED\n$2\tglBufferData($3, $4, NULL, GL_DYNAMIC_DRAW);\n$2#endif\n$1
# Build the binaries necessary for recent iPhone and iOS
scons p=iphone tools=no target=release arch=arm &&\
scons p=iphone tools=no target=release arch=arm64 &&\
lipo -create bin/libgodot.iphone.opt.arm.a bin/libgodot.iphone.opt.arm64.a -output bin/libgodot.iphone.opt.universal.a &&\
scons p=iphone tools=no target=debug arch=arm &&\
scons p=iphone tools=no target=debug arch=arm64 &&\
lipo -create bin/libgodot.iphone.debug.arm.a bin/libgodot.iphone.debug.arm64.a -output bin/libgodot.iphone.debug.universal.a
# Change access and copy to dist folder
chmod +x bin/libgodot.iphone.*
cp bin/libgodot.iphone.debug.universal.a misc/dist/ios_xcode/libgodot.iphone.debug.fat.a
cp bin/libgodot.iphone.opt.universal.a misc/dist/ios_xcode/libgodot.iphone.release.fat.a
# Zip up the iPhone template
mkdir templates/
cd misc/dist/ios_xcode/
zip -q -9 -r ../../../templates/iphone.zip *
# Then manually copy zipped template to relevant template directory for Godot exporting
Anything else you would like me to do?
@blaeberry That looks amazing!
Just to confirm, you are still having trouble adding this to master? Or are you able to get it working in both the 3.1 branch and master?
@clayjohn
I just tried again with the latest version (3.2), and it failed when trying to build using xcode...
I started from the beginning after pulling from master, so I'm pretty sure I did everything correctly.
I got the following errors:
Showing All Issues
Undefined symbol: _z_error
Undefined symbol: _OBJC_CLASS_$_ARSession
Undefined symbol: _z_verbose
Undefined symbol: _OBJC_CLASS_$_ARWorldTrackingConfiguration
Hmmm AR stuff... Better bring in @BastiaanOlij
I'm guessing that is because the ARKit framework isn't being compiled in, seeing the ARCore code is part of the build unless you exclude the module when compiling it needs to be linked with the framework included even if you're not using it.
Xcode is pretty annoying that when you turn a component off it also starts removing the included frameworks and you need to manually add them back in. It should work correctly when you export the project anew from Godot but if you change things in Xcode or reuse the Xcode project you've been using there is a good change the settings are removed or missing.
@blaeberry - thanks. I reproduced your steps verbatim and the summarized instructions were of great help.
Plug: if you need more iOS testers (like me), please consider easying the build of the distro for iOS by documenting it better. Compiling the templates is a must have for iOS testing and a challenge to do by a newcomer...
I tested on 3.1.1 because compiling the master branch gave me the same issues with unresolved ARKit symbols (so @BastiaanOlij -- please explain how to link with the ARKit XCode component or how to compile the distro without ARKit - thanks in advance).
I tried both versions of the suggested changes by @clayjohn :
#ifdef IPHONE_ENABLED
glBufferData(GL_ARRAY_BUFFER, sizeof(float) * (16 + 16) * 2, NULL, GL_DYNAMIC_DRAW);
#endif
with the Sublime regex find/replace patterns as advised, and
#ifdef IPHONE_ENABLED
glBufferData(GL_ARRAY_BUFFER, sizeof(float) * (16 + 16) * 2, buffer, GL_DYNAMIC_DRAW);
#endif
with the same patterns adjusted.
Unfortunately, both don't work for me and hang the GPU. I tested on iPad Pro (1st gen), iPad Pro 3rd gen, iPhone SE - all of which run iOS/iPadOS 13.1 and 13.2. The trace goes like this:
OpenGL ES 3.0 Renderer: Apple A9X GPU
...
Execution of the command buffer was aborted due to an error during execution. Caused GPU Hang Error (IOAF code 3)
Execution of the command buffer was aborted due to an error during execution. Caused GPU Hang Error (IOAF code 3)
GLDRendererMetal command buffer completion error: Error Domain=MTLCommandBufferErrorDomain Code=2 "Caused GPU Hang Error (IOAF code 3)" UserInfo={NSLocalizedDescription=Caused GPU Hang Error (IOAF code 3)}
Execution of the command buffer was aborted due to an error during execution. Ignored (for causing prior/excessive GPU errors) (IOAF code 4)
...
Terminating due to blacklisting by kernel driver
Without these changes, my game runs fine with the old code on iPhone SE and iPad Pro 1st gen at 60fps.
However, on the iPad Pro 3rd gen (which I expected to fly as a bird), it runs at 20-23fps (same build!), and on this device XCode warns the following after initialization:
Main Thread Checker: UI API called on a background thread: -[UIApplication applicationState]
PID: 980, TID: 492114, Thread name: com.apple.CoreMotion.MotionThread, Queue name: com.apple.root.default-qos.overcommit, QoS: 0
Backtrace: ...
This is clearly another issue, but I am reporting it here FYI along with the lousy performance on the latest Apple hardware.
To sum up - I got devices to test with but I am not a Godot insider and I'd rather focus on the game than on the bugs :) but if there's anything I can help you with, please bear with me and make a sound test request. I am willing to spend time before 3.2 is out to improve iOS support as much as possible. Thanks.
Just out of interest, do you export from Godot anew or do you use the xcode project created when you started with your project?
When going to a newer version of Godot you have to always rebuild the export templates and use them or you won't get all the changes that are applied to the xcode project (such as adding the arkit framework). I did a video on that some time ago: https://www.youtube.com/watch?v=6NEonfH1ME0
As for compiling without the ARKit module included, there is a switch you can add to the scons command where you can include/exclude modules. I'm not entirely sure of the syntax though, I think its module_arkit=no There is more info about that here: https://docs.godotengine.org/en/3.1/development/compiling/optimizing_for_size.html
CoreMotion btw is the framework that is used to read accelerometer and gyroscope data. Wonder why it's complaining about that. We poll the values every frame, maybe its not happy about that on iPads?
Im also wondering if this could be a build with debug vs release problem as well.
@clayjohn very possible, I haven't had any time to dive into this and see if I can reproduce any of it, it is definately possible there is a behaviour difference between the two. I had someone contact me through Twitter who was having issues publishing an App to the AppStore that was working when debugging. Maybe there is something missing in the release build
@BastiaanOlij - thanks for all the info. I am always exporting the project anew before launching Xcode. The project .ipa is created with the Custom Package Debug/Release pointing to the template/iphone.zip which is built with the script above, and with the Export With Debug option checked.
Re: Arkit - I noticed in your video that the Arkit Capabilities checkbox is enabled, while in mine it is disabled (I am not sure what the default value is initially) - this is most probably the reason for the Arkit missing dependencies, and I will check on this now.
Re: CoreMotion, the puzzling bit is that the same Xcode project, deployed on Ipad Pro1 (Model A1673) and Ipad Pro3 (Model A1980) results in different behavior. Pro1 runs at 60fps and without the Thread Checker warning, while the Pro3 runs at 23fps with the Thread Checker warning. It looks like the threading models (pthread libs) on the two devices are different. Are you reading the accel/gyro data every frame from the main thread? Googling on that Thread Checker warning invariably suggests one shall wrap the offending API calls with
DispatchQueue.main.async {
// ... API call here.
}
and this could be the major reason for the bad game performance, even more that the gles2/3 optimizations. I have also the same 23fps bad performance on an Android Samsung A40 phone (which is a recent mid-ranger), while the same .apk on other Android devices runs at 60fps stable. All this suggests a core system issue.
That said, what would be a sensible test plan here? I guess I will focus on the master branch (beta1 as of this writing) since 3.1.1 is out of scope.
Update: retested everything on master 3.2.beta. Raised Arkit issue 33406. Same results, GPU hang, DrawView: 501 error and blacklisting by kernel.
I tested the 3 configurations suggested in this thread (replace, prepend with buffer, prepend with null):
)
#ifdef IPHONE_ENABLED
glBufferData(GL_ARRAY_BUFFER, size, buffer, GL_DYNAMIC_DRAW);
#else
glBufferSubData(GL_ARRAY_BUFFER, 0, size, buffer);
#endif
2)
#ifdef IPHONE_ENABLED
glBufferData(GL_ARRAY_BUFFER, size, buffer, GL_DYNAMIC_DRAW);
#endif
glBufferSubData(GL_ARRAY_BUFFER, 0, size, buffer_ptr);
3)
#ifdef IPHONE_ENABLED
glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_DYNAMIC_DRAW);
#endif
glBufferSubData(GL_ARRAY_BUFFER, 0, size, buffer);
Xcode stops at
gl_view.mm, line 490: [context presentRenderbuffer:GL_RENDERBUFFER_OES]; = Thread 1: signal SIGABRT
I guess this settles it, unless I am doing something totally wrong, I wonder how @blaeberry got his iPhone XS working with these changes. Which iOS version? I am using the latest Xcode and running i(Pad)OS 13.2.
I am having the same problem on my two iOS 12.4.3 devices (iPhone 5S and iPad Air (1st gen)), with Godot 3.2.beta1.official, where in 3.1.1.stable.official it was fine. I use the latest Xcode version on MacOS Catalina.
I can add that I have bad performance too on MacOS itself while 3.1.1.stable.official was fine
Good news here - I got a 10-30% (6-20fps) speedup across my mobile devices, compensating for the performance drop of the current 3.2 beta master compared to 3.1.1 stable,
It is based on the initial hunch by @ivanarh to replace glBufferSubData
by glBufferData
. The hack suggested by @clayjohn intended to orphan the portion of the array buffer written to by glBufferSubData
by prepending a glBufferData
call with a null pointer doesn't work, at least on Apple devices cf. my previous tests.
Here is the rationale of what I did:
It is well known that glBufferSubData
potentially creates CPU/GPU sync points which, depending on the GL driver implementation, can ruin performance. A short summary of the problem with more info can be found here. So I reviewed carefully the 60 places in the code where we currently use glBufferSubData
in the gles2 and gles3 drivers. It turns out that we can safely replace those calls with glBufferData
in a number of places. Safely means the currently used glBufferSubData
must respect the following 3 rules:
glBufferSubData
calls in the bind/unbind cycleThis set of rules isolate code patterns like this:
glBindBuffer(GL_xxx_BUFFER, buffer); // bind
...
glBufferSubData(0, size, buf_ptr); // upload data
...
[eventually draw]
...
glBindBuffer(GL_xxx_BUFFER, 0); // unbind
Replacing the glBufferSubData
with a glBufferData
in these cases safely removes a potential CPU/GPU sync point because we allocate a new storage with the same size, while the old storage may already be scheduled for draw by the driver which is in charge of freeing it when it is done. We can thus operate on the new storage in parallel. And this is exactly one of the cases in the ninepatch logic submitted by @ivanarh.
The attached patch is the result. 13 locations are affected, I have commented the glBufferSubData
code line and added the equivalent glBufferData
below to ease review by the Godot heavyweights.
I initially wrapped the changes with IPHONE_ENABLED
, then extended it to ANDROID_ENABLED
, but after some testing which showed systematic performance improvement across the mobile devices I ran my game on, I decided to remove those conditions and suggest it as mainstream code for all platforms. Truth be told, I just can't think of a scenario where this change at those places could hurt performance, but I may be wrong. Besides, you know the code better than me and are likely more experienced at profiling the renderers with your own methods, examples and tools, So I leave it up to you to followup. PR and such is too much for me at this point, but given the apparent benefit I can only recommend it for your consideration :)
Note that this change is unrelated to the observed slowdown of the current 3.2 beta compared to 3.1.1 because this area of the code hasn't been touched or reorganized that much. So it is a good idea to figure out why we got a slowdown in the first place, this speedup notwithstanding. glBufferSubData_patch.zip
@oeleo1 Thanks for your hard work! Great to hear that the suggested fix works. You have actually come full circle to the fix originally suggested by @ivanarh in https://github.com/godotengine/godot/issues/23956#issuecomment-497340805
You seem to think that us Godot developers have some special knowledge about the OpenGL code. The truth is we don't, I was happy to change glBufferSubData() calls to glBufferData(), I just needed to see from someone who owns an Iphone that there actually was a performance boost.
Since you've done all the work identifying places for the fix and testing the code I'd love if you would make a PR as well. It's great to see new contributors giving back and I want you to be able to put your name on a commit. If you aren't comfortable with the PR process let me know and I can help you out. :)
Oh, pity you are not heavyweights... :) and thanks for trying to drag me into the dev like a game core loop :) I'll gladly let you handhold my hand for the PR later on (thanks for the help in advance btw, so send my way the relevant info) but right now I'd rather ask you to act as a proxy on this one so that the optimization gets in sooner and gets tested by a broader audience.
I don't feel particularly rewarded seeing my pseudo in a commit, but I do when I have fun with Godot. And right now I am more concerned that I can't recognize my game in 3.2. I am severely hit on the fun part, Half of my tweens don't start, it is slower, everything is displaced. So I am more interested in spending time finding what's wrong and fix it rather than get up to date with the open-source process. I had my times of glory in a previous life (I wrote for instance Python's object memory allocator so yes, you could say I have some sensibilities in mem/buffer management) but I'm far from aspiring for that. Please go ahaid with the changes after a best-effort review ;) and let's not worry about who authored the contribution.
Good News Part 2 - rock solid 60 fps across all my Apple and Android devices!
That's unseen before on some of the devices which were running at 20 fps before Good News Part 1, and at 30 fps max after it. Spectacular! I this goes in, I think 4.2 will be remembered as the fastest release to date. Hopefully no more random fps on random hardware :)
So what's the story this time? Well, it's the same story.
glBufferSubData
is performance evil - confirmed by profiling with Instruments. This function is a performance killer, especially on mobile. In Apple's case it is calling all kind of OS sync objects under the hood and that's no surprise, because the drivers on mobile have more constraints than their analogs on desktop platforms. They need to care about energy, manage more efficiently less memory and what not. So when you use glBufferSubData
you brute force them with constraints and they strike you back with delays.
Here I addressed the vast majority of the calls to glBufferSubData
at runtime. Depending on the application, different areas of the code are involved, but the hot spot is invariably the canvas rendering, in the _draw_polygon()
and _draw_generic()
functions.
The solution is what @clayjohn suggested initially, which is to orphan the buffers beforehand with a call to glBufferSubData
with a NULL
pointer. However, a crucial difference here is that we can't orphan a portion of the buffer before writing to it - that doesn't work (tested) - but we can orphan the whole buffer by passing the size it has been originally allocated with. And we do this before the string of glBufferSubData
calls that follow, each adding it's own bit of vertices. This works like a charm.
So we get the performance boost by orphaning the polygon buffers, by reallocating them at their max size specified during initialization. We are not checking for GL_OUT_OF_MEMORY
just like the rest of the code. That's fine - in theory - since if/when that happens we will be using the old buffers. And if that's not fine, there's no way out anyway (what do you do when malloc()
fails?). Testing my game didn't reveal any problems, but then again I don't have 100 canvas layers. I have 2 or 3.
I am pretty sure orphaning the buffers to the initial/current size could be applied to more places in the code, but I think it's not worth going that far at this time. We have resolved the main problem here. If someone complains about particle performance, lights, skeletons or anything else causing unusual fps drops, we may dig further.
The attached patch No 2 addresses the polygon and generic draws in gles2 and gles3 against master. It also fixes what I believe is a bug in the gles3 _gui_draw_primitive()
. where the size parameter of glBufferSubData
was not multiplied by sizeof(float)
.
Enjoy and spread the love, and I can go back to having some fun again... :) glBufferSubData2.zip
@oeleo1 This looks great! I think it may be best to only apply these changes to mobile devices unless you are also seeing a performance improvement on desktop? I think the way desktop drivers handle synchronization is a little more robust and so it is better to not orphan the buffer.
Related issues that may benefit from similar changes https://github.com/godotengine/godot/issues/25400 and https://github.com/godotengine/godot/issues/19943 (this one should be fixed already by oeleo1's patch)
Applying to mobile only is wise, but actually your worries are also mine. I am happy with the idea of orphanig, but I am not happy with the idea of orphaning the whole poly buffer for 2 reasons:
1, We overallocate storage too much for every polygon (128k poly + 128k poly index = 256k)
We currently don't have these problems because there is only one buffer, allocated at initialize time, and the driver is doing implicit synchronization if the poly buffer is scheduled for drawing, which causes the delays we are fighting here.
A more intelligent way of doing things is to combine the best of both worlds: overallocate no more than what we currently need and keep control of the process. We could achieve this by doing the following:
We thus fallback to dynamic optimal poly size management as plan A by keeping things under control with the fallback to plan B. I'll try to come up with a new patch to see if that flies - I expect the code to be a slightly more hairy though (patch2 is excruciatingly simple and suboptimal :).
@oeleo1 That sounds like a very logical strategy. If I understand correctly you are suggesting something similar to how reduz does it in the vulkan branch. An optimization like that would be very welcome for 4.0. For 3.2, I think we should stick to patch2 applied to mobile devices only as it is very simple and unlikely to cause any problems before release.
Let's hold our breath for a moment to see patch 3 then make up our minds. As for Vulkan, I am not acquainted with what's going on there but If reduz is doing on demand allocations with a plan B he's on the right track. Vulkan allows for more fine grained control all over the place. OpenGL is a gamble compared to it.
Here is a first shot at on-demand buffer allocation with fallback to preallocated buffers if something goes wrong (for gles2 and gles3). Like I said, the code is more hairy but it actually seems to work even better. It may be reworked to look nicer but the idea is there. However this is more complex and untested, so too risky for 3.2. You can look at this and followup with better ideas.
BTW, during this exercise I noticed that the APIs for draw generic are inverted wrt the last two parameters: one is bones and weights, the other is weights and bones. Is that intentional or historical?
As of now, I would stick with patch 2. To the question whether mobile only or mobile and desktop - it doesn't matter much from a desktop standpoint (I haven't measured the performance but it can't be worse). If it's also on desktop, chances are it will be debugged sooner if something's wrong. The important thing here is that the performance boost on mobile is very substantial. Up to you :) glBufferSubData3.zip
one is bones and weights, the other is weights and bones. Is that intentional or historical?
Most likely a mistake.
As of now, I would stick with patch 2. To the question whether mobile only or mobile and desktop - it doesn't matter much from a desktop standpoint
I left it just for mobile and web for now. When we are rewriting the GLES2 renderer for 4.0 we can spend more time ensuring that it doesn't cause any regressions on desktop.
@oeleo1 your work sound fab here - I'm having similar issues. How would I go about applying your fixes in https://github.com/godotengine/godot/issues/23956#issuecomment-552337809 ?
@njt1982 assuming you already know how to build Godot.
git fetch upstream pull/33527/head:glbuffersubdata_optimization
git checkout glbuffersubdata_optimization
#build using scons
I have already applied his patches in https://github.com/godotengine/godot/pull/33527
assuming you already know how to build Godot.
I dont, but this points me in the right direction, thanks!
UPDATE:
I have it compiling - I had to do git fetch origin pull/33527/head:glbuffersubdata_optimization
(instead of upstream).
Thanks @clayjohn - I'll report back here if its helped the FPS on my iPhone game.
Godot version: 3.0.6
OS/device including version: iOS 12 (iPhone 5s, 7, SE)
Issue description: I am writing a simple 2D game for mobile platforms with Godot 3. On Android it runs just fine with stable 60 fps, but on iOS I am getting around 20 fps with the same code. I have tested it on several iPhones and I have found out that on iPhone 6 with iOS 11 it runs well, but on iPhone 5s, 7 and SE with iOS 12 it seems laggy and I guess it runs around 20 fps.