adamwulf opened 8 years ago
This feels very similar to #1548, which was fixed by combining multiple [UIView animations] into a single block. I suspect something similar is going on with screen updates, possibly with adding the new scrap + gesture and moving it so quickly after adding, or possibly with the OpenGL context of the new scrap.
I can test by removing the JotViews of the scraps to see if I can repro w/o the new OpenGL contexts getting created. That'd help narrow down whether it's something I'm doing wrong w/ OpenGL vs something wrong with the UIView itself.
I've timed the section where the scrap is added, and the slowdown is somewhere inside of:
[MMScrapView addScrapWithPath:(UIBezierPath*)path andRotation:(CGFloat)rotation andScale:(CGFloat)scale];
narrowed further into [loadScrapStateAsynchronously:]
Throttle how fast scraps can be made, and just continue the stretch gesture until enough time has passed.
Hopefully I can detect when the delay would otherwise happen, then I can continue the stretch gesture until it's ready again, but throttling is a backup plan if I can't detect with certainty.
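A minimal sketch of that backup plan. The gesture class, delegate method, and `lastScrapCreationTime` property here are all hypothetical names, not actual code from the project; only `[MMScrapView addScrapWithPath:andRotation:andScale:]` comes from the notes above:

```objc
// Hypothetical throttle: only allow a new scrap every kMinScrapInterval
// seconds; until then, just let the stretch gesture keep running.
static NSTimeInterval const kMinScrapInterval = 0.5; // assumption, needs tuning

- (void)stretchGestureDidRequestNewScrap:(id)gesture {
    NSTimeInterval now = [NSDate timeIntervalSinceReferenceDate];
    if (now - self.lastScrapCreationTime < kMinScrapInterval) {
        // Too soon after the last scrap - do nothing, and the stretch
        // gesture simply continues until we're ready to split it.
        return;
    }
    self.lastScrapCreationTime = now;
    [MMScrapView addScrapWithPath:[gesture path]
                      andRotation:[gesture rotation]
                         andScale:[gesture scale]];
}
```

The detection-based plan would replace the fixed interval with a check for whatever condition actually predicts the slowdown.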
So the issue is that glReadPixels is sometimes extremely slow. The only answer, I believe, is to upgrade from OpenGL ES 1.1 to either Metal or ES 2.0. I've emailed Brad Larson to see if he is open to doing some contracting. If he doesn't respond, I'll email again and ask for references.
As a workaround, maybe I can copy the assets on disk and load them into OpenGL that way instead of saving them out from OpenGL all over again.
This might also help prevent the ink quality from fading after many clones.
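A sketch of that workaround using GLKit's texture loader. The scrap properties and paths are hypothetical, and this assumes a compatible EAGLContext is current on the calling thread; the point is that a straight file copy plus re-upload skips the lossy read-back entirely:

```objc
#import <GLKit/GLKit.h>

// Copy the original scrap's backing image on disk, then load the copy
// straight into a texture instead of reading pixels back out of OpenGL.
NSError* err = nil;
NSString* srcPath = originalScrap.backgroundImagePath; // hypothetical property
NSString* dstPath = clonedScrap.backgroundImagePath;   // hypothetical property
[[NSFileManager defaultManager] copyItemAtPath:srcPath
                                        toPath:dstPath
                                         error:&err];

// GLKTextureLoader uploads the PNG on disk into a GL texture directly.
GLKTextureInfo* tex = [GLKTextureLoader textureWithContentsOfFile:dstPath
                                                          options:nil
                                                            error:&err];
if (tex) {
    glBindTexture(GL_TEXTURE_2D, tex.name);
    // ... use tex.name as the cloned scrap's background texture ...
}
```

Since the clone's pixels come from the same file bytes as the original, repeated clones shouldn't degrade the ink at all.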
I think the way to resolve it is to read the pixels through a PBO: that way, glReadPixels won't lock the CPU while waiting on the GPU to copy the pixel data, though it'll change how and probably when I can process that data.
On the contrary, glReadPixels() with PBOs (pixel buffer objects) can schedule asynchronous data transfer and returns immediately without stall. Therefore, the application can execute other processes right away while transferring the data by OpenGL at the same time. The other advantage of using PBOs is the fast pixel data transfer from (and to) a graphics card though DMA without involving CPU cycles. In the conventional way, the pixel data is loaded into system memory by CPU. Using a PBO, instead, GPU manages copying data from the frame buffer to a PBO. This means that OpenGL performs a DMA transfer operation without wasting CPU cycles.
To maximize asynchronous read-back performance, you can use two PBOs. Every frame, the application reads the pixel data from the framebuffer into one PBO using glReadPixels(), and processes the pixel data in the other. Calls to glMapBufferARB() and glUnmapBufferARB() map/unmap the OpenGL-controlled buffer object into the client's address space so that you can access and modify the buffer through a pointer. The read and the processing can be performed simultaneously, because glReadPixels() to the first PBO returns immediately, so the CPU can start processing data in the second PBO without delay. You alternate between the two PBOs every frame.
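The double-PBO pattern described above looks roughly like this. This is a sketch, not project code: it assumes PBO support (desktop GL or ES 3.0), assumes `width`/`height` variables, and uses glMapBufferRange in place of the older glMapBufferARB:

```objc
// One-time setup: two pixel-pack PBOs sized for an RGBA read-back.
static GLuint pbo[2];
static int idx = 0;

glGenBuffers(2, pbo);
for (int i = 0; i < 2; i++) {
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[i]);
    glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, NULL, GL_STREAM_READ);
}

// Each frame: kick off an async read into one PBO. With a PBO bound,
// the last argument is an offset, not a pointer, and the call returns
// immediately instead of stalling the CPU.
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[idx]);
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0);

// Meanwhile, map and process the pixels that finished last frame.
int other = (idx + 1) % 2;
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[other]);
GLubyte* pixels = (GLubyte*)glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0,
                                             width * height * 4,
                                             GL_MAP_READ_BIT);
if (pixels) {
    // ... process / save the pixel data ...
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
}
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
idx = other;
```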
Scratch the above: it seems PBOs are only available in OpenGL ES 3.0. However, Brad Larson uses texture caches to quickly load data in/out of textures
and
http://stackoverflow.com/questions/10455329/opengl-es-2d-rendering-into-image/10455622#10455622
And now we've gone full circle: OpenGL ES 2.0 is required, according to the docs, for CVOpenGLESTextureCacheCreate, so maybe #1568 is needed after all.
xcdoc://?url=developer.apple.com/library/ios/documentation/CoreVideo/Reference/CVOpenGLESTextureCacheRef/index.html#//apple_ref/c/func/CVOpenGLESTextureCacheCreate
step by step: http://allmybrain.com/2011/12/08/rendering-to-a-texture-with-ios-5-texture-cache-api/
The OpenGL ES 2.0 refactor in #1568 is finished, so using CVOpenGLESTextureCacheCreate to load/save textures much faster should now be possible.
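A sketch of the texture-cache approach, following the allmybrain.com walkthrough linked above. `context`, `width`, and `height` are assumed variables, and error handling is omitted; the idea is to render into a texture whose backing store is a CVPixelBuffer, so the bytes are reachable from the CPU without glReadPixels:

```objc
#import <CoreVideo/CoreVideo.h>

// Create a texture cache tied to the ES 2.0 context.
CVOpenGLESTextureCacheRef cache = NULL;
CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, NULL,
                             context, // the ES 2.0 EAGLContext
                             NULL, &cache);

// Create an IOSurface-backed pixel buffer to hold the texture's bytes.
CVPixelBufferRef pixelBuffer = NULL;
NSDictionary* attrs = @{ (id)kCVPixelBufferIOSurfacePropertiesKey : @{} };
CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                    kCVPixelFormatType_32BGRA,
                    (__bridge CFDictionaryRef)attrs, &pixelBuffer);

// Wrap the pixel buffer in a GL texture via the cache.
CVOpenGLESTextureRef texture = NULL;
CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault, cache,
                                             pixelBuffer, NULL,
                                             GL_TEXTURE_2D, GL_RGBA,
                                             (GLsizei)width, (GLsizei)height,
                                             GL_BGRA, GL_UNSIGNED_BYTE, 0,
                                             &texture);

// Attach it as the color target of an already-bound FBO and render.
// Afterwards, CVPixelBufferLockBaseAddress + CVPixelBufferGetBaseAddress
// give direct access to the rendered bytes - no glReadPixels needed.
glBindTexture(CVOpenGLESTextureGetTarget(texture),
              CVOpenGLESTextureGetName(texture));
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, CVOpenGLESTextureGetName(texture), 0);
```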
I tried using a simple input/output from GPUImage, but since I'm using a single size of backing texture for everything, I think it's having trouble scaling things properly if the frame buffer + texture + output aren't all exactly the same size.
An alternative option: build a GPUImageFramebuffer, then render to that frame buffer and fetch its bytes directly. That way I can piggyback on the fast texture export w/o going through additional render cycles w/ an input + output filter.
I think I need to use the fast texture cache code from [GPUImageFramebuffer generateFramebuffer] (which was taken from http://allmybrain.com/2011/12/08/rendering-to-a-texture-with-ios-5-texture-cache-api/) and make my own FastTextureBackedFramebuffer that I can render to and then pull the contents from.
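Roughly what that alternative might look like using GPUImageFramebuffer's own API (its default framebuffers are already texture-cache backed, and `activateFramebuffer` / `lockForReading` / `byteBuffer` / `unlockAfterReading` are real GPUImage methods). This is a sketch and assumes it runs on GPUImage's GL context/queue with `width`/`height` defined:

```objc
#import <GPUImage/GPUImage.h>

// Render straight into a GPUImageFramebuffer (whose texture is backed by
// a CVPixelBuffer via the texture cache) and pull the bytes out without
// a separate input/output filter pass.
GPUImageFramebuffer* fbo =
    [[GPUImageFramebuffer alloc] initWithSize:CGSizeMake(width, height)];

[fbo activateFramebuffer]; // binds the FBO and sets the viewport
// ... issue the normal scrap drawing calls here ...

glFinish(); // make sure the render has completed before touching bytes

[fbo lockForReading];
GLubyte* bytes = [fbo byteBuffer]; // direct pointer into the pixel buffer
// ... copy / save the bytes for the cloned scrap ...
[fbo unlockAfterReading];
```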
I'm punting on this till 2.1. I'm not convinced the delay is because of texture caching; I think it's something else that I'm doing wrong in OpenGL. I need to get someone else to review my code and advise on how to clean up the OpenGL.
I had another theory about what to do. When implementing the texture cache code, I noticed that it still used a call to glFinish(). That call seems to sync the GPU/CPU, and it may take multiple seconds if the GPU has tons to do on other contexts/threads as well. My problem doesn't seem to be the time it takes to pull the texture data out; it seems to be the time it takes to sync the GPU/CPU before the read even starts.
I did some more reading tonight, and it seems that if I use glReadPixels into a pixel buffer object, then the glReadPixels won't trigger a GPU/CPU sync; I can remove my glFinish call, and it will move the pixels into the PBO asynchronously. Importantly, I can use a fence object after the glReadPixels call. That fence will signal after the glReadPixels call has finished on the GPU, which means the texture data will be ready and sitting in the PBO.
Then I can do the texture read 100% asynchronously, and use the fence-test feature to determine whether the fence has been signaled without blocking the CPU at all. That'd let me keep the UI responsive while waiting for the texture to be read from a cloned scrap; I could even block its interactions and show a dimmed spinner etc. until the clone has completed. It'd be much better to show a spinner + responsive UI for 5 seconds than a completely locked UI for 5 seconds.
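A sketch of that non-blocking fence poll. On iOS ES 2.0 the fence API comes from the GL_APPLE_sync extension (the same calls exist on desktop GL without the APPLE suffix); the fence must be created and polled on the context that issued the read, and the spinner hookup here is hypothetical:

```objc
// After kicking off the async read, drop a fence into the command stream
// and flush so the GPU actually starts working on it.
GLsync fence = glFenceSyncAPPLE(GL_SYNC_GPU_COMMANDS_COMPLETE_APPLE, 0);
glFlush();

// Later, e.g. on each display link tick, poll without blocking:
// a timeout of 0 just tests the fence and returns immediately.
GLenum status = glClientWaitSyncAPPLE(fence, 0, 0);
if (status == GL_ALREADY_SIGNALED_APPLE ||
    status == GL_CONDITION_SATISFIED_APPLE) {
    glDeleteSyncAPPLE(fence);
    // Pixel data is ready: hide the spinner, re-enable the scrap,
    // and finish building the clone from the read-back bytes.
} else {
    // Still waiting: keep the UI responsive and check again next tick.
}
```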
links with info about fences:
http://stackoverflow.com/questions/15137020/using-fence-sync-objects-in-opengl
This page has a fence example at the bottom of the page:
More good discussion here as well: https://www.opengl.org/discussion_boards/showthread.php/171319-glFlush-or-glFinish-with-mulithreading
If I switch to using CVOpenGLESTextureCacheCreate, hopefully that delay from the glReadPixels will disappear.