elm-community / elm-webgl

Moved to elm-explorations/webgl
http://package.elm-lang.org/packages/elm-explorations/webgl/latest
BSD 3-Clause "New" or "Revised" License
95 stars 22 forks source link

Cache getActiveUniform calls #20

Closed Ryan1729 closed 8 years ago

Ryan1729 commented 8 years ago

gl.getActiveUniform and gl.getUniformLocation do not need to be called every frame and can sometimes be very slow, (not sure if it's just my integrated graphics card or something.)

getactiveuniformisslowsometimes getuniformlocationisslowsometimes

This pull request caches the result of these calls in a manner similar to the way the author of the webgl fundamentals tutorial's utility library does.

Apanatshka commented 8 years ago

The code changes look ok to me. Does this increase the size of the cache much? I'm not super familiar with this code, but it looks like the cache won't get evicted so if this kicks up the memory usage by a lot that might be a concern.

One thing with performance improvements is that they can be quite unintuitive. So can you provide proof (in the form of benchmarks) that this works as expected?

Ryan1729 commented 8 years ago

Here's a quick benchmark I put together based on the cube example.

To use it: Run it until the cubes stop moving, then the score is printed in the console. You'll want to discard the first result and reload the page to allow things to "warm-up". Higher scores are better.

When I run it on my 6 year old laptop with integrated graphics I get scores in the range 150-170. When I swap in the caching version I get scores in the range 230-250.

Ryan1729 commented 8 years ago

The size of the heap isn't measurably different, at least in the case of the benchmark with its single shader.

The extra memory usage would be something like O(p * u) where p is shader programs used and u is the uniforms used in each program, but I haven't seen a shader with even a dozen uniforms. If there's an elm program that uses lots of different shaders, I suppose it might be worth trying it on that?

Apanatshka commented 8 years ago

That benchmark gives me 1800-2100 for the uncached version and 2400-2500 for the cached version (AMD Radeon R9 M370X graphics card). So the improvement is more impressive for weaker graphics cards, but it's definitely there.

I think this looks good. I'd like someone else to have a look too, since I think this technically doesn't tick all the boxes of the CONTRIBUTING document. @mgold: What do you think, is this admissible?

mgold commented 8 years ago

Heh, this is mostly beyond me. But if you've managed to improve performance on some machines without regressing others, haven't introduced runtime errors, and haven't changed the API, this seems fine to me.

Ryan1729 commented 8 years ago

Cool. I've updated the version number and fixed the bounds as suggested in #22

Apanatshka commented 8 years ago

Ok, so is this ready to merge Ryan? :)

Ryan1729 commented 8 years ago

Yes. All set. :)