Closed Ryan1729 closed 8 years ago
The code changes look ok to me. Does this increase the size of the cache much? I'm not super familiar with this code, but it looks like the cache won't get evicted so if this kicks up the memory usage by a lot that might be a concern.
One thing with performance improvements is that they can be quite unintuitive. So can you provide proof (in the form of benchmarks) that this works as expected?
Here's a quick benchmark I put together based on the cube example.
To use it: Run it until the cubes stop moving, then the score is printed in the console. You'll want to discard the first result and reload the page to allow things to "warm-up". Higher scores are better.
When I run it on my 6 year old laptop with integrated graphics I get scores in the range 150-170. When I swap in the caching version I get scores in the range 230-250.
The size of the heap isn't measurably different, at least in the case of the benchmark with its single shader.
The extra memory usage would be something like O(p * u) where p is shader programs used and u is the uniforms used in each program, but I haven't seen a shader with even a dozen uniforms. If there's an elm program that uses lots of different shaders, I suppose it might be worth trying it on that?
That benchmark gives me 1800-2100 for the uncached version and 2400-2500 for the cached version (AMD Radeon R9 M370X graphics card). So the improvement is more impressive for weaker graphics cards, but it's definitely there.
I think this looks good. I'd like someone else to have a look too, since I think this technically doesn't tick all the boxes of the CONTRIBUTING document. @mgold: What do you think, is this admissible?
Heh, this is mostly beyond me. But if you've managed to improve performance on some machines without regressing others, haven't introduced runtime errors, and haven't changed the API, this seems fine to me.
Cool. I've updated the version number and fixed the bounds as suggested in #22
Ok, so is this ready to merge Ryan? :)
Yes. All set. :)
gl.getActiveUniform
andgl.getUniformLocation
do not need to be called every frame and can sometimes be very slow, (not sure if it's just my integrated graphics card or something.)This pull request caches the result of these calls in a manner similar to the way the author of the webgl fundamentals tutorial's utility library does.