Closed by starwed 9 years ago
Awesome!
Nice idea. Let me share my thoughts on the current state of your webgl implementation:
Crafty.webgl.render() would: bind the shaderProgram, let each entity with a specific texture put its data into the buffer, flush the buffer (transmit it to the GPU) whenever the texture changes, then unbind the shaderProgram. So what goes into the buffer? You could just pack all the uniforms of each entity into the buffer instead. Another game library reduced the amount of data transmitted to the GPU by calculating the entity positions on the CPU (LibGDX SpriteBatch draw call).
Another thing: the width and height properties of the gl canvas. Each time before we render the whole package we do gl.viewport(0, 0, canvas.width, canvas.height);
I've looked at transparency, and we can just sort the list of entities before rendering. (Before I started this I hoped that wouldn't be necessary, but we already do something similar for canvas.) So that shouldn't be too bad, especially if we're smart about keeping a sorted array around.
I could tell while I was coding the current stuff that there must be a better way to push that data, but for now I'm going to do things the naive slow way. Once I've implemented texture code, I think I'll have a better understanding of our requirements. Then we can try to make it smarter! :)
One thing about webgl is that it only works in very modern browsers, so we can use some performance tricks like typed arrays that would normally be off limits. That also opens up using tools like emscripten or asm.js, if we need to do any expensive computations in js.
So what goes into the buffer? You could just pack all the uniforms of each entity into the buffer instead.
@mucaho If you're talking about uniform buffers here, they're apparently not usable in WebGL, which is based on ES 2.0. :(
If you're talking about uniform buffers here
I meant vertex arrays/buffers.
Take your time to figure everything out, but in the end you will have to avoid uniforms:
"Uniforms are so named because they do not change from one execution of a shader program to the next within a particular rendering call. This makes them unlike shader stage inputs and outputs, which are often different for each invocation of a program stage." taken from OpenGL Documentation.
Shader stage inputs refer to vertex attributes in the case of vertex shaders. Vertex attributes are what OpenGL has in mind for passing on entity-specific attributes.
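To illustrate the difference, here is a minimal sketch (the layout and names are hypothetical, not Crafty's actual format) of packing per-entity data into one interleaved array backing a vertex attribute buffer, instead of setting uniforms once per entity:

```javascript
// Hypothetical layout: each vertex carries x, y, and a color index.
// One gl.bufferData upload and one draw call can then cover all entities,
// instead of a uniform update + draw call per entity.
var STRIDE = 3; // floats per vertex

function packEntities(entities) {
  var data = new Float32Array(entities.length * STRIDE);
  for (var i = 0; i < entities.length; i++) {
    var e = entities[i];
    data[i * STRIDE + 0] = e.x;
    data[i * STRIDE + 1] = e.y;
    data[i * STRIDE + 2] = e.colorIndex;
  }
  return data;
}
```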
Yeah, I'm slowly figuring this stuff out. :)
It looks to me like we'll want to provide some settings for the user to improve performance -- disabling partial transparency, for instance, could speed things up.
Got a working implementation of "Sprite".
I've been exploring the options for transparency. I think it would make sense to let the user toggle between three modes:
Possibly only modes 1 and 3 are worth supporting, since I'm not sure 2 would really be any better than 3.
I've started to think about how to optimize things -- mostly by batching draw events together. This is where using per-entity-uniforms fails, and you have to feed all the data in via vertex attributes. I think I see how to do this, but I haven't played with it yet at all!
What do you mean by sorting?
What I meant was that to support transparency, we'll have to sort the entities before rendering them. I think this actually had a noticeable cost when doing full-screen canvas redraws. (Though probably we can mitigate this quite a bit by maintaining an ordered list -- then we only need to sort when something changes z-value)
So if you didn't have any sprites with partial transparency, you might prefer that we turned on the depth buffer, discarded transparent pixels, and didn't sort the entities.
Hmm can we cache the sorted data structure somehow? Like we have a sorted list, new entities have to be inserted and entities that change order (listen to z attribute change) have to be removed and reinserted. Would that speed up the performance instead of sorting the entities each time rendering is done?
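The cached-sort idea could be sketched as a z-ordered array with binary-search insertion (names hypothetical), so only entities whose z changes pay a cost:

```javascript
// Keep the render list sorted by z; find the insertion point in O(log n).
function insertSorted(list, entity) {
  var lo = 0, hi = list.length;
  while (lo < hi) {
    var mid = (lo + hi) >> 1;
    if (list[mid].z <= entity.z) lo = mid + 1; else hi = mid;
  }
  list.splice(lo, 0, entity);
}

// When an entity's z changes, remove and reinsert just that entity.
function onZChange(list, entity) {
  list.splice(list.indexOf(entity), 1);
  insertSorted(list, entity);
}
```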
Oh, that's exactly what you wrote in one of your posts before :)
I have it drawing colored squares in batches now. Yay? :) (The architecture will currently process objects in _z order, and render a batch when it encounters an entity that requires a different shader program. For sprites, that means everything with the same sprite sheet will be rendered in one big batch.)
One issue is that there's a lot more code involved with implementing a component for WebGL than for the canvas or DOM. (Mostly in setting up the shader program and related methods, and then in writing to arrays/buffers.) And the old tactic of just special casing the draw function will get pretty unwieldy.
Might make more sense to implement renderer-specific logic as separate components. (TintGL, SpriteGL, etc.) The regular Color/Sprite components could just add the appropriate one depending on what component you choose.
Got the batching working with sprites.
In a benchmark which just draws ~100 sprites bouncing around the screen, the WebGL backend seems about twice as fast as the canvas. (And there are definitely some optimizations I can do there.) So that's hopeful!
Also, it looks like it works just fine as a drop in replacement for Canvas. So it'll hopefully be easy to switch between them based on browser support.
Congratulations. We can increase performance further.
Do we really need separate shader programs? Consider the following possibility:
The uniform sampler2D uSampler could be set to -1 with gl.uniform1i(gl.getUniformLocation(program, "uSampler"), -1) for color mode, or set with gl.uniform1i(gl.getUniformLocation(program, "uSampler"), texture_obj.sampler) for texture mode. The color fragment shader's varying vec4 vColor and the texture fragment shader's varying vec2 vTextureCoord could be combined into a single varying vec4 vInput; if vInput.a != 0 we do coloring, else we do texturing.
Shader programs with branching code are a bit harder to write if you do not have branching control structures (if .. else ..) in the shader language. Do WebGL shaders support if & else? If not, I could show you a mathematical pattern to emulate branching.
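For the record, GLSL ES 1.0 (the WebGL shading language) does support if/else, though branches can hurt shader performance. A common mathematical-branching pattern (a sketch, not necessarily the exact pattern meant here) selects between two values by multiplication, the way a shader would with mix()/step():

```javascript
// Branchless selection: flag is 1.0 to pick a, 0.0 to pick b.
// In GLSL this is essentially mix(b, a, flag).
function branchlessMix(a, b, flag) {
  return a * flag + b * (1 - flag);
}
```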
Do we really need separate shader programs?
I've read that having conditional branching in shaders is not necessarily a good idea, because it can slow down the execution. I don't have much experience here, of course.
It's actually a pretty likely scenario that every asset will use the same program -- everything will be sprites using the same texture. If color entities are used for bullets or particles, they'll probably all have the same z level in most games. We'll certainly put that in the documentation that I am dreading having to write. :)
There's another reason why I don't want to have one monolithic program -- it makes it harder to extend the framework. Right now there's nothing stopping someone from writing their own custom programs, which I quite like conceptually.
(Oops, hit the wrong button!)
Isn't there an issue with rendering multiple textures in one pass? So you'd have to do some context switching regardless.
I've read that having conditional branching in shaders is not necessarily a good idea, because it can slow down the execution. I don't have much experience here, of course.
Yeah exactly, that's why I suggested the "mathematical branching" approach, as the execution times are exactly the same for every shader run (which is very beneficial for pipelining thousands of shader executions, as you read correctly).
On the other hand I am sure there is an even easier way to combine the functionality without branching much.
I don't want to have one monolithic program -- it makes it harder to extend the framework
In such a case, the shader would be set to the user-specified shader and changed back after the user is done with their rendering (so 2 shader switches). Having only one program on our part saves us even more unnecessary shader switches for the standard framework rendering.
More notably, custom shaders come often with custom vertex attributes, so we will have to adapt for that in the future.
Isn't there an issue with rendering multiple textures in one pass? So you'd have to do some context switching regardless.
I didn't even think about passing multiple textures. It's about removing the color program / texture program switches.
It's actually a pretty likely scenario that every asset will use the same program -- everything will be sprites using the same texture. If color entities are used for bullets or particles, they'll probably all have the same z level in most games.
That's actually a very valid point, but if we can remove the "chance" entirely and make it better in reasonable time without reasonable downsides, why not? :)
Unify Sprite, Color, and Tint rendering into one shader program
After sleeping over it: I was wrong, it would actually have a more negative performance, as the color-rendered entities would do unnecessary texture lookups.
However, Tinting and Spriting are almost the same: you just pass an additional color vertex attribute and multiply the texture color with that varying color attribute. For drawing sprites without tinting, the color attribute is neutral, thus vec4(1.0, 1.0, 1.0, 1.0). It's also fine to have separate programs for tinting and spriting: how many users will use tinting? If you don't need tinting, then sending a vec4 to the GPU is a waste of bandwidth.
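The tint described here is just a component-wise multiply; sketched in plain JS (in the fragment shader it would be a single vector multiply, e.g. gl_FragColor = texColor * vColor):

```javascript
// Multiply a sampled texel by a tint color, channel by channel.
// A white tint (1,1,1,1) leaves the sprite unchanged.
function tint(texel, color) {
  return [
    texel[0] * color[0],
    texel[1] * color[1],
    texel[2] * color[2],
    texel[3] * color[3]
  ];
}
```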
Sorry for being so nitpicky, but it will have a great impact on mid-range smartphone devices, which don't have desktop-like GPUs.
I got interested in WebGL and tried the Mozilla WebGL getting started guide. I was a little overwhelmed by how low-level WebGL feels, with shaders and such.
https://developer.mozilla.org/en-US/docs/Web/WebGL/Getting_started_with_WebGL
Would it be an idea to use three.js and then say that you have to include three.js if you want to use WebGL?
They seem to have conquered webgl pretty well :)
I think using three.js is a good idea. It's never a good move to reinvent the wheel.
Well, I've actually had almost everything we need working for about a month (taking existing programs and switching Canvas to WebGL seems to go ok). I just haven't had the time to tackle that last 10%... (If anyone wants to fork my repo, go for it! :) )
I think this is a case where a better fitting wheel is worth it. Though if someone wants to write a Crafty compatible wrapper around three.js, that would be cool even if we end up with our own webgl support -- what I've written is only for 2D stuff, same as our existing render components.
I'm definitely interested in this (WebGL support).
Hopefully I'll be able to get a PR for this ready next week, since I have some time off. :)
Ok, I grappled with what I'd written so far, and emerged with something that's a lot more coherent. The big remaining thing to implement is alpha transparency (and flipX/Y, I guess.) Then I'd like to land it in the develop branch, though it should probably still be considered experimental. :)
One thing I'm not sure about: where to keep the vertex/shader sourcecode -- with the relevant components, or in a separate file?
I don't know if this helps. I'm grappling with this myself.
You should follow the S in SOLID: Single Responsibility. Classes (or components, I suppose) should have a single responsibility, no more.
I personally like splitting one class per file, and Crafty games start to get hard to manage when you have 5-6 entities or components defined in a single file.
The webgl file is also almost 1000 lines of code, which is a good indication that it should probably be split.
@starwed You have put a lot of effort in there, nice job.
Does your current implementation have a program switch for each entity (and thus each batch consists of only one entity)? I think your intention was to have a program switch only when the vertex/fragment shader changes - see WebGL Draw call. But you create a program in each component's init method, which means every entity will have its own program instead of a singleton component program. I suggest you make a single program per component (no matter how many entities have the component added to them) and then a program data instance for each entity, which holds the entity-specific data (aPosition, aExtras, aColor, aTextureCoords, ...).
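The singleton-program suggestion could look something like this sketch (function and cache names are hypothetical, not Crafty's actual API):

```javascript
// One compiled program per shader name, shared by every entity that uses it.
var programCache = {};

function establishProgram(name, compileFn) {
  if (!programCache[name]) {
    programCache[name] = compileFn(); // compile/link exactly once
  }
  return programCache[name]; // later entities just get the reference
}
```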
Would it be possible in the future to not write data for an entity to the GPU buffer, if the entity hasn't changed (as the data already resides on the GPU buffer)? In RenderBatch you could use GLContext.bufferSubData, which only writes part of the buffer, leaving the rest intact.
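A sketch of that partial-upload idea, simulating the GPU-side buffer with a second Float32Array (the real call would be gl.bufferSubData with a byte offset):

```javascript
// Re-send only the dirty slice [start, end) of the CPU-side array.
// The real WebGL equivalent would be roughly:
//   gl.bufferSubData(gl.ARRAY_BUFFER, start * 4, cpuArray.subarray(start, end));
function uploadDirtyRange(gpuBuffer, cpuArray, start, end) {
  gpuBuffer.set(cpuArray.subarray(start, end), start);
}
```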
Would it be an idea to use three.js and then say that you have to include three.js if you want to use WebGL
If Crafty is going to use an external WebGL renderer, I would suggest pixi.js instead, which is built for 2D WebGL rendering.
If we are going down this path then I additionally suggest adding bindings to other frameworks:
But you create a program in each component's init method, that means every entity will have its own program
Nah, the program is cached by the name passed in to establishShader. The call to initProgram will only create it if it doesn't already exist; the entities are just storing a reference. (There are a lot of method names that need to be updated to reflect what's going on now!)
Sprites using different spritesheets will have separate programs, but I didn't really see a good way around that.
Would it be possible in the future to not write data for an entity to the GPU buffer, if the entity hasn't changed (as the data already resides on the GPU buffer)?
Maybe! The way it works right now, data for each entity is written to a typed array, and then that array is copied to the GPU in one call. Only writing to the array when an entity changes is pretty easy, but we could probably optimize how much of the array is copied as well. However, there's a cost per copy, so we might only need to worry about cases where it's a clear win. e.g., I believe it would be faster to copy 5 entities in one call, than 3 entities in 3 calls.
Hmm, on second thought I'm not sure whether a particular buffer on the GPU will persist through other batches? Anyway, it's definitely something to investigate.
One thing I'm not sure about: where to keep the vertex/shader sourcecode -- with the relevant components, or in a separate file?
I ended up including it inline using the brfs plugin, which browserify supports as a transform.
Regardless of whether it should or should not be kept with the components that use it, JavaScript simply doesn't support multiline strings in a nice way, which makes them a maintainability nightmare. Storing each shader in its own file, and then inlining it, is much nicer! :D
Ok, the alpha property is now respected, but I haven't implemented the z-sorting yet. (This'll be kind of easy to do in a lazy way, but as mentioned above there are some optimizations we should really do here.)
Also, realized that viewport zoom isn't implemented yet, though that shouldn't be too bad.
Nah, the program is cached by the name passed in to establishShader. The call to initProgram will only create it if it doesn't already exist
Great!
Sprites using different spritesheets will have separate programs, but I didn't really see a good way around that.
It's not optimal, but it's fine. If you follow the guideline to have all sprites in one spritesheet and only use Sprite components (or Color components at a different z level), you will get the best possible performance.
I'm not sure whether a particular buffer on the GPU will persist through other batches
If you don't modify the buffer on the GPU, the data will remain in the buffer on the GPU. You have different buffers for different programs, so I don't see the problem.
but we could probably optimize how much of the array is copied as well
Yes, we could do that, but it won't be easy. I thought about having 2 different buffers/buffer regions (one for static entities, specifically marked by the user, and one for dynamic entities). The problem is that static entities can also vanish from the visible viewport area, after which they won't be drawn anymore. That means the whole static buffer/buffer region has to be uploaded again.
Added simple, non-optimized z-sorting for proper transparency.
Remaining to implement: the flipX and flipY properties.
Realised after a couple of minutes that these were easy to do, and so implemented them. :)
I think the webgl branch now supports every feature that sprite and tint need.
:+1:
When do you think it's ready for testing?
I have a small game and wouldn't mind seeing what breaks.
I ended up including it inline using the brfs plugin, which browserify supports as a transform.
So in the production version, does the browser have to load these separate shader files, or are they inside crafty.js & crafty-min.js?
Nvm, readFileSync is a feature of node, so they have to be inside crafty.js & crafty-min.js :)
- z-sorting:
- texture lookup interpolation: in the future we should somehow allow the user to specify GL_NEAREST for texture lookups -> maybe add to Crafty.pixelart() #666
- documentation: In the future when you have time, could you write up a high-level overview of the WebGL drawing process in the wiki perhaps? (with all the buffers, important OpenGL calls, timing of calls etc.)
I have some more thoughts on the matter:
z-sorting: What about removing the non-optimized z-sorting on the CPU completely and letting the GPU (Depth Test) take care of it?
Caching entity data on the GPU: Over time, entities that do not change often will accumulate around the beginning of the array, which will not be uploaded to the GPU again. Cost: a few additional comparisons, splices and pushes compared to the current implementation, if I am not missing anything.
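A minimal sketch of that accumulation scheme (names hypothetical): changed entities are spliced out and pushed to the back, so stable entities drift toward the front and everything before dirtyIndex can be skipped on upload.

```javascript
var entities = [];
var dirtyIndex = 0; // everything at or after this index must be re-uploaded

function markChanged(entity) {
  var i = entities.indexOf(entity);
  entities.splice(i, 1); // remove from its old slot
  entities.push(entity); // changed entities accumulate at the back
  dirtyIndex = Math.min(dirtyIndex, i);
}

function afterUpload() {
  dirtyIndex = entities.length; // nothing dirty until the next change
}
```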
What about removing the non-optimized z-sorting on the CPU completely and letting the GPU (Depth Test) take care of it?
The issue (that you acknowledge) is that you can no longer use partial transparency. The main goal here was to be a drop-in replacement for Canvas. I could definitely see adding an option to trade-off some features for even better performance, but for now let's concentrate on that goal.
Caching entity data on the GPU
I like the idea of trying to bunch 'static' entities at the beginning of the vertex buffer -- that makes a lot of sense. Not sure the best way to approach this, though.
The last thing I need to implement for "correctness" (that I know of!) is to handle the creation/destruction of entities; as is, it would run out of buffer space if you cycle through a lot of them. Seems like that and optimizing the position of objects in the buffer would be pretty highly related.
I like the idea of trying to bunch 'static' entities at the beginning of the vertex buffer -- that makes a lot of sense. Not sure the best way to approach this, though.
- We can always add that later (once this PR lands).
- I think even the proposed, naive algorithm should work wonders: if the set of entities doesn't change over 1 sec, that's 1000ms / 50ms delta = 20 full buffer copies saved.
- I have some more ideas on how to improve the algorithm:
- When inserting entities in the buffer for the first time, OR if dirtyIndex points to the first element in the buffer (all elements have to be reinserted anyway):
- Insert the entities into the sorted buffer according to a heuristic that guesses the "lifeTime" of the entity inside the buffer
- Entities that score "good" according to the heuristic will be placed at the front, while entities that score "bad" will be placed at the back
- Possible heuristics that come to mind:
- distance to viewport.follow.target (the player) -> the smaller the distance, the better the heuristic score
- prediction of player movement based on historical data -> if the player moved right in the past (e.g. side scroller), entities that are "in front" - right - of the player will have a better score than entities that are "in the back" - left - of the player
- last time of change: how long ago did the entity change any of its 2D properties? Entities that are "static" - no recent changes - will have a better score than entities that changed recently
- change ratio per frame: calculate the ratio between the number of frames the entity changed and the number of frames the 2D properties didn't change -> the smaller the ratio, the better the score (maybe add an exponentially smoothed average of past change ratios)
- entities manually labelled "static" by the user will have a better score than other entities
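As an illustration only, two of these heuristics could combine into a single score like this (names, weights, and the combination are arbitrary placeholders, not a proposed final design):

```javascript
// Higher score = more "static" = placed nearer the front of the buffer.
// Combines distance-to-target (smaller is better) with staleness
// (time since the entity last changed a 2D property; longer is better).
function bufferScore(entity, target, now) {
  var dx = entity.x - target.x;
  var dy = entity.y - target.y;
  var distance = Math.sqrt(dx * dx + dy * dy);
  var staleness = now - entity.lastChanged; // ms since last 2D change
  return staleness * 0.01 - distance; // weights are placeholders
}
```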
The last thing I need to implement for "correctness" (that I know of!) is to handle the creation/destruction of entities; as is, it would run out of buffer space if you cycle through a lot of them
Could you elaborate, please? Isn't the buffer space adjusted to the length of the data you upload (and you overwrite the whole buffer every frame)?
Could you elaborate, please? Isn't the buffer space adjusted to the length of the data you upload (and you overwrite the whole buffer every frame)?
I believe the buffer space is, but the information has to be stored in a typed array before loading into the buffer. Previously each new entity just took up the next set of slots in that array... eventually it would run out of space.
The current implementation now tracks "holes" left in that typed array when you remove an entity, letting it reuse them when you add a new entity. I need to document the maximum number of live entities of one type, but it can actually be pretty large without using too much space. (A 1000 entity limit would mean it requires about 100kb of space.) Probably should add an option for manipulating this, just in case.
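The hole tracking described here might look like this free-list sketch (names hypothetical, not the actual implementation): freed slots are recorded and reused before the array grows.

```javascript
// Fixed-capacity typed array of entity slots, with a free list of holes.
function SlotAllocator(maxEntities, stride) {
  this.data = new Float32Array(maxEntities * stride);
  this.stride = stride;
  this.next = 0;   // next never-used slot index
  this.holes = []; // indices of slots vacated by destroyed entities
}

SlotAllocator.prototype.alloc = function () {
  if (this.holes.length) return this.holes.pop(); // reuse a hole first
  return this.next++; // only grows when no holes remain
};

SlotAllocator.prototype.free = function (slot) {
  this.holes.push(slot);
};
```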
I started toying around with adding a WebGL backend on Monday, and it's actually come together pretty quickly. The idea here is to render sprites using a webgl context rather than the standard canvas. So it won't draw 3D models, it'll just do what Canvas does, faster. (Well, hopefully faster!)
I have the basic viewport and 2D capability working -- I can render colored squares using a WebGL component, rotate and translate them, and translate the viewport.
There's still a lot of work left to do, but I thought I should post here about it. The next few big things to add are
And I'm sure there's lots of room for optimization, since I'm a complete novice at webgl.
I'm working on the webgl branch of my repo. All the new code is in webgl.js. There's a lot of cruft right now, and I'm testing it with a "TestColor" component that lets you set the color with color: function (r, g, b).