craftyjs / Crafty

JavaScript Game Engine
http://craftyjs.com
MIT License
3.41k stars 560 forks

WebGL support #687

Closed starwed closed 9 years ago

starwed commented 10 years ago

I started toying around with adding a WebGL backend on Monday, and it's actually come together pretty quickly. The idea here is to render sprites using a webgl context rather than the standard canvas. So it won't draw 3D models, it'll just do what Canvas does, faster. (Well, hopefully faster!)

I have the basic viewport and 2D capability working -- I can render colored squares using a WebGL component, rotate and translate them, and translate the viewport.

There's still a lot of work left to do, but I thought I should post here about it. The next few big things to add are:

And I'm sure there's lots of room for optimization, since I'm a complete novice at webgl.

I'm working on the webgl branch of my repo. All the new code is in webgl.js. There's a lot of cruft right now, and I'm testing it with a "TestColor" component that lets you set the color with color: function (r, g, b).

kevzettler commented 10 years ago

Awesome!

mucaho commented 10 years ago

Nice idea. Let me share my thoughts on the current state of your webgl implementation:

starwed commented 10 years ago

I've looked at transparency, and we can just sort the list of entities before rendering. (Before I started this I hoped that wouldn't be necessary, but we already do something similar for canvas.) So that shouldn't be too bad, especially if we're smart about keeping a sorted array around.

I could tell while I was coding the current stuff that there must be a better way to push that data, but for now I'm going to do things the naive slow way. Once I've implemented texture code, I think I'll have a better understanding of our requirements. Then we can try to make it smarter! :)

One thing about webgl is that it only works in very modern browsers, so we can use some performance tricks like typed arrays that would normally be off limits. That also opens up using tools like emscripten or asm.js, if we need to do any expensive computations in js.
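Since WebGL guarantees a modern browser, per-frame entity data can be packed into typed arrays before upload. A minimal sketch of the idea (`packPositions` is a hypothetical helper, not part of the Crafty codebase):

```javascript
// Pack entity positions into a contiguous Float32Array, the format
// WebGL buffer uploads expect. Typed arrays avoid per-element boxing
// and can be handed straight to gl.bufferData.
function packPositions(entities) {
  // Two floats (x, y) per entity, laid out contiguously
  const data = new Float32Array(entities.length * 2);
  for (let i = 0; i < entities.length; i++) {
    data[2 * i] = entities[i].x;
    data[2 * i + 1] = entities[i].y;
  }
  return data;
}
```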

starwed commented 10 years ago

So what goes into the buffer? You could just pack all the uniforms of each entity into the buffer instead.

@mucaho If you're talking about uniform buffers here, they're apparently not usable in WebGL, which is based on ES 2.0. :(

mucaho commented 10 years ago

If you're talking about uniform buffers here

I meant vertex arrays/buffers.

Take your time to figure everything out, but in the end you will have to avoid uniforms:

"Uniforms are so named because they do not change from one execution of a shader program to the next within a particular rendering call. This makes them unlike shader stage inputs and outputs, which are often different for each invocation of a program stage." taken from OpenGL Documentation.

Shader stage inputs refer to vertex attributes in the case of vertex shaders. Vertex attributes are the mechanism OpenGL provides for passing on entity-specific data.
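To make the uniform-vs-attribute distinction concrete, here is a sketch of interleaving per-entity data (a color) into the per-vertex attribute stream, so it rides along with the vertices instead of requiring a uniform update per entity. `buildVertexData` and the layout are illustrative assumptions, not Crafty's actual format:

```javascript
// Build interleaved vertex data for a list of quads: each vertex carries
// x, y plus the entity's r, g, b, a. The same color is repeated for all
// four corners, which is the price of avoiding a per-entity uniform.
function buildVertexData(entities) {
  const stride = 6; // floats per vertex: x, y, r, g, b, a
  const data = new Float32Array(entities.length * 4 * stride);
  entities.forEach((e, i) => {
    const corners = [
      [e.x, e.y], [e.x + e.w, e.y],
      [e.x, e.y + e.h], [e.x + e.w, e.y + e.h]
    ];
    corners.forEach((c, j) => {
      const base = (i * 4 + j) * stride;
      data[base] = c[0];
      data[base + 1] = c[1];
      // Entity color, duplicated per vertex
      data.set(e.color, base + 2);
    });
  });
  return data;
}
```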

starwed commented 10 years ago

Yeah, I'm slowly figuring this stuff out. :)

It looks to me like we'll want to provide some settings for the user to improve performance -- disabling partial transparency, for instance, could speed things up.

starwed commented 10 years ago

Got a working implementation of "Sprite".

starwed commented 10 years ago

I've been exploring the options for transparency. I think it would make sense to let the user toggle between three modes:

Possibly only modes 1 and 3 are worth supporting, since I'm not sure 2 would really be any better than 3.

I've started to think about how to optimize things -- mostly by batching draw events together. This is where using per-entity-uniforms fails, and you have to feed all the data in via vertex attributes. I think I see how to do this, but I haven't played with it yet at all!

mucaho commented 10 years ago

What do you mean by sorting?

starwed commented 10 years ago

What I meant was that to support transparency, we'll have to sort the entities before rendering them. I think this actually had a noticeable cost when doing full-screen canvas redraws. (Though probably we can mitigate this quite a bit by maintaining an ordered list -- then we only need to sort when something changes z-value)

So if you didn't have any sprites with partial transparency, you might prefer that we turned on the depth buffer, discarded transparent pixels, and didn't sort the entities.
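The "discard transparent pixels" mode could look something like the following alpha-test fragment shader (GLSL as a JS string). This is an illustrative sketch, not the shader actually used in the branch:

```javascript
// Alpha-test fragment shader: (near-)transparent pixels are discarded,
// so the depth buffer can resolve ordering and no CPU-side z-sort is
// needed. Only works when sprites have no partial transparency.
const alphaTestFragmentShader = [
  'precision mediump float;',
  'uniform sampler2D uSampler;',
  'varying vec2 vTextureCoord;',
  'void main(void) {',
  '  vec4 color = texture2D(uSampler, vTextureCoord);',
  '  if (color.a < 0.5) discard; // skip transparent pixels entirely',
  '  gl_FragColor = color;',
  '}'
].join('\n');
```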

mucaho commented 10 years ago

Hmm can we cache the sorted data structure somehow? Like we have a sorted list, new entities have to be inserted and entities that change order (listen to z attribute change) have to be removed and reinserted. Would that speed up the performance instead of sorting the entities each time rendering is done?
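The cached sorted list described above could be sketched like this (`insertByZ` and `onZChange` are hypothetical helpers, not Crafty internals):

```javascript
// Maintain a render list ordered by z. New entities are inserted in
// place; an entity whose z changes is removed and reinserted, so a full
// sort is never run during rendering.
function insertByZ(list, entity) {
  let i = list.length;
  while (i > 0 && list[i - 1].z > entity.z) i--;
  list.splice(i, 0, entity);
}

function onZChange(list, entity) {
  const idx = list.indexOf(entity);
  if (idx >= 0) list.splice(idx, 1); // remove from the old position
  insertByZ(list, entity);           // reinsert at the new position
}
```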


What I meant was that to support transparency, we'll have to sort the entities before rendering them. I think this actually had a noticeable cost when doing full-screen canvas redraws.


mucaho commented 10 years ago

Oh, that's exactly what you wrote in one of your posts before :)


starwed commented 10 years ago

I have it drawing colored squares in batches now. Yay? :) (The architecture will currently process objects in _z order, and render a batch when it encounters an entity that requires a different shader program. For sprites, that means everything with the same sprite sheet will be rendered in one big batch.)

One issue is that there's a lot more code involved with implementing a component for WebGL than for the canvas or DOM. (Mostly in setting up the shader program and related methods, and then in writing to arrays/buffers.) And the old tactic of just special casing the draw function will get pretty unwieldy.

Might make more sense to implement renderer-specific logic as separate components. (TintGL, SpriteGL, etc.) The regular Color/Sprite components could just add the appropriate one depending on which renderer the entity uses.

starwed commented 10 years ago

Got the batching working with sprites.

In a benchmark which just draws ~100 sprites bouncing around the screen, the WebGL backend seems about twice as fast as the canvas. (And there are definitely some optimizations I can do there.) So that's hopeful!

Also, it looks like it works just fine as a drop in replacement for Canvas. So it'll hopefully be easy to switch between them based on browser support.

mucaho commented 10 years ago

Congratulations. We can increase performance further.

Do we really need separate shader programs? Consider the following possibility:

mucaho commented 10 years ago

Shader programs with branching code are a bit harder to write if the shader language does not support branching control structures (if .. else ..). Do WebGL shaders support if & else? If not, I could show you a mathematical pattern to emulate branching.
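The "mathematical branching" pattern alluded to here is usually built from GLSL's `step()` and `mix()` built-ins. A sketch of the same arithmetic in plain JS (GLSL provides `step` and `mix` natively; `select` is a made-up name for illustration):

```javascript
// step(edge, x) is 0.0 below the edge and 1.0 at or above it;
// mix(a, b, t) is a linear blend. Combined, they select between two
// values with straight-line arithmetic instead of an if/else.
function step(edge, x) { return x < edge ? 0 : 1; } // built-in in GLSL
function mix(a, b, t) { return a * (1 - t) + b * t; } // built-in in GLSL

// Arithmetic equivalent of: (x >= 0.5) ? b : a
function select(a, b, x) { return mix(a, b, step(0.5, x)); }
```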

starwed commented 10 years ago

Do we really need separate shader programs?

I've read that having conditional branching in shaders is not necessarily a good idea, because it can slow down the execution. I don't have much experience here, of course.

It's actually a pretty likely scenario that every asset will use the same program -- everything will be sprites using the same texture. If color entities are used for bullets or particles, they'll probably all have the same z level in most games. We'll certainly put that in the documentation that I am dreading having to write. :)

There's another reason why I don't want to have one monolithic program -- it makes it harder to extend the framework. Right now there's nothing stopping someone from writing their own custom programs, which I quite like conceptually.

starwed commented 10 years ago

(Oops, hit the wrong button!)

Isn't there an issue with rendering multiple textures in one pass? So you'd have to do some context switching regardless.

mucaho commented 10 years ago

I've read that having conditional branching in shaders is not necessarily a good idea, because it can slow down the execution. I don't have much experience here, of course.

Yeah exactly, that's why I suggested the "mathematical branching" approach, as the execution times are exactly the same for every shader run (which is very beneficial for pipelining thousands of shader executions, as you correctly read).
On the other hand, I am sure there is an even easier way to combine the functionality without branching much.

I don't want to have one monolithic program -- it makes it harder to extend the framework

In such a case, the shader would be set to the user-specified shader and changed back after the user is done with their rendering (so 2 shader switches). Having only one program on our part saves us even more unnecessary shader switches for the standard framework rendering.
More notably, custom shaders often come with custom vertex attributes, so we will have to adapt to that in the future.

Isn't there an issue with rendering multiple textures in one pass? So you'd have to do some context switching regardless.

I didn't even think about passing multiple textures. It's about removing the color program / texture program switches.

mucaho commented 10 years ago

It's actually a pretty likely scenario that every asset will use the same program -- everything will be sprites using the same texture. If color entities are used for bullets or particles, they'll probably all have the same z level in most games.

That's actually a very valid point, but if we can remove the "chance" entirely and make it better in reasonable time without unreasonable downsides, why not? :)

mucaho commented 10 years ago

Unify Sprite, Color and Tint rendering into one shader program

After sleeping on it: I was wrong, it would actually hurt performance, as the color-rendered entities would do unnecessary texture lookups. However, tinting and spriting are almost the same: you just pass on an additional color vertex attribute and multiply the texture color with that varying color attribute. For drawing sprites without tinting, the color attribute is neutral, thus vec4(1.0, 1.0, 1.0, 1.0). It's also fine to have separate programs for tinting and spriting: how many users will use tinting? If you don't need tinting, then sending a vec4 to the GPU is a waste of bandwidth.
Sorry for being so nitpicky, but it will have a great impact on mid-range smartphones, which don't have desktop-class GPUs.
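The combined tint/sprite shader described above might look like this (GLSL as a JS string; an illustrative sketch, not code from the branch):

```javascript
// Fragment shader that multiplies the sampled texture color by a
// varying tint color. An untinted sprite passes the neutral tint
// vec4(1.0, 1.0, 1.0, 1.0), leaving the texture color unchanged.
const tintedSpriteFragmentShader = [
  'precision mediump float;',
  'uniform sampler2D uSampler;',
  'varying vec2 vTextureCoord;',
  'varying vec4 vColor; // neutral tint: vec4(1.0, 1.0, 1.0, 1.0)',
  'void main(void) {',
  '  gl_FragColor = texture2D(uSampler, vTextureCoord) * vColor;',
  '}'
].join('\n');
```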

kevinsimper commented 10 years ago

I got interested in webgl and tried mozilla's webgl getting started guide. I was a little overwhelmed by how low-level webgl feels, with shaders and such.

https://developer.mozilla.org/en-US/docs/Web/WebGL/Getting_started_with_WebGL

Would it be an idea to use three.js and then say that you have to include three.js if you want to use WebGL?

They seem to have conquered webgl pretty well :)

kevzettler commented 10 years ago

I think using three.js is a good idea. It's never a good move to reinvent the wheel.

starwed commented 10 years ago

Well, I've actually had almost everything we need working for about a month (taking existing programs and switching Canvas to WebGL seems to go ok). I just haven't had the time to tackle that last 10%... (If anyone wants to fork my repo, go for it! :) )

I think this is a case where a better fitting wheel is worth it. Though if someone wants to write a Crafty compatible wrapper around three.js, that would be cool even if we end up with our own webgl support -- what I've written is only for 2D stuff, same as our existing render components.

ashes999 commented 10 years ago

I'm definitely interested in this (WebGL support).

starwed commented 10 years ago

Hopefully I'll be able to get a PR for this ready next week, since I have some time off. :)

starwed commented 10 years ago

Ok, I grappled with what I'd written so far, and emerged with something that's a lot more coherent. The big remaining thing to implement is alpha transparency (and flipX/Y, I guess.) Then I'd like to land it in the develop branch, though it should probably still be considered experimental. :)

starwed commented 10 years ago

One thing I'm not sure about: where to keep the vertex/shader sourcecode -- with the relevant components, or in a separate file?

ashes999 commented 10 years ago

I don't know if this helps. I'm grappling with this myself.

You should follow the S in SOLID: Single Responsibility. Classes (or components, I suppose) should have a single responsibility, no more.

I personally like splitting one class per file, and Crafty games start to get hard to manage when you have 5-6 entities or components defined in a single file.

The webgl file is also almost 1000 lines of code, which is also a good indication that it should probably be split.

mucaho commented 10 years ago

@starwed You have put a lot of effort in there, nice job.

Does your current implementation have a program switch for each entity (and thus each batch consists of one entity only)? I think your intention was to have a program switch if the vertex/fragment shader changes - see WebGL Draw call. But you create a program in each component's init method, that means every entity will have its own program, instead of a singleton component program. I suggest you make a single program per component (no matter how many entities have the component added to them) and then you make a program data instance for each entity, which holds the entity specific data (aPosition, aExtras, aColor, aTextureCoords, ...).

Would it be possible in the future to not write data for an entity to the GPU buffer, if the entity hasn't changed (as the data already resides on the GPU buffer)? In RenderBatch you could use GLContext.bufferSubData, which only writes part of the buffer, leaving the rest intact.

mucaho commented 10 years ago

Would it be an idea to use three.js and then say that you have to include three.js if you want to use WebGL

If Crafty is going to use an external WebGL renderer, I would suggest pixi.js instead, which is built for 2D WebGL rendering.

If we are going down this path then I additionally suggest adding bindings to other frameworks:

starwed commented 10 years ago

But you create a program in each component's init method, that means every entity will have its own program

Nah, the program is cached by the name passed in to establishShader. The call to initProgram will only create it if it doesn't already exist; the entities are just storing a reference. (There are a lot of method names that need to be updated to reflect what's going on now!)
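The caching behaviour described here can be sketched as a name-keyed cache around the compile step (`makeProgramCache` and its callback are hypothetical stand-ins for `establishShader`/`initProgram`, not the actual Crafty code):

```javascript
// A program cache keyed by shader name: the expensive compile step runs
// once per name, and every later request returns the cached program, so
// entities only hold references.
function makeProgramCache(compile) {
  const programs = {};
  return function establish(name, source) {
    if (!programs[name]) {
      programs[name] = compile(source); // compiled only on first request
    }
    return programs[name];
  };
}
```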

Sprites using different spritesheets will have separate programs, but I didn't really see a good way around that.

Would it be possible in the future to not write data for an entity to the GPU buffer, if the entity hasn't changed (as the data already resides on the GPU buffer)?

Maybe! The way it works right now, data for each entity is written to a typed array, and then that array is copied to the GPU in one call. Only writing to the array when an entity changes is pretty easy, but we could probably optimize how much of the array is copied as well. However, there's a cost per copy, so we might only need to worry about cases where it's a clear win. e.g., I believe it would be faster to copy 5 entities in one call, than 3 entities in 3 calls.
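Optimizing how much of the array is copied could start from tracking the dirty region of the CPU-side typed array and uploading only that slice (e.g. via `gl.bufferSubData` with a `subarray`). `dirtyRange` is a hypothetical helper sketching the bookkeeping:

```javascript
// Given the indices of changed entities, compute one contiguous float
// range covering them all. One bufferSubData call over this range is
// typically cheaper than one call per changed entity.
function dirtyRange(changedIndices, floatsPerEntity) {
  if (changedIndices.length === 0) return null; // nothing to upload
  const lo = Math.min(...changedIndices);
  const hi = Math.max(...changedIndices);
  return {
    start: lo * floatsPerEntity,
    end: (hi + 1) * floatsPerEntity
  };
}
```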

starwed commented 10 years ago

Hmm, on second thought I'm not sure whether a particular buffer on the GPU will persist through other batches? Anyway, it's definitely something to investigate.

starwed commented 10 years ago

One thing I'm not sure about: where to keep the vertex/shader sourcecode -- with the relevant components, or in a separate file?

I ended up including it inline using the brfs plugin, which browserify supports as a transform.

Regardless of whether it should or should not be kept with the components that use it, javascript simply doesn't support multiline strings in a nice way, which makes inline shaders a maintainability nightmare. Storing each shader in its own file, and then inlining it, is much nicer! :D

starwed commented 10 years ago

Ok, the alpha property is now respected, but I haven't implemented the z-sorting yet. (This'll be kind of easy to do in a lazy way, but as mentioned above there are some optimizations we should really do here.)

Also, realized that viewport zoom isn't implemented yet, though that shouldn't be too bad.

mucaho commented 10 years ago

Nah, the program is cached by the name passed in to establishShader. The call to initProgram will only create it if it doesn't already exist

Great!

Sprite using different spritesheets will have separate programs, but I didn't really see a good way round that.

It's not optimal, but it's fine. If you follow the guideline to have all sprites in one spritesheet and only use Sprite components (or with Color components in a different z level) you will get the best possible performance.

I'm not sure whether a particular buffer on the GPU will persist through other batches

If you don't modify the buffer on the GPU, the data will remain in the buffer on the GPU. You have different buffers for different programs, so I don't see the problem.

but we could probably optimize how much of the array is copied as well

Yes, we could do that, but it won't be easy. I thought about having 2 different buffers/buffer regions (one for static entities - specifically marked by the user - and one for dynamic entities). The problem is that static entities can also vanish from the visible viewport area, after which they won't be drawn anymore. That means the whole static buffer / buffer region has to be uploaded again.

starwed commented 10 years ago

Added simple, non-optimized z-sorting for proper transparency.

Remaining to implement: the flipX and flipY properties.

starwed commented 10 years ago

Realised after a couple of minutes that these were easy to do, and so implemented them. :)

I think the webgl branch now supports every feature that sprite and tint need.

kev-omniata-com commented 10 years ago

:+1:

ashes999 commented 10 years ago

When do you think it's ready for testing?

I have a small game and wouldn't mind seeing what breaks.

mucaho commented 10 years ago

I ended up including it inline using the brfs plugin, which browserify supports as a transform.

So in the production version of crafty.js, does the browser have to load these separate shader files, or are they inside crafty.js & crafty-min.js? Nvm, readFileSync is a feature of node, so they have to be inside crafty.js & crafty-min.js :)

mucaho commented 10 years ago

z-sorting:

texture lookup interpolation: In the future we should somehow allow the user to specify GL_NEAREST to use for texture lookups -> maybe add to Crafty.pixelart() #666

documentation: In the future, when you have time, could you write up a high-level overview of the WebGL drawing process in the wiki perhaps? (with all the buffers, important OpenGL calls, timing of calls, etc.)

mucaho commented 10 years ago

I have some more thoughts on the manner:

z-sorting: What about removing the non-optimized z-sorting on the CPU completely and letting the GPU (Depth Test) take care of it?

Caching entity data on the GPU

Over time, entities that do not change often will accumulate around the beginning of the array, which will not be uploaded to the GPU again. Cost: a few additional comparisons, splices and pushes compared to the current implementation, if I am not missing anything.

starwed commented 10 years ago

What about removing the non-optimized z-sorting on the CPU completely and letting the GPU (Depth Test) take care of it?

The issue (that you acknowledge) is that you can no longer use partial transparency. The main goal here was to be a drop-in replacement for Canvas. I could definitely see adding an option to trade-off some features for even better performance, but for now let's concentrate on that goal.

starwed commented 10 years ago

Caching entity data on the GPU

I like the idea of trying to bunch 'static' entities at the beginning of the vertex buffer -- that makes a lot of sense. Not sure the best way to approach this, though.

The last thing I need to implement for "correctness" (that I know of!) is to handle the creation/destruction of entities; as is, it would run out of buffer space if you cycle through a lot of them. Seems like that and optimizing the position of objects in the buffer would be pretty highly related.

mucaho commented 10 years ago

I like the idea of trying to bunch 'static' entities at the beginning of the vertex buffer -- that makes a lot of sense. Not sure the best way to approach this, though.

  • We can always add that later (once this PR lands).
  • I think even the proposed, naive algorithm should work wonders: if the set of entities doesn't change over 1 sec, that's 1000ms / 50ms delta = 20 full buffer copies saved.
  • I have some more ideas on how to improve the algorithm. When inserting entities into the buffer for the first time, or if dirtyIndex points to the first element in the buffer (all elements have to be reinserted anyway):
    • Insert the entities into the sorted buffer according to a heuristic that guesses the "lifetime" of the entity inside the buffer.
    • Entities that score "good" according to the heuristic will be placed at the front, while entities that score "bad" will be placed at the back.
    • Possible heuristics that come to mind:
      • distance to viewport.follow.target (the player) -> the smaller the distance, the better the heuristic score
      • prediction of player movement based on historical data -> if the player moved right in the past (e.g. a side scroller), entities that are "in front" - right - of the player will have a better score than entities that are "in the back" - left - of the player
      • last time of change: how long ago did the entity change any of its 2D properties? Entities that are "static" - no recent changes - will have a better score than entities that changed recently
      • change ratio per frame: calculate the ratio between the number of frames the entity changed and the number of frames its 2D properties didn't change -> the smaller the ratio, the better the score (maybe add an exponentially smoothed average of past change ratios)
      • entities manually labelled "static" by the user will have a better score than other entities

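A toy version of the last two heuristics, just to make the ranking idea concrete (`bufferScore`, its field names, and the scoring formula are all made up for illustration):

```javascript
// Score an entity for buffer placement: lower scores sort toward the
// front (the stable region that rarely needs re-uploading).
function bufferScore(entity, currentFrame) {
  // Entities the user explicitly marked static always sort first
  if (entity.userStatic) return 0;
  // Otherwise, the longer since the last 2D-property change,
  // the closer to the front the entity lands
  const framesSinceChange = currentFrame - entity.lastChangedFrame;
  return 1 / (1 + framesSinceChange);
}
```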
mucaho commented 10 years ago

The last thing I need to implement for "correctness" (that I know of!) is to handle the creation/destruction of entities; as is, it would run out of buffer space if you cycle through a lot of them

Could you elaborate, please? Isn't the buffer space adjusted to the length of the data you upload (and you overwrite the whole buffer every frame) ?

starwed commented 10 years ago

Could you elaborate, please? Isn't the buffer space adjusted to the length of the data you upload (and you overwrite the whole buffer every frame) ?

I believe the buffer space is, but the information has to be stored in a typed array before loading into the buffer. Previously each new entity just took up the next set of slots in that array... eventually it would run out of space.

The current implementation now tracks "holes" left in that typed array when you remove an entity, letting it reuse them when you add a new entity. I need to document the maximum number of live entities of one type, but it can actually be pretty large without using too much space. (A 1000 entity limit would mean it requires about 100kb of space.) Probably should add an option for manipulating this, just in case.
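The hole-tracking scheme described here amounts to a small free-list allocator over the typed array's entity slots. A minimal sketch, assuming a fixed capacity (`makeSlotAllocator` is a hypothetical helper, not the branch's actual code):

```javascript
// Hand out slots in a fixed-capacity typed array. Freed slots ("holes")
// are remembered and reused before the high-water mark grows, so
// cycling through entities doesn't exhaust the buffer.
function makeSlotAllocator(maxSlots) {
  const free = [];
  let next = 0;
  return {
    acquire() {
      if (free.length) return free.pop(); // reuse a hole first
      if (next >= maxSlots) throw new Error('out of buffer space');
      return next++;
    },
    release(slot) { free.push(slot); }
  };
}
```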