GPU Shaders - Githubissues

magik6k commented 9 years ago

Someone on IRC asked about a function to put a whole buffer on screen at once. It is a bad idea, like most other ideas that would allow achieving a "high" frame rate in OC apps.

The idea

I think it would be nice (and epic) to allow users to create very small programs that run on the GPU, client side, and are invoked by server-side programs.

Initial thoughts:

"Shader" programs should be small (I'm thinking of a limit of 1-2 KB)
Shaders would run on all clients' computers and on the server.
Shaders should have a very short deadline
Shaders should run only when invoked from their 'origin program', not per frame/tick
Shaders would receive their data through 'uniforms': variables sent from the program to the shader
Shaders would 'run' on the GPU. The GPU should have some RAM for both programs and their data (I'm thinking of 16k/32k/64k tiers)
Shaders do not persist

Implementation ideas:

Shaders are written in a heavily stripped-down version of Lua.
- There should be no coroutines
- There should be access to the GPU through a global gpu table
- Uniforms should be visible as global variables or inside a global uniform table
The GPU can hold a few shader programs to save bandwidth and the time needed to compile programs.
The GPU should have a kernel that manages program loading, passing uniforms and running

Pros:

Very little bandwidth usage:
- The only data that is sent to client would be:
- Shader code, probably binary form (once, when shader is loaded)
- Uniforms (per shader run, depends on user code)
- Run requests (per shader run)
- 'Manual' gpu operations
- Maybe screen CRC or something like that, something may sometimes fail
Ability to create high frame-rate apps using little bandwidth
Game engines?

Cons:

User must run code
GPU-to-screen updates would have to be really fast to avoid generating client-side lag spikes
Loading a chunk that contains a computer using shaders would IMHO be kind of a pain

Little code example:

Program:

local shaderCode = shaderFile:read("*a") -- read the whole file; read() alone returns only one line
local shader = component.gpu.newProgram(shaderCode)
shader:setUniforms({x = 4, y = 8})
shader:run()

Shader:

gpu.setForeground(0xaabbcc)
for x = uniform.x, uniform.x + 9 do
    for y = uniform.y, uniform.y + 9 do
        gpu.set(x,y,"X")
    end
end

This very basic program draws a 10x10 square filled with X characters in some color (I know there is a fill function, but this is just an example. Imagine e.g. a sprite manager based on it, or even an advanced window manager).
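To try the shader half of this example without the proposed API, a stub can stand in for the GPU. The sketch below is only an illustration: `gpuStub` and `runShader` are hypothetical names, nothing like them exists in OC, and it assumes Lua 5.2 or newer for `load` with an environment argument.

```lua
-- Hypothetical stand-in for the proposed client-side shader runner (Lua 5.2+).
local shaderCode = [[
gpu.setForeground(0xaabbcc)
for x = uniform.x, uniform.x + 9 do
    for y = uniform.y, uniform.y + 9 do
        gpu.set(x, y, "X")
    end
end
]]

-- Stub GPU that records calls instead of drawing anything.
local calls = { set = 0, foreground = nil }
local gpuStub = {
    setForeground = function(color) calls.foreground = color end,
    set = function(x, y, text) calls.set = calls.set + 1 end,
}

-- Compile the shader with an environment that exposes only gpu and uniform.
local function runShader(code, uniforms)
    local env = { gpu = gpuStub, uniform = uniforms }
    local chunk = assert(load(code, "=shader", "t", env))
    chunk()
end

runShader(shaderCode, { x = 4, y = 8 })
print(calls.set) -- 100: one gpu.set call per cell of the 10x10 square
```

Because the environment table holds nothing but `gpu` and `uniform`, the shader cannot reach `os`, `io` or any other global.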

I feel that the idea is worth discussing. I was thinking of starting to implement it myself, but I'm currently creating 2 libraries, starting a small addon mod and playing Minecraft (by playing Minecraft I mean creating an OS in OC)

Kubuxu commented 9 years ago

Losing persistence should be a "feature" in this case. In the real world, for example, a glContext is lost when you switch apps on a phone or hibernate a normal computer. You would just need to be ready to reinitialize your context.

fnuecke commented 9 years ago

Some random thoughts:

Shader programs would run in a highly restricted sandbox. Pretty much no standard Lua library stuff, as you said. Probably just a few string ops (sub, char, byte) and most of math.
Because creating a full machine for that would be overkill, and handling the differences manually would be annoying, would probably always just use LuaJ... ~~does LuaJ have debug hooks? For limiting the number of operations.~~ Derp, of course it does, that's how OC limits execution time after all.
Alternatively, who's up to hacking together a primitive code-interpreter doing C-like syntax? Or know of one? Or maybe even go (pseudo-)assembly? :D
Uniforms would obviously have to be serialized in some way to be transmitted to clients. I'm thinking it'd be acceptable to keep it simple here, and only allow a flat table with string keys and primitive type values (boolean, number, string).
If shaders are forced to be state-free (e.g. give it a fresh env each time or give it a read-only env, i.e. only allow declaring local vars) persistence shouldn't be that bad. It'd also avoid filling up the client's memory somewhat.
How is math.random implemented in LuaJ? Would seeding once be enough to guarantee the same results on server and client? Or just don't provide math.random?
Would it be nicer to instead of providing GPU callbacks to the shader, have it be more like a real fragment shader: call it for each "pixel" to be rendered? Invoking the shader would provide the area to apply it to.
There's no component network on the client, so which screen a GPU writes to would have to be synchronized to the clients manually (GPU functionality is purely server-side right now).

Overall I think it's a very interesting approach to the problem, but we'll have to be very careful as to how this might affect client performance.
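The sandbox and instruction limit described above can be sketched in plain Lua. This is an assumption-laden illustration, not how OC does it: `makeEnv` and `runSandboxed` are hypothetical names, the whitelist is only an example, and it targets standard Lua 5.2+ rather than LuaJ specifically.

```lua
-- Sketch only: makeEnv and runSandboxed are hypothetical (Lua 5.2+).
local function makeEnv(gpu, uniforms)
    -- A fresh table per run keeps the shader state-free between calls.
    return {
        gpu = gpu,
        uniform = uniforms,
        math = { floor = math.floor, min = math.min, max = math.max, abs = math.abs },
        string = { sub = string.sub, char = string.char, byte = string.byte },
    }
end

local function runSandboxed(code, gpu, uniforms, maxInstructions)
    -- Mode "t" accepts source text only and rejects precompiled bytecode.
    local chunk, err = load(code, "=shader", "t", makeEnv(gpu, uniforms))
    if not chunk then return false, err end
    -- Run in its own coroutine so the count hook only affects the shader.
    local co = coroutine.create(chunk)
    debug.sethook(co, function()
        error("instruction limit exceeded", 0)
    end, "", maxInstructions)
    return coroutine.resume(co)
end

-- An endless loop is interrupted instead of blocking the caller.
local ok, err = runSandboxed("while true do end", {}, {}, 10000)
print(ok, err)
```

One caveat with this approach: string values still carry the standard string metatable, so method calls such as `("x"):rep(n)` reach the full string library even though the `string` table in the environment is trimmed. A real sandbox would have to account for that.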

Wuerfel21 commented 9 years ago

assembly! probably a stripped 65816?

magik6k commented 9 years ago

I was thinking about making a small language for this purpose, but I thought it was a little overkill. I still think the best way of doing this (in terms of memory management, speed and general control), although it appears to be the most complex, is to implement a small shader assembly language (which could then be compiled to safe Java bytecode :D), plus a library in OpenOS that compiles some higher-level language to the GPU assembly.
I don't think that making it work the way a real fragment shader does would save performance or bandwidth. But I like the idea of limiting the surface a shader has access to.

Kilobyte22 commented 9 years ago

The issue with Java bytecode: you always need a full class, and you cannot reliably unload a class once it's been loaded. Using that, an attacker could make the server run out of permgen quite easily. If anything, compile to Lua bytecode and let it run in LuaJ on the client.

Edit: It could even happen in regular environment without malicious background - which would lead to very odd and hard to reproduce memory leaks.

asiekierka commented 9 years ago

@fnuecke - Have you considered http://fscript.sourceforge.net/ ? If not, I might be up to doing something like porting the picoc C interpreter.

fnuecke commented 9 years ago

Hmm, FScript looks nice and simple, but I don't immediately see a way to limit execution "steps"? I think a way to limit the number of consecutive instructions is essential here (as could be done with the count hook in Lua), to avoid blocking / tick lag.

SoniEx2 commented 9 years ago

Have you considered brainfuck?

Kubuxu commented 9 years ago

Talking about how should it work:

You pass an array to it.
Shader is executed on each element of the array.
Shader has access to functions drawing primitives like: Line, Square, Rectangle, ?Circle?, Point, Text.

It would work more as geometry shader in OpenGL than fragment shader; allowing to save transfer.
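A minimal sketch of this per-element model, assuming the shader is Lua and sees the current element as a global named `element`. Both that name and `runOverData` are made up for illustration, and the stub GPU only records draw calls.

```lua
-- Hypothetical: runOverData is not an existing OC function (Lua 5.2+).
local function runOverData(code, gpu, uniforms, data)
    local env = { gpu = gpu, uniform = uniforms }
    local chunk = assert(load(code, "=shader", "t", env))
    for i = 1, #data do
        env.element = data[i] -- current element, visible to the shader as a global
        chunk()
    end
end

-- Stub GPU that records what would be drawn.
local drawn = {}
local gpuStub = {
    set = function(x, y, text) drawn[#drawn + 1] = { x = x, y = y, text = text } end,
}

-- The shader draws one character per element, shifted by a shared uniform.
local shader = [[gpu.set(element.x + uniform.offsetX, element.y, element.char)]]
local bullets = {
    { x = 1, y = 2, char = "*" },
    { x = 5, y = 2, char = "*" },
    { x = 9, y = 3, char = "o" },
}

runOverData(shader, gpuStub, { offsetX = 10 }, bullets)
print(#drawn) -- 3: one draw call per element
```

Only the array and the uniforms would need to cross the network; the shader code stays cached on the client.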

Giving the shader access to a timestamp in milliseconds or nanoseconds would allow creating smooth animations (IMHO even 10 fps is smooth).

@fnuecke FScript is so small that adding limit shouldn't be a hassle.

fnuecke commented 9 years ago

The FScript codebase is small, yes, but adding statefulness to what's essentially a parser could still be quite a bit of effort. I'm not sure that'd be worth it. It also would mean OC would have to ship yet another non-standard library, which is a bit of a minus. A minimal Lua env sounds better to me, tbh.

As recap, here's what I'd currently suggest, opinion subject to change:

setShader(s:string), getShader():string, setUniforms(t:table), getUniforms():table.
setData(t:table), getData(t:table), which is sortakinda like uniforms, but allows more data but is much slower. Consider this the (very very very rough) equivalent of VBOs in OpenGL.
Call the shader automatically at a fixed rate, increasing rate with increasing GPU tier. Possibly allow lowering rate via a callback? Unsure.
Uniforms table must only contain primitives as keys and values. Other entries are ignored.
Some system generated "uniforms" such as time. In ticks or real time? Would lean towards ticks. Others? Made available via env.
Shader script and uniforms must not exceed a certain size. Increases with GPU tier. Uniforms measured based on serialized data, shader on string length.
Shader script would be Lua.
- Has no persistent state, i.e. gets fresh env each call/does not allow setting globals. Makes saving and synching a non-issue.
- Only has access to select GPU callbacks (set, fill, setColor; not setResolution or setColorDepth) and subsets of a few Lua standard libs: math, table, string, base.
- Has very limited instruction count before it is interrupted.
- Shader will automatically be cleared on error, signal/event with error message will be dispatched.
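The uniform rules in this recap (primitive keys and values only, size measured on serialized data) could look roughly like the sketch below. Everything in it is an assumption: the function names are hypothetical, and the size estimate is a stand-in for whatever serialization OC would actually use.

```lua
-- Sketch only: these helpers are hypothetical, not part of OC.
local function isPrimitive(v)
    local t = type(v)
    return t == "boolean" or t == "number" or t == "string"
end

-- Keep only entries whose key and value are both primitives; ignore the rest.
local function filterUniforms(t)
    local out = {}
    for k, v in pairs(t) do
        if isPrimitive(k) and isPrimitive(v) then out[k] = v end
    end
    return out
end

-- Rough size estimate: text length of key and value plus one type-tag byte each.
local function serializedSize(t)
    local size = 0
    for k, v in pairs(t) do
        size = size + #tostring(k) + #tostring(v) + 2
    end
    return size
end

-- Store the filtered uniforms, or refuse if they exceed the tier's limit.
local function setUniforms(gpuState, t, maxBytes)
    local filtered = filterUniforms(t)
    if serializedSize(filtered) > maxBytes then
        return nil, "uniforms too large"
    end
    gpuState.uniforms = filtered
    return true
end

local state = {}
print(setUniforms(state, { x = 4, y = 8, callback = print }, 64))
```

The `callback` entry is silently dropped, matching the "other entries are ignored" rule, while `x` and `y` are stored.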

Am I missing something?

Kubuxu commented 9 years ago

What about allowing shaders to run over a series of data? Then you could write an engine that creates descriptions of objects and runs the shader over them. This would allow easier rendering of an unspecified number of objects, like bullets or creatures. If we limit ourselves to primitive->primitive only, this approach is hardly possible. It would make OC's shaders more similar to those in RL, as you would have uniforms plus the data you work on.

Another question: maybe in a higher GPU tier, give the shader access to a secondary image buffer for z-test or stencil.

SoniEx2 commented 9 years ago

I want a BFSL (brainfuck shader language) (it's just about a couple hundred lines)

fnuecke commented 9 years ago

Can you give me a more concrete example of what you mean by "series of data"? Do you mean texture storage? That could make sense, I guess.

As for depth buffer and such... that would require a concept of depth, first. Which currently doesn't exist. And introducing that just for shaders... I'll need some convincing this will see enough use to justify the changes/overhead :P When you do have depth sensitive rendering, it would probably be feasible to sort them in advance, in the limited context that is OC?

Kubuxu commented 9 years ago

As you made a point of allowing only a primitive-to-primitive uniform table, it is not possible to send an arbitrary number of objects. So either we make primitive-to-table mapping possible or, what is more interesting, we make shaders work as they do in the real world. A normal shader is run multiple times with the same uniforms but with different data as input. That is the series of data.

An additional buffer (z-buffer) would control what should be rendered over what. You would want to render a character over a background. CSS has a comparable mechanism (z-index) to control what's on top.

Pwootage commented 9 years ago

I would recommend having a buffer in Lua that's swapped, presumably all implemented inside the kernel. This would mean the only Scala part you have to write is replacing the entire screen buffer with the new one, so there should be essentially no jumping between Scala and Lua, which should be a lot faster.

fnuecke commented 9 years ago

Updated the summary above based on discussion on IRC with:

setData(t:table), getData(t:table), which is sortakinda like uniforms, but allows more data but is much slower. Consider this the (very very very rough) equivalent of VBOs in OpenGL.

As for buffer in Lua + swapping. Buffer may be table of string (for multibyte chars) or ints. Possibly array of array (speed concerns by @Pwootage, anyone care to benchmark?). Alternatively possibly thin userdata proxy for real buffer? Again, benchmarking would be required to see how expensive the call forwarding in LuaJ would be.
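As a starting point for such a benchmark, here is a minimal sketch of the array-of-arrays variant. The names `newBuffer`, `put` and `flush` are hypothetical; it stores one string per cell (so multibyte characters work), ignores colors entirely, and the stub GPU only records the rows it receives.

```lua
-- Hypothetical back buffer kept entirely in Lua; only flush() touches the GPU.
local function newBuffer(width, height)
    local rows = {}
    for y = 1, height do
        local row = {}
        for x = 1, width do row[x] = " " end
        rows[y] = row
    end
    return { width = width, height = height, rows = rows }
end

-- Write one cell, silently ignoring out-of-range coordinates.
local function put(buffer, x, y, char)
    if x >= 1 and x <= buffer.width and y >= 1 and y <= buffer.height then
        buffer.rows[y][x] = char
    end
end

-- One gpu.set call per row, so the Lua/host boundary is crossed height times.
local function flush(buffer, gpu)
    for y = 1, buffer.height do
        gpu.set(1, y, table.concat(buffer.rows[y]))
    end
end

-- Stub GPU that records the rows it is asked to draw.
local lines = {}
local gpuStub = { set = function(x, y, text) lines[y] = text end }

local buf = newBuffer(3, 2)
put(buf, 2, 1, "X")
put(buf, 9, 9, "Y") -- out of range, ignored
flush(buf, gpuStub)
print(#lines)
```

Flushing row by row is one design choice among several. A flat table of ints or a userdata proxy, as mentioned above, would trade memory and call overhead differently.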

fnuecke commented 9 years ago

Going to close this as sort of a reverse-duplicate of #779, as that was one of the suggestions that seemed to get the most approval, and seemed most feasible. Further discussion about this topic, if desired, should take place in that issue.

MightyPirates / OpenComputers

GPU Shaders #601
