MightyPirates / OpenComputers

Home of the OpenComputers mod for Minecraft.
https://oc.cil.li

Video RAM for faster drawing #779

Closed: Inari-Whitebear closed this issue 4 years ago

Inari-Whitebear commented 9 years ago

(This might be loosely related to the suggestion about GPU shaders.) So my issue is that I like drawing pretty pictures, but that takes forever with the current draw calls. Even small pictures take a little longer than they really should. But we can't really speed that up much without causing a lot of traffic, etc.

An idea I had was to allow copying from one screen/gpu to another using the copy command, so one could pre-draw stuff on one screen, then just copy it to the other. Which wouldn't make a lot of sense technically, but would work.

A better idea would be to give GPUs video RAM, which is limited per GPU (probably in bytes) and can be allocated and freed. To not break compatibility, by default the GPU allocates some when the resolution is set. The GPU also sets which of the allocated memories is output to the screen (which by default is the one created when the resolution is set/the screen is bound). Copying between them is instant, but drawing into each still takes the normal time. This way you could pre-draw pictures as "textures" in allocated memory and then just blit them to the screen. The traffic also shouldn't get bigger, since it's only a single call that has to be sent (the memory should be kept in sync, or maybe just synced the first time it's used in blitting).

This also makes more sense because, AFAIK, pushing data from the CPU to the GPU IRL is slower than working within the GPU's own memory.

The defaulting is mostly just to keep compatibility.

So like:
gpu.freeMemory() -- returns amount of free memory in bytes
gpu.totalMemory() -- returns amount of total memory in bytes

gpu.allocate(width, height) -- allocates some memory to be used with the other functions; fails if there isn't enough free memory; returns true, memory_address on success and false, error on failure
gpu.free(memory_address) -- frees the specified address; returns true on success and false, error on failure; if no address is given, all memory is freed
gpu.list() -- lists the allocated memory_addresses

gpu.getResolution(memory_address) -- returns the width, height of the memory_address
gpu.bind(screen_address, memory_address) -- sets the screen output to the specified memory address (maybe multiple outputs could be possible for higher-tier GPUs?)
gpu.copyMemory(source_memory_address, x_pos, y_pos, width, height, target_memory_address, target_x_pos, target_y_pos) -- copies part of one memory to another
gpu.setMemory(memory_address, x, y, char) -- sets the x/y position of the memory_address to the given char (using the currently set background and foreground colors)

This would allow for some interesting usage, such as making two memories of the screen's size, drawing the updates on one and then copying it over to the output-bound memory to show something new on the screen in one go instead of drawing it over time. It also allows pre-drawing images that can then just be copied over onto the screen and such~
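A minimal usage sketch of the API proposed above; everything under gpu.* here (allocate, setMemory, list, copyMemory, free) is hypothetical and exists only in this suggestion, not in OpenComputers itself:

```lua
-- Hypothetical VRAM API sketch, following the proposal above.
local component = require("component")
local gpu = component.gpu

-- Allocate a small off-screen "texture" (hypothetical gpu.allocate).
local ok, tex = gpu.allocate(10, 10)
assert(ok, tex)

-- Pre-draw into the texture at the normal (slow) per-call cost.
for y = 1, 10 do
  for x = 1, 10 do
    gpu.setMemory(tex, x, y, "#")
  end
end

-- Assumption: the default output memory (created when the resolution was set)
-- is the first entry returned by the hypothetical gpu.list().
local screen_mem = gpu.list()[1]

-- Blit the whole pre-drawn texture onto the screen-bound memory in one cheap call.
gpu.copyMemory(tex, 1, 1, 10, 10, screen_mem, 20, 5)

gpu.free(tex)
```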

fnuecke commented 9 years ago

From my understanding this would be a more elaborate and flexible version of #457? In which case I must say I'm still more inclined to that, since it'd be simpler to implement and use, while lacking only a subset of this functionality (specifically: creating an arbitrary number of buffers).

asiekierka commented 9 years ago

VRAM is a million times better, as you can then preload all the necessary video content.

Also, since we're here already - could you add a config multiplier for the amount of draw operations? I fear bandwidth not.

Inari-Whitebear commented 9 years ago

Well, the idea mainly came from the fact that I like drawing pictures, but the redraw speed of them is unacceptable for actual use. So I thought about having memory on the GPU that can be drawn to and then just copied over to the output memory. And it only made sense to be able to tell which of the memories is used as output to the screen, hehe.

Double-buffering lets you draw in the background and then flip, which is nice for drawing something and then making it appear. However, in my case that doesn't really solve the issue of the image drawing itself being too slow.

That's why you'd slowly draw it into the GPU memory (preloading it, so to say, as asie put nicely) and then just copy between the memories instantly, allowing you to e.g. draw a 10x10 picture into one memory and then instantly copy it over to the memory that is output to the screen, wherever you want on it. Since the VRAM is either synced on first usage or synced constantly, the client only needs to be sent a single command (similar to how copy already works, I suppose). That means once you have slowly preloaded/pre-drawn stuff, you can draw those preloaded things rather instantly.

An interesting addition may also be the ability to bind only a certain area of a VRAM to the screen. If you wanted to, the graphics card could have a VRAM resolution that's quite big (for high-tier cards anyway) and then just bind a certain part of it to the screen, leaving the other parts to be used for preloading textures that are then copied onto the part that is bound to the screen in order to become actually visible.

(sorry, I'm bad at explaining)

Eunomiac commented 9 years ago

I'd definitely love something along these lines as well, and I can think of three other implementations that would really benefit from the ability to copy regions to and from an "off-screen" buffer (or similar functionality, as you've discussed).

Admittedly, I suspect the second and third would cost a lot of resources in other areas, but the first one is all upside:

magik6k commented 9 years ago

This seems to be something between #457 and #601. It IS way better than double buffering, and implementing it would partially do what shaders were intended to do.

Implementation-wise, I'd give GPUs a surface twice as large as their maximum 'output' resolution, with the screen 'bound' to this surface at 1,1. This way there wouldn't be many changes to the current API (and possibly no breaking changes). It would allow off-screen drawing (everything drawn would be transported to the client side), and then a fast gpu.copy to bring the drawn content to the screen in one operation.

Making it this way would still allow creation of shaders for 'realtime' drawing of things.
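A sketch of what that could look like from Lua, assuming a hypothetical surface twice as tall as the visible area with the screen bound at 1,1; gpu.fill and gpu.copy keep their existing signatures, only the oversized surface is new:

```lua
-- Off-screen drawing sketch under magik6k's proposed double-height surface.
local component = require("component")
local gpu = component.gpu

local w, h = gpu.getResolution()     -- visible area, e.g. 160x50

-- Draw a 20x5 box in the hidden half of the surface (rows h+1 .. 2*h), at normal cost.
gpu.fill(1, h + 1, 20, 5, "#")

-- Bring it on screen with a single copy: translate it up into the visible rows.
gpu.copy(1, h + 1, 20, 5, 29, 10 - (h + 1))   -- ends up at 30,10 on the visible screen
```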

Eunomiac commented 9 years ago

Just a minor suggestion, if it's possible without too much extra work: defining the offscreen space as a negative reflection of the onscreen coordinates might make things easier to grok, not to mention making the buffer zone more resilient if a player changes screen resolution in the middle of a program.

E.g., on a 60-width screen, I'm thinking of an offscreen region that's defined from x = -1 to x = -60, instead of x = 61 to x = 120. This way, the absolute values of the coordinates could be maintained, and moving from onscreen to offscreen would be as easy as flipping the sign.

(Of course, if the current API doesn't contemplate negative coordinates, or if reflection is more of a PITA than translation, it's probably not worth the effort)
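A rough sketch of the sign-flip idea, assuming (hypothetically) that negative x coordinates address the mirrored off-screen region while gpu.set and gpu.copy otherwise behave as they do today:

```lua
-- Negative-mirror off-screen sketch; the negative coordinates are hypothetical.
local component = require("component")
local gpu = component.gpu

-- Pre-draw off screen at the mirrored position of where it will eventually appear.
gpu.set(-10, 5, "Hello")            -- column -10 is off screen under this proposal

-- Bring it on screen: same |x|, flipped sign, i.e. translate by 2 * 10 columns.
gpu.copy(-10, 5, 5, 1, 20, 0)
```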

jpastuszek commented 9 years ago

Perhaps something like the Nintendo PPU implementation (http://en.m.wikibooks.org/wiki/Super_NES_Programming/Graphics_tutorial) would allow for interactive graphics with little actual data to transfer.

ds84182 commented 9 years ago

Ok, so I basically made some bullet points to consider when I start implementing this:

You can give more ideas and attempt to fix my current ideas and stuff; I'll start making API details in a second.

skyem123 commented 9 years ago

That seems to have a lot of detail and limitation (even by OpenComputers standards)...

fnuecke commented 9 years ago

Sounds reasonable, few questions:

Vexatos commented 9 years ago

@fnuecke I would imagine VRAM being literally a custom RAM stick doing all this.

fnuecke commented 9 years ago

But VRAM belongs on the GPU; having it in a RAM slot doesn't make a lot of sense, really. I think coupling the amount of VRAM to the GPU tier would be the most logical approach, no?

Additional concern wrt. layered rendering: this means the text resolution has to be taken into account when projecting the free-style textures. Which is... a bit of a problem, actually. Because currently the server has no clue whatsoever how large the font used to render the text actually is. Clients could use different fonts, with different sizes, and things would still work out all right. Having something with a fixed size, where the size is defined on the server, obviously interferes with that. I'm really not sure how best to go about this. Uncoupling the two and just assuming the default size (8x16, I think?) would "work", but then text positioning would be completely off on clients using custom fonts, of course. Replacing the font renderer with the new system, providing some kind of translation layer of string->custom textures using the Unifont on the server side, comes to mind, but I feel that'd be... overkill. Maybe in the long run?

Inari-Whitebear commented 9 years ago

Yeah VRAM in a PC is usually on the GPU... I think consoles follow the idea of having just dedicated VRAM on the board.

skyem123 commented 9 years ago

My suggestion for a graphics system would be this:

elfifae commented 9 years ago

Thinking back to Skye's implementation, I realised something: what if the graphics layer was simply there behind the text layer the whole time, but initialised to black? Without needing anything to be called to enable it, it would provide the greatest level of compatibility with existing code; the only thing I can think of that would need updating is term.clear(), which should also wipe the graphics layer and its backbuffer. If text got in the way of something, the relevant area could simply be cleared to fully expose the graphics beneath.

Also, if this separate pixel-drawing layer were pursued, besides some basic primitives like Skye suggested, I don't think we would necessarily need anything too fancy: manual pixel manipulation, perhaps a raw drawing method that takes a bounding box and a binary string of palette indexes, plus a copy method with an optional transparency index. We're looking at computers close to the spec of the PC-8801 or ZX Spectrum here, not a gaming console that would warrant fancier sprite-processing mechanics.
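For illustration only, a hedged sketch of what such calls might look like; gpu.drawRaw and gpu.copyPixels are made-up names for the "raw drawing method" and the "copy with a transparency index" described above:

```lua
-- Hypothetical pixel-layer calls, signatures invented for this sketch.
local component = require("component")
local gpu = component.gpu

-- Bounding box (x, y, width, height) plus a binary string of palette indexes, one byte per pixel.
gpu.drawRaw(10, 10, 4, 2, "\1\1\2\2\3\3\0\0")

-- Copy a region of the pixel layer, treating palette index 0 as transparent.
gpu.copyPixels(10, 10, 4, 2, 40, 20, 0)
```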

Regarding potential font/graphics size mixups, I think a simple baseline assumption (possibly half the size of the default font for reduced memory/bandwidth footprint) would be best, coupled with clientside stretching if it's not actually the case.

There's my long-stewing two pence.

TheRealOrangus commented 9 years ago

Such things as:
gpu.flush() and removal of direct-to-screen drawing.
gpu.newCanvas(width, height, kind, bits) -- kind is either "foreground" or "background"; creating a new canvas would also be the only possible way to change resolution, which would be realistic.
gpu.bindCanvas(id) -- only 1 canvas of each kind is bound.
gpu.getBoundCanvas(kind) -- returns the id of the bound canvas of the specified kind.
gpu.unbindCanvas(id) -- unbinds a canvas.
gpu.deleteCanvas(id) -- deletes a canvas; it should be unbound before being deleted, otherwise an error is raised.

skyem123 commented 9 years ago

@TheRealOrangus, can you explain your suggestion in more detail?

TheRealOrangus commented 9 years ago

Remove direct-to-screen rendering completely and instead make something like canvases. Each canvas would contain foreground xor background and could be bound to the screen; canvases could be created and deleted, but a bound canvas couldn't be deleted until it's detached. The method gpu.flush() would be used to apply the changes on the canvases to the screen. Remove gpu.setResolution() and make creating a new canvas with a different resolution the only way to change it. Background and foreground canvases could have different color depths and be bound to the screen together, at the same time.
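A short sketch of how that canvas API might be used, with every function name taken from the proposal above (none of them exist in OpenComputers):

```lua
-- Hypothetical canvas API sketch.
local component = require("component")
local gpu = component.gpu

local bg = gpu.newCanvas(160, 50, "background", 8)   -- 8-bit background canvas
local fg = gpu.newCanvas(160, 50, "foreground", 4)   -- 4-bit foreground canvas

gpu.bindCanvas(bg)
gpu.bindCanvas(fg)

-- ...draw onto the bound canvases here...

gpu.flush()              -- apply the canvas contents to the screen in one go

gpu.unbindCanvas(fg)
gpu.deleteCanvas(fg)     -- deleting a still-bound canvas would raise an error
```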

skyem123 commented 9 years ago

That sounds like a very big and incompatible change

TheRealOrangus commented 9 years ago

But quite realistic x).

asiekierka commented 9 years ago

Not realistic at all. Not for a text mode machine...

TheRealOrangus commented 9 years ago

Just as APUs are :D.

skyem123 commented 9 years ago

eh... APUs are useful.

elfifae commented 9 years ago

APUs are pretty much a few steps down from an SoC, and you'd better believe there are plenty of those in use on embedded devices that expose only a textmode interface.

Prototik commented 8 years ago

Any news on this? Drawing is still slow without it :-/

asiekierka commented 8 years ago

It's fast enough for me... Just mind what calls have what limits

Inari-Whitebear commented 8 years ago

Well, we got the part where the drawing area can be bigger than the viewport, but I don't think the available resolution actually increased :f

skyem123 commented 8 years ago

It would be neat if the entire graphics system was redesigned. :P

SaphireLattice commented 8 years ago

Maybe we can fit some custom glyphs in there as well?

skyem123 commented 8 years ago

It would be neat if VRAM could be used to store everything in a graphics card.

SaphireLattice commented 8 years ago

Well... That's what Video RAM is for.

BigTwisty commented 8 years ago

First off I am quickly falling in love with this mod. It is probably the coolest Minecraft mod I've ever seen, and I've played a lot.

The OP's VRAM idea seems like the ideal implementation to me. Adding any more functionality to the VRAM interface than what was specified would be playing the game for us, rather than challenging us to come up with our own ideas.

That being said, the ability to reconfigure the GPU to free up VRAM would be helpful. Selecting a lower resolution, or switching to 8-bit ASCII mode at the cost of the extended character set; these seem to fit well within the general scope of OpenComputers. Adding a bunch of useless sprite-management crap, however, wouldn't. If someone wants sprites, VRAM management would provide all the tools necessary to implement them.

I love that for the most part OC gives me all the freedom to do whatever I want with it within the scope of the Minecraft server/client limitations. Giving us control over the VRAM seems to be the next natural progression of this. PRETTY PLEASE? :)

gjgfuj commented 8 years ago

It being VRAM, the intention is to draw pixels to the VRAM and then copy them to the screen, right? So you'd make an area in the VRAM for sprite1, then draw it in a new place when you want it to move?

BigTwisty commented 8 years ago

Whether they're pixels (if direct pixel editing is enabled) or characters, that would be the idea. Personally I feel pixel editing would be memory-intensive and might keep large modpack makers away.

A typical 2D sprite-based engine stores all the animation frames in one big buffer image. Then at render time, each sprite instance knows which frame is current and copies only the corresponding rectangle of the sprite buffer to the screen at its location.

But we're getting off topic. This is just one of many applications possible if we could print to any allocated VRAM co-located on the Minecraft server and client with access to a high speed copy command.

gpu.allocate( width, height ): number

gpu.release( address ): boolean

gpu.availableMem(): number

gpu.copy( x, y, width, height, tx, ty, [source], [destination] ): boolean

Where source and destination default to the screen buffer address for backwards compatibility.
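A usage sketch of those four calls (all hypothetical, per this comment), pre-drawing a sprite sheet off screen and copying one frame at a time:

```lua
-- Sketch of the proposed allocate/release/availableMem/copy calls.
local component = require("component")
local gpu = component.gpu

print(gpu.availableMem())                 -- how much VRAM is left

local sheet = gpu.allocate(32, 8)         -- off-screen sprite sheet: four 8x8 frames side by side
-- ...slowly pre-draw the animation frames into `sheet` once...

-- Each tick, copy only the current frame onto the visible screen.
local frame = 2
gpu.copy((frame - 1) * 8 + 1, 1, 8, 8, 40, 10, sheet)   -- source = sheet, destination defaults to the screen

gpu.release(sheet)
```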

Techokami commented 8 years ago

A built-in sprite management system would actually be a bit more in-line with older microcomputers, which had hardware-based sprite functions. This is due to direct drawing of pixels being VERY slow and costly (in terms of needed RAM), whereas a reconfigurable character set, plus hardware sprites (which are just characters drawn in arbitrary locations on the screen after regular text is drawn) are a lot faster and cheaper. Sure it isn't as super powerful as using a modern GPU, or as super flexible with drawing images, but it would be simple enough for newcomers to learn how to use, and would be less taxing on network performance for Minecraft servers.

That be my $0.02 on the subject

malucard commented 7 years ago

I don't like the idea of sprites. I'd rather do it myself. If you really want sprites, what about different types of GPUs? The classic text-based one, a pixel-based one and a sprite-based one.

@Techokami, that's not entirely true: a 320x200 framebuffer for a tier 3 screen can be just 64k long (down from normally 512k) if "compressed" correctly. Colors are indexed by 24-bit RGB values, but a tier 3 screen has only 8 bits of color depth (the 256-256-256 palette gets truncated to a 6-8-5 one), so it's possible to store 8 pixels in a single 64-bit integer (Lua's default type), and that's without actually compressing anything.

Edit: additionally, 32k for 4-bit depth, 16k for 2-bit depth, 8k for 1-bit depth, and infk for 0-bit depth (???).
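A sketch of the packing described above, assuming Lua 5.3 integers; 320*200 pixels at 8 bits each is 64,000 bytes, i.e. 8,000 packed 64-bit words:

```lua
-- Pack eight 8-bit palette indexes per 64-bit integer.
local fb = {}    -- framebuffer as packed 64-bit integers

local function setPixel(x, y, index)            -- index: 0..255 palette entry
  local p = y * 320 + x                         -- 0-based pixel number
  local word, shift = p // 8 + 1, (p % 8) * 8
  fb[word] = ((fb[word] or 0) & ~(0xFF << shift)) | ((index & 0xFF) << shift)
end

local function getPixel(x, y)
  local p = y * 320 + x
  local word, shift = p // 8 + 1, (p % 8) * 8
  return ((fb[word] or 0) >> shift) & 0xFF
end
```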

NickNackGus commented 7 years ago

As for how to implement graphics memory, I propose giving each GPU tier a fixed VRAM resolution that uses almost exactly the same API as is currently used. Calling set(), clear(), or copy() within the coordinates of the visible screen works exactly as before. Calling them outside the coordinates of the visible screen, but still within VRAM, will function as if they were called directly on a screen, but the result will not be visible on the screen. It will be stored for later use and may be copied onto the screen at a later time.

Additionally, I would recommend three new commands: [getVirtualResolution(): number, number], [setOffset(x: number, y: number): boolean], and [getOffset(): number, number]. Calling setOffset() changes which part of VRAM is visible on the screen. With slight modification to edit.lua, this could mean that scrolling would only require calling getOffset(), changing the x and y values as appropriate, calling setOffset(), and fixing up the status bar area. Additionally, if the displayed resolution is at most half the vertical or horizontal resolution of VRAM, double-buffering can be implemented to change the entire screen's contents at once, or triple-buffering with one third the resolution, etc. I do not expect any official API for double buffering directly, but it would easily become possible.
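A sketch of how those three (hypothetical) calls could give you double buffering, assuming VRAM twice as tall as the screen and that set/fill coordinates address VRAM absolutely:

```lua
-- Double buffering via the proposed getVirtualResolution/getOffset/setOffset.
local component = require("component")
local gpu = component.gpu

local w, h = gpu.getResolution()
local vw, vh = gpu.getVirtualResolution()    -- e.g. vh == 2 * h under this proposal

-- Draw the next frame into whichever half of VRAM is currently hidden.
local _, oy = gpu.getOffset()
local hiddenY = (oy == 0) and h or 0
gpu.fill(1, hiddenY + 1, w, h, " ")
gpu.set(2, hiddenY + 2, "frame drawn off screen")

-- Flip: make the freshly drawn half visible in a single call.
gpu.setOffset(0, hiddenY)
```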

I am in favor of sprites for higher-resolution graphics without using text-mode hacks. If OpenComputers is to be comparable to an 8801 or ZX Spectrum, both of those support high-resolution graphics that are not based on text, instead setting pixels directly from different palettes. Meanwhile, the idea of sprites is to create small images that can be placed nearly anywhere while using less memory. While 256 8x8 256-color sprites would take up 128kB, and a background sprite map an additional 1kB, this is not how sprites typically work.

Sprites normally have a much smaller, local palette, such as 1-bit (think Pac-Man) or 2-bit (think NES), where one color is transparent and the other colors are mapped to a larger palette (1 byte for every other color option). In that system, using the 2-bit approach, 256 sprites would be 4kB, the background map (a full 320x200-pixel background with no overlapping sprites) would be 1kB, and the color information could either be attached to the sprite sheet (0.75kB for palette mapping) or to each instance of a sprite (3kB for palette mapping, 8kB total).

The latter is very similar to the NES, and as some have discovered, some sprites are reused with different colors (clouds/hills, Mario/Luigi/fire versions of the same characters). I may have spent too much time looking at memory maps and hardware information for old computer systems. I also find that The 8-Bit Guy on YouTube does an excellent job summarizing how these systems work for those who do not wish to know precisely which memory locations and commands do what.
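As a sanity check of the sizes quoted two paragraphs up (2-bit, 8x8 sprites on a 320x200 background), under those assumptions:

```lua
-- Recomputing the quoted figures.
local spriteSheet   = 256 * 8 * 8 * 2 // 8         -- 4096 bytes: 256 tiles at 2 bits per pixel
local backgroundMap = (320 // 8) * (200 // 8)      -- 1000 bytes: one tile index per 8x8 cell
local sheetPalette  = 256 * 3                      -- 768 bytes: 3 mapped colors per tile (4th is transparent)
local perInstance   = (320 // 8) * (200 // 8) * 3  -- 3000 bytes: 3 mapped colors per placed tile
print(spriteSheet, backgroundMap, sheetPalette, spriteSheet + backgroundMap + perInstance)
-- 4096  1000  768  8096  (roughly the 4kB / 1kB / 0.75kB / 8kB figures above)
```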

Perhaps sprites should be moved to their own issue, however?

malucard commented 7 years ago

If the API is the same, then you won't actually add anything. OpenComputers (obviously) already has it.

NickNackGus commented 7 years ago

Almost exactly the same. In the second paragraph I explain three new commands to move the displayed area around a fixed amount of video memory per GPU, and store extra graphics data in the existing format but out of sight.

malucard commented 7 years ago

I just realized we don't have to wait. Let's add it ourselves!

... as an addon. Let's make more complex GPUs!

Inari-Whitebear commented 7 years ago

"Open for adoption" means someone else should make and PR it, yes :P

No one has yet, though, it seems.

malucard commented 7 years ago

Yeah, that's my intention. I meant making a separate thing, adding much more complex GPUs, which is why I suggested an addon.

skyem123 commented 7 years ago

How about a GPU that modifies output from the current text mode GPUs, similar to how the early 3D graphics cards worked?

LoganDark commented 6 years ago

https://github.com/IgorTimofeev/DoubleBuffering

Inari-Whitebear commented 6 years ago

While that library isn't bad, it's not great for actual use either, at least for games or the like. Guess it's okay-ish for interface stuff.

LoganDark commented 4 years ago

So there's no cost for writing into the buffer, but according to the commit 15d34a8 description (I may be wrong) there is still a cost for copying to the visible portion of the screen, which makes it pretty much useless, since there would be no difference between blitting directly to the screen and blitting from off screen onto the screen. Definitely not what this issue was proposing; please reopen, or correct me if I'm wrong. cc @payonel

payonel commented 4 years ago

There is no way to blit directly to the screen; you can only copy from off screen. Now it is free to prepare the off-screen buffer, whereas before it was not. The reason we will not make it faster to blit to the visible screen is out of respect for servers.
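For reference, a rough sketch of the buffer API payonel is describing, as it shipped in the 1.7.5 dev builds; the names and signatures below are from memory, so treat them as assumptions and check the component documentation for the exact API:

```lua
-- Prepare a frame off screen for free, then pay only for the copy to the screen.
local component = require("component")
local gpu = component.gpu

local buf = gpu.allocateBuffer(20, 5)   -- off-screen VRAM page (name/signature assumed)
gpu.setActiveBuffer(buf)                -- subsequent draws target the buffer and are free to prepare
gpu.fill(1, 1, 20, 5, "#")
gpu.set(3, 3, "pre-drawn frame")

gpu.setActiveBuffer(0)                  -- buffer 0 is the screen itself
gpu.bitblt(0, 10, 10, 20, 5, buf)       -- the copy onto the visible screen is the call that costs
gpu.freeBuffer(buf)
```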

LoganDark commented 4 years ago

There is no way to blit directly to the screen; you can only copy from off screen.

Sorry, haven't used OC in a while

The reason we will not make it faster to blit to the visible screen is out of respect for servers.

Config option?

payonel commented 4 years ago

It already is a config option; you can remove all of the cost with config options already.

LoganDark commented 4 years ago

It already is a config option; you can remove all of the cost with config options already.

You could add this and then make it a config option to disable the faster blitting. What would the performance impact be? I'm curious.

Also, I'm interested to hear what use cases your changes actually have as-is. If copying from the buffer isn't any faster than drawing directly to the screen, then what's the point of writing off screen anyway?

My point is: Are you sure this feature is useful in its current state? And are you sure it resolves this issue?