bryanperris / cor64

N64 Emulator written in C#
MIT License

RDP Pipeline and RDRAM #1

Open bryanperris opened 5 years ago

bryanperris commented 5 years ago

Opening this issue to start a discussion.

Ref 1: https://github.com/project64/project64/issues/574

bryanperris commented 5 years ago

Poke @cxd4

bryanperris commented 5 years ago

When the RSP/RDP are executing a draw list, is the RDP actively rasterizing portions of the main scene directly into the framebuffer? I had expected that each game sets up the pipeline state, renders pixels, and finally flips buffers, so that the final picture ends up in the RDRAM framebuffer.

cxd4 commented 5 years ago

It's as you say. There is no temporary or midway storage or anything like that; while the RDP triangle, rectangle or other drawing operations are in session, bits are written into RDRAM. This ordinarily would cause flickering and/or missing triangles if the frame buffer was constantly DAC'd by the VI during and before completion; that's basically what the double buffering as you mention is for.
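To make the double-buffering point concrete, here is a minimal C sketch of the idea. It is an illustration only: the `FrameBuffers` struct, the function names, and the 320x240 16-bit layout are assumptions made for this example, not code from cor64 or any real plugin. The stand-in RDP operation writes straight into the draw buffer with no intermediate storage, while the stand-in VI only reads the other buffer until the swap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FB_WIDTH  320
#define FB_HEIGHT 240
#define FB_PIXELS (FB_WIDTH * FB_HEIGHT)

/* Hypothetical model: two 16-bit framebuffers living in emulated RDRAM. */
typedef struct {
    uint16_t rdram_fb[2][FB_PIXELS];
    int draw_index; /* buffer the RDP is currently writing into */
    int scan_index; /* buffer the VI is currently scanning out  */
} FrameBuffers;

static void fb_init(FrameBuffers *fb) {
    memset(fb, 0, sizeof *fb);
    fb->draw_index = 0;
    fb->scan_index = 1;
}

/* Stand-in for an RDP rectangle fill: pixels land in RDRAM immediately. */
static void rdp_fill_rect(FrameBuffers *fb, int x0, int y0, int x1, int y1,
                          uint16_t color) {
    assert(x0 >= 0 && y0 >= 0 && x1 <= FB_WIDTH && y1 <= FB_HEIGHT);
    uint16_t *dst = fb->rdram_fb[fb->draw_index];
    for (int y = y0; y < y1; y++)
        for (int x = x0; x < x1; x++)
            dst[y * FB_WIDTH + x] = color;
}

/* The buffer the VI would display; half-drawn frames never show up here. */
static const uint16_t *vi_scanout(const FrameBuffers *fb) {
    return fb->rdram_fb[fb->scan_index];
}

/* Flip: the finished frame becomes visible, the old one becomes the target. */
static void swap_buffers(FrameBuffers *fb) {
    int tmp = fb->draw_index;
    fb->draw_index = fb->scan_index;
    fb->scan_index = tmp;
}
```

Calling `rdp_fill_rect()` changes only the draw buffer, so `vi_scanout()` keeps returning the previous frame until `swap_buffers()` is called. A CPU read of the draw buffer in between would see the partially drawn frame.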

bryanperris commented 5 years ago

So RDRAM is used as the back buffer and the CPU can peek at it at any time. I wonder whether, with Vulkan, the pipeline and CPU could work the same way. Otherwise you need good software rendering. I remember how fast the nocash DS emulator did software 3D, and it even let you debug it with a fancy interface; it seems the RDP needs the same thing here. I know how ugly the resolution will be, but I would rather experience the games the way they were made.

cxd4 commented 5 years ago

You may or may not necessarily need software rendering. I'd say it is possible for a cycle-accurate implementation of the RDP to just draw triangles within OpenGL or whatever and update the system-shared frame buffer memory in software with a memcpy() in-between triangle commands, though of course you would be copying a bunch of bytes in software which would not be very much faster than software rendering. Or you would just be iterating directly from glReadPixels(), which is even worse.
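The copy-back scheme described above could look roughly like the following C sketch. To keep it self-contained it does not call OpenGL at all: `host_draw_command()` is a placeholder for "draw one triangle with the host API and read the result back", and `HostTarget`, `sync_to_rdram()`, and the one-byte command format are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RDRAM_SIZE (4u * 1024u * 1024u) /* 4 MB on a base console */

static uint8_t rdram[RDRAM_SIZE]; /* emulated, system-shared memory */

/* Hypothetical description of the host-side render target. */
typedef struct {
    uint8_t *host_pixels;  /* stand-in for pixels read back from the GPU  */
    size_t   size_bytes;   /* size of the framebuffer in bytes            */
    uint32_t rdram_offset; /* where the emulated framebuffer sits in RDRAM */
} HostTarget;

/* Placeholder for one host draw call; here it just fills the target. */
static void host_draw_command(HostTarget *t, uint8_t value) {
    memset(t->host_pixels, value, t->size_bytes);
}

/* Copy the host result into RDRAM so the emulated CPU can see it. */
static void sync_to_rdram(const HostTarget *t) {
    assert((size_t)t->rdram_offset + t->size_bytes <= RDRAM_SIZE);
    memcpy(rdram + t->rdram_offset, t->host_pixels, t->size_bytes);
}

/* One copy after every command: simple, but the copies dominate the cost. */
static void execute_commands(HostTarget *t, const uint8_t *cmds, size_t count) {
    for (size_t i = 0; i < count; i++) {
        host_draw_command(t, cmds[i]);
        sync_to_rdram(t);
    }
}
```

A 320x240 16-bit framebuffer is 153,600 bytes, so copying all of it after every command adds up quickly over a frame, which is the cost mentioned above. Copying only the region a command touched would reduce that, at the price of tracking dirty rectangles.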

I heard a rumor years ago that someone wrote a very efficient 3-D software rendering graphics plugin for the PS1. Exophase or something like that. I personally have little interest in Vulkan, but it's better than Khronos continuing to rip apart the OpenGL specs with deprecation any further I guess.

bryanperris commented 5 years ago

With the approach of copying the back buffer after each OpenGL call, I wonder how much faster that would be on Intel HD Graphics or ARM, where the GPU sits on the same die as the CPU. Another idea I had is some kind of USB 3 (or even PCI-E) based 'RDP device', like an FPGA or a similar construct that provides the basic needs of the RDP. I think the hardware cost would be low, and if you want the full N64 experience on your computer, it would be worth buying.

Still, I think running a low-level RDP in parallel on the main processor is enough for full graphics without bottlenecking anything. The N64 has so many limits that even the real RDP slows down with hi-res rendering, as in Perfect Dark, and most games render far less than that game does, so I can't see why the RDP frame rate would be bad unless the implementation was poorly optimized. Also, an emulated RDP could cache a lot in memory, which is an advantage over the real RDP with its 4 KB texture cache. I also wonder whether OpenCL could handle most of the computation anyway, and don't forget there is AVX2 and the like.

It might also be worth looking at Mesa's llvmpipe, which uses LLVM at runtime to optimize the software render path; this was shown to improve performance compared to the optimizations of a static compiler.

bryanperris commented 5 years ago

More notes on things I learned about the RDP.

bryanperris commented 5 years ago

Since the RSP is task-based, where each task loads program code and then interprets the input data, I wonder whether RSP GBI tasks could be compiled into Vulkan kernels. That should let you take advantage of GPU compute power for both vector and scalar operations as well as for rendering graphics.
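One way to picture "compile each task's program once, then reuse it" is a cache keyed by a hash of the microcode. The C sketch below shows only that bookkeeping and makes several assumptions: `RspTask` is a simplified, hypothetical view of a task, `compile_ucode()` is a placeholder where a real translator to a GPU kernel would go, and the placeholder kernel merely counts commands under the assumption that each is 8 bytes long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of an RSP task: microcode plus input data. */
typedef struct {
    const uint8_t *ucode; size_t ucode_size;
    const uint8_t *data;  size_t data_size;
} RspTask;

/* A "compiled" task is modelled as a plain function pointer. */
typedef int (*CompiledKernel)(const uint8_t *data, size_t size);

#define CACHE_SLOTS 16
typedef struct { uint32_t key; CompiledKernel kernel; int used; } CacheEntry;
static CacheEntry cache[CACHE_SLOTS];
static int compile_count; /* how many times the placeholder compiler ran */

/* 32-bit FNV-1a hash over the microcode bytes. */
static uint32_t hash_ucode(const uint8_t *p, size_t n) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) { h ^= p[i]; h *= 16777619u; }
    return h;
}

/* Placeholder kernel: counts commands, assuming 8 bytes per command. */
static int placeholder_kernel(const uint8_t *data, size_t size) {
    (void)data;
    return (int)(size / 8);
}

/* Placeholder compiler: a real one would translate the microcode. */
static CompiledKernel compile_ucode(const RspTask *t) {
    (void)t;
    compile_count++;
    return placeholder_kernel;
}

/* Linear-probe lookup; compiles on first sight of a microcode image. */
static CompiledKernel lookup_or_compile(const RspTask *t) {
    uint32_t key = hash_ucode(t->ucode, t->ucode_size);
    for (uint32_t i = 0; i < CACHE_SLOTS; i++) {
        CacheEntry *e = &cache[(key + i) % CACHE_SLOTS];
        if (e->used && e->key == key) return e->kernel;
        if (!e->used) {
            e->key = key; e->kernel = compile_ucode(t); e->used = 1;
            return e->kernel;
        }
    }
    return compile_ucode(t); /* cache full: compile without caching */
}

static int run_task(const RspTask *t) {
    return lookup_or_compile(t)(t->data, t->data_size);
}
```

Running the same task twice compiles once and then hits the cache. Whether real microcode can be translated into a GPU kernel at all is the open question of the comment above; this sketch does not answer it.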

bryanperris commented 5 years ago

AMD APUs (with GCN) look promising for CFB: https://upload.wikimedia.org/wikipedia/commons/thumb/e/e2/HSA-enabled_integrated_graphics.svg/320px-HSA-enabled_integrated_graphics.svg.png

CPU/GPU memory is unified