BrunoLevy / learn-fpga

Learning FPGA, yosys, nextpnr, and RISC-V
BSD 3-Clause "New" or "Revised" License

Implement OLED for Doom #106

M4rkoHR closed 1 year ago

M4rkoHR commented 1 year ago

I added support for the SSD1331 OLED display to the Doom port (i_video_oled.c). It may need some tweaks from FastDoom, such as the ability to lower the resolution below 320x200, to run faster. I'm not very familiar with conditional compilation in Makefiles, so I just hardcoded i_video_oled.c in place of the i_video_fb.c that used to be there.

BrunoLevy commented 1 year ago

Cool ! I'll merge it ASAP.

Note1: we can make it faster using several tricks:

Note2: maybe we can select OLED or FrameBuffer when starting Doom (this will require understanding how command-line arguments are handled in the Doom source code, but it seems doable)

Oh, and I only implemented sending data to the SPI screen one byte at a time, we can probably send four bytes in one go, I'll investigate that (need to better understand LiteX SPI interface).

M4rkoHR commented 1 year ago

This is just a proof of concept atm. I'd like to note that it's not resizing the pixel array: for each display pixel it picks the closest pixel in the screen buffer for the given scale and skips the others (nearest neighbor).

It's doing some simple FP arithmetic:

iy = (int) y*scale_y

and while I don't know how much these FP operations slow down the gameplay (1 per pixel + 1 per row), I think it's the 320x200 render resolution that's the main culprit.
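The per-row and per-pixel FP arithmetic described above can be sketched roughly as follows. This is a minimal illustration and not the code from i_video_oled.c: the function name `scale_nearest`, the 8-bit pixel type and the flat buffer layout are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: Doom's 320x200 screen buffer, the 96x64 SSD1331 panel. */
enum { SRC_W = 320, SRC_H = 200, DST_W = 96, DST_H = 64 };

/* Hypothetical helper: nearest-neighbour downscale. For every display
 * pixel, pick one source pixel and skip the others. Costs one FP
 * multiply per row plus one per pixel. */
void scale_nearest(const uint8_t *src, uint8_t *dst)
{
    assert(src && dst);
    const float scale_x = (float)SRC_W / DST_W;  /* divisions done once */
    const float scale_y = (float)SRC_H / DST_H;

    for (int y = 0; y < DST_H; y++) {
        int iy = (int)(y * scale_y);             /* 1 FP multiply per row */
        const uint8_t *row = src + iy * SRC_W;
        for (int x = 0; x < DST_W; x++) {
            int ix = (int)(x * scale_x);         /* 1 FP multiply per pixel */
            dst[y * DST_W + x] = row[ix];
        }
    }
}
```

With these sizes the largest source coordinates touched are column 316 and row 196, so a few columns and rows at the right and bottom edges are never sampled.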

I tested displaying the image without resizing (displaying the 96x64 portion of the top left corner) as well as skipping every third pixel (where you lose some pixels on the bottom and right) and it ran just as quick.

This is petitbateau on a ULX3S 12F, but it seemed to run just as quick (or slow, rather) as the example on your Twitter from January 2022.

Thanks for the quick answer, I'm glad to contribute 😃

BrunoLevy commented 1 year ago

Yes, you are right, time is dominated by all the rest, and it would run much smoother if we could decrease the resolution. Normally it is possible (I remember doing that in the 90's to play on my 33 MHz 486...). A possibility to make it approx. 3x faster: generate LiteX with a VexRiscv processor (it has a pipeline and runs at slightly more than 1 cycle per instruction, whereas Petitbateau uses between 3 and 4 cycles per instruction). There is also TordBoyau (the pipelined version of femtorv) that may be even faster (but I need to find a way to interface it with LiteX's cache).

P.S. I have merged the pull request, and modified the Makefile in such a way that it generates both versions.

BrunoLevy commented 1 year ago

Just tested my faster OLED routine, it is significantly smoother (I'd say nearly 2x the FPS), it is nearly playable ! If we can wire the input routines to the buttons of the ULX3S, then we could play !

The optimized OLED routine gains a significant number of cycles per pixel:

M4rkoHR commented 1 year ago

The floating point division is performed only once per frame in this implementation (it could probably be computed only once in the Init, didn't think of that at the time), whereas floating point multiplication then cast to int is performed for every pixel (+ every row)... is multiplication faster than division in this example?

From quick googling it seems FP multiplication is closer to 6-7 cycles, still not as fast as integer multiplication but perhaps still irrelevant due to the inability to lower the resolution.
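If the FP multiplies did turn out to matter, one common alternative (not something used in this pull request, as far as the thread shows) is to step through the source in 16.16 fixed point, so the inner loop only needs integer adds and shifts. A minimal sketch, with the function name `scale_nearest_fixed` and the 8-bit pixel type assumed for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: 320x200 source buffer, 96x64 display. */
enum { SRC_W = 320, SRC_H = 200, DST_W = 96, DST_H = 64 };

/* Hypothetical helper: nearest-neighbour downscale using 16.16
 * fixed-point stepping instead of floating point. */
void scale_nearest_fixed(const uint8_t *src, uint8_t *dst)
{
    assert(src && dst);
    /* Source step per display pixel, scaled by 2^16 (computed once). */
    const uint32_t step_x = ((uint32_t)SRC_W << 16) / DST_W;
    const uint32_t step_y = ((uint32_t)SRC_H << 16) / DST_H;

    uint32_t fy = 0;
    for (int y = 0; y < DST_H; y++, fy += step_y) {
        const uint8_t *row = src + (fy >> 16) * SRC_W;  /* integer part = source row */
        uint32_t fx = 0;
        for (int x = 0; x < DST_W; x++, fx += step_x) {
            *dst++ = row[fx >> 16];                     /* integer part = source column */
        }
    }
}
```

Because the step is truncated to 16 fractional bits, a few columns may land one source pixel to the left of where the floating-point version would put them; for a nearest-neighbour preview this is usually not visible.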

There is a possibility of manually computing the nearest pixel pairs and just creating a lookup table. This would use 6KB of memory but it should be as fast as scaling to 80x50 while maintaining full usage of the display.
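A lookup-table approach could look roughly like the sketch below. Note that this is a smaller variant than the per-pixel table mentioned above: it keeps one table per axis (96 + 64 entries) instead of one entry per display pixel, and it is not the implementation from #107. The names `lut_x`, `lut_row`, `scale_lut_init` and `scale_lut` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: 320x200 source buffer, 96x64 display. */
enum { SRC_W = 320, SRC_H = 200, DST_W = 96, DST_H = 64 };

static uint16_t lut_x[DST_W];    /* source column for each display column */
static uint16_t lut_row[DST_H];  /* source row offset (row * SRC_W) for each display row */

/* Build both tables once, e.g. from the video init routine. */
void scale_lut_init(void)
{
    for (int x = 0; x < DST_W; x++)
        lut_x[x] = (uint16_t)(x * SRC_W / DST_W);
    for (int y = 0; y < DST_H; y++)
        lut_row[y] = (uint16_t)((y * SRC_H / DST_H) * SRC_W);  /* max 196*320 fits in 16 bits */
}

/* Per-frame copy: only table reads, no multiply or divide in the loops. */
void scale_lut(const uint8_t *src, uint8_t *dst)
{
    assert(src && dst);
    for (int y = 0; y < DST_H; y++) {
        const uint8_t *row = src + lut_row[y];
        for (int x = 0; x < DST_W; x++)
            *dst++ = row[lut_x[x]];
    }
}
```

Usage would be a single call to `scale_lut_init()` at startup, then `scale_lut(screen, oled_buffer)` once per frame, where both buffer names are placeholders.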

Managing to lower the actual render resolution (or at least change it to a multiple of 96x64) would kill two birds with one stone (no scaling + faster render).

Regarding the buttons, could you point me in the direction of some LiteX or LiteOS code using the buttons so I can look into that?

BrunoLevy commented 1 year ago

Hi,

M4rkoHR commented 1 year ago

I got the Nearest Neighbor scaling working using a lookup table in #107