Closed M4rkoHR closed 1 year ago
Cool ! I'll merge it ASAP.
Note1: we can make it faster using several tricks:
Note2: maybe we can select OLED or FrameBuffer when starting Doom (this will require understanding how command-line arguments are handled in Doom sourcecode, but it seems doable)
Oh, and I only implemented sending data to the SPI screen one byte at a time, we can probably send four bytes in one go, I'll investigate that (need to better understand LiteX SPI interface).
This is just a proof of concept atm, I'd like to note that it's not resizing the pixel array, it's interpolating and picking the closest one to the scale for the display while skipping others in the screen buffer (nearest neighbor).
It's doing some simple FP arithmetic:
iy = (int)(y * scale_y)
and while I don't know how much these FP operations slow down the gameplay (1 per pixel + 1 per row), I think it's the 320x200 render resolution that's the main culprit.
I tested displaying the image without resizing (displaying the 96x64 portion of the top left corner) as well as skipping every third pixel (where you lose some pixels on the bottom and right) and it ran just as quick.
This is petitbateau on ULX3S 12F, but it seemed to run just as quick (or slow, rather) as the example on your Twitter from January 2022.
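For readers following along, the floating-point nearest-neighbour scaling described above can be sketched roughly as follows. This is not the code from the pull request: the function name `scale_nearest_fp`, the 8-bit pixel format, and the `SRC_*`/`DST_*` constants are assumptions chosen for illustration (320x200 render buffer, 96x64 display).

```c
#include <stdint.h>

#define SRC_W 320  /* Doom render width  (assumed) */
#define SRC_H 200  /* Doom render height (assumed) */
#define DST_W 96   /* SSD1331 width  */
#define DST_H 64   /* SSD1331 height */

/* Nearest-neighbour downscale using floating point.
 * src holds SRC_W*SRC_H 8-bit pixels, dst receives DST_W*DST_H pixels.
 * Hypothetical helper, for illustration only. */
void scale_nearest_fp(const uint8_t *src, uint8_t *dst)
{
    /* The two divisions are done once, outside the loops. */
    const float scale_x = (float)SRC_W / (float)DST_W;
    const float scale_y = (float)SRC_H / (float)DST_H;

    for (int y = 0; y < DST_H; y++) {
        int iy = (int)(y * scale_y);          /* one FP multiply per row */
        const uint8_t *row = src + iy * SRC_W;
        for (int x = 0; x < DST_W; x++) {
            int ix = (int)(x * scale_x);      /* one FP multiply per pixel */
            dst[y * DST_W + x] = row[ix];     /* pick the nearest source pixel */
        }
    }
}
```

The source pixels that fall between two picked positions are simply skipped, which is why no separate resized array is needed.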
Thanks for the quick answer, I'm glad to contribute 😃
Yes, you are right: time is dominated by everything else, and it would run much smoother if we could decrease the resolution. Normally that is possible (I remember doing it in the 90s to play on my 33 MHz 486...). One possibility to make it approx. 3x faster: generate LiteX with a VexRiscV processor (it has a pipeline and runs at slightly more than 1 cycle per instruction, whereas Petitbateau uses between 3 and 4 cycles per instruction). There is also TordBoyau (the pipelined version of femtorv) that may be even faster (but I need to find a way to interface it with LiteX's cache).
P.S. I have merged the pull request, and modified the Makefile in such a way that it generates both versions.
Just tested my faster OLED routine, it is significantly smoother (I'd say nearly 2x the FPS), it is nearly playable ! If we can wire the input routines to the buttons of the ULX3S, then we could play !
The optimized OLED routine gains a significant number of cycles per pixel:
The floating point division is performed only once per frame in this implementation (it could probably be computed only once in the Init, didn't think of that at the time), whereas floating point multiplication then cast to int is performed for every pixel (+ every row)... is multiplication faster than division in this example?
From quick googling it seems FP multiplication is closer to 6-7 cycles, still not as fast as integer multiplication but perhaps still irrelevant due to the inability to lower the resolution.
There is a possibility of manually computing the nearest pixel pairs and just creating a lookup table. This would use 6KB of memory but it should be as fast as scaling to 80x50 while maintaining full usage of the display.
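A lookup-table variant of the idea above might look like the sketch below. Note the assumptions: instead of one entry per display pixel (which is where a figure like 6 KB for 96x64 would come from), this sketch uses one small table per axis, built once with integer arithmetic. The names `scale_lut_init`, `scale_lut`, `lut_x` and `lut_y` are hypothetical and not taken from the pull request.

```c
#include <stdint.h>

#define SRC_W 320  /* Doom render width  (assumed) */
#define SRC_H 200  /* Doom render height (assumed) */
#define DST_W 96   /* SSD1331 width  */
#define DST_H 64   /* SSD1331 height */

static uint16_t lut_x[DST_W];  /* source column for each display column */
static uint16_t lut_y[DST_H];  /* source row offset (row * SRC_W) for each display row */

/* Build the tables once, e.g. from the video init routine.
 * Integer arithmetic only, so no FPU is needed. */
void scale_lut_init(void)
{
    for (int x = 0; x < DST_W; x++)
        lut_x[x] = (uint16_t)((x * SRC_W) / DST_W);
    for (int y = 0; y < DST_H; y++)
        lut_y[y] = (uint16_t)(((y * SRC_H) / DST_H) * SRC_W);
}

/* Per-frame scaling: two table reads and one pixel read per output pixel. */
void scale_lut(const uint8_t *src, uint8_t *dst)
{
    for (int y = 0; y < DST_H; y++) {
        const uint8_t *row = src + lut_y[y];
        for (int x = 0; x < DST_W; x++)
            *dst++ = row[lut_x[x]];
    }
}
```

With these sizes the two tables take 320 bytes in total; a single per-pixel table trades more memory for one fewer lookup per pixel.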
Managing to lower the actual render resolution (or at least change it to a multiple of 96x64) would kill two birds with one stone (no scaling + faster render).
Regarding the buttons, could you point me in the direction of some LiteX or LiteOS code using the buttons so I can look into that?
Hi,
We will manage to lower resolution, I discussed with @sylefeb who knows the sourcecode of Doom very well and he will help us (it is a bit tricky though, because the GUI part on the lower part of the screen has some hardwired resolution in it, so we will need to tinker a little bit).
For the buttons, it will be quite easy. We just need to write a little bit of Amaranth code to create the CSR in such a way that LiteX automatically generates the code, I'll work on that ASAP and tell you.
About scaling, I'd really avoid using any FP operation, because we may want to use a core that does not have FP (then FP mul is going to cost several hundred cycles !). It is perfectly doable, either using a table as you said, or using a Bresenham-like algorithm (I have it somewhere, just need to dig my archives).
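As a rough illustration of what a Bresenham-like scaler could look like (this is a sketch under assumptions, not the archived code mentioned above), the version below keeps an integer error accumulator per axis and advances the source position whenever the accumulator overflows. The inner loops use only additions, subtractions and comparisons. The function name `scale_bresenham` and the buffer layout are hypothetical.

```c
#include <stdint.h>

#define SRC_W 320  /* Doom render width  (assumed) */
#define SRC_H 200  /* Doom render height (assumed) */
#define DST_W 96   /* SSD1331 width  */
#define DST_H 64   /* SSD1331 height */

/* Nearest-neighbour downscale with Bresenham-style error accumulators.
 * No floating point, no multiply or divide inside the loops. */
void scale_bresenham(const uint8_t *src, uint8_t *dst)
{
    const uint8_t *row = src;  /* start of the current source row */
    int err_y = 0;             /* equals (y * SRC_H) mod DST_H */

    for (int y = 0; y < DST_H; y++) {
        int ix = 0;            /* current source column */
        int err_x = 0;         /* equals (x * SRC_W) mod DST_W */

        for (int x = 0; x < DST_W; x++) {
            *dst++ = row[ix];
            err_x += SRC_W;
            while (err_x >= DST_W) {   /* step over skipped source columns */
                err_x -= DST_W;
                ix++;
            }
        }

        err_y += SRC_H;
        while (err_y >= DST_H) {       /* step over skipped source rows */
            err_y -= DST_H;
            row += SRC_W;
        }
    }
}
```

The accumulators reproduce `floor(x * SRC_W / DST_W)` and `floor(y * SRC_H / DST_H)` incrementally, so the output matches an integer lookup table without needing any table memory.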
We will need also a sound driver ! (doom without sound is a bit frustrating). I'll also ask @sylefeb who did something on the ULX3S.
I got the Nearest Neighbor scaling working using a lookup table in #107
I added support for the SSD1331 OLED display to the Doom port (i_video_oled.c). It may need some tweaks from Fastdoom, such as the ability to lower the resolution below 320x200, to run faster. I'm not very familiar with conditional compilation in Makefiles, so I just hardcoded i_video_oled.c in place of the i_video_fb.c that used to be there.