Kroc / elite-harmless

Disassembly (CA65) of the Commodore 64 port of the seminal space-sim Elite, by Ian Bell / David Braben.
https://discord.gg/ZYnQr5S

Remove Flickering #1

Open Kroc opened 6 years ago

Kroc commented 6 years ago

A faster, better Elite is of no use without removing the flickering.

Having two bitmap pages is beyond possibility with a stock C64, so some kind of very clever engineering will be needed. One possibility is using the text screen with a 256 x 144 custom char-set display. This would need to use interrupts to change char-set pages every 7 rows.

Kroc commented 5 years ago

Note to self: I've found the line-drawing routine, so we could replace it with something much faster, like Oxyron's Bresenham routine sampled here: http://www.codebase64.org/doku.php?id=base:lines This is presumably used in some form in this release, which has some stunningly fast line-drawing: https://csdb.dk/release/?id=126151

Kroc commented 5 years ago

Note to self: Possible build option to utilize an REU's DMA channel to speed up transfers. We could use this to blank the entire bitmap screen faster than erasing all existing lines.

Kroc commented 5 years ago

An on-going concern for absolute line-drawing speed is the amount of parameter validation that occurs every time a line is drawn. What I would like to do is completely remove parameter validation from the line-drawing routine and push that out to the points where lines are built. If a line doesn't change every frame, it is then validated once when built rather than on every draw, so time is saved. Depending on the nature of the routines building lines, some of the validation may be entirely unnecessary (e.g. sun lines are always built left-to-right).

Kroc commented 5 years ago

It has occurred to me that the actual line drawing only cares about an initial point (X/Y) and a width & height -- not the end co-ordinates. Therefore it is pretty inefficient that the game is building X2/Y2 points, only for these to be discarded. If we build lines based on a starting point and length, we can probably save a lot of cycles wasted in administration.
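As a rough illustration of the idea above (not code from the game or from this project), here is a minimal Python sketch of a Bresenham-style line that takes a start point plus a width and a signed height instead of end co-ordinates. The function name `plot_line` and its conventions (always drawn left-to-right, negative height meaning "up") are assumptions made for the example.

```python
def plot_line(x, y, width, height):
    """Return the pixels of a line that starts at (x, y), extends `width`
    pixels to the right and `height` pixels down (negative = up)."""
    step_y = 1 if height >= 0 else -1
    height = abs(height)
    points = [(x, y)]
    if width >= height:
        # Shallow line: step x every pixel, step y when the error underflows.
        err = width // 2
        for _ in range(width):
            x += 1
            err -= height
            if err < 0:
                y += step_y
                err += width
            points.append((x, y))
    else:
        # Steep line: step y every pixel, step x when the error underflows.
        err = height // 2
        for _ in range(height):
            y += step_y
            err -= width
            if err < 0:
                x += 1
                err += height
            points.append((x, y))
    return points
```

Note that no end point is ever computed or compared against; the loop counters are the width and height themselves.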

dyme6510 commented 5 years ago

Double buffering imposes a penalty of up to 8000 cycles when the new screen is finished at the beginning of the bitmap rasterlines, because you'll have to wait until after the bitmap to swap bitmaps and begin rendering the next frame. On average there's about 1700 cycles lost. However, this loss can be reduced by racing the raster beam when clearing the new bitmap, just before actually swapping the displayed bitmaps.

To achieve double buffering, many drawing routines would have to take into account that xor-ing is no longer done; instead, each frame has to be cleared and painted from scratch. For example, text on the viewport would have to be redrawn every frame.

mrdudz commented 5 years ago

i'd still either do the xoring (requires some sort of buffering for which lines were drawn) or add some other mechanism to speed up the clearing (for example, buffer the addresses of the bytes that were written to, and only clear those) - because clearing 8000 bytes each frame is really a huge speed penalty.
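A minimal sketch of the "buffer the addresses that were written to" idea, in Python for readability (a real version would be 6502 code). The class name `TrackedBitmap` and its methods are hypothetical, and it assumes the bitmap starts out cleared and that plotting ORs a non-zero mask into a byte.

```python
class TrackedBitmap:
    """Bitmap that remembers which bytes were written this frame."""

    def __init__(self, size=8000):
        self.mem = bytearray(size)
        self.touched = []  # addresses of bytes dirtied this frame

    def plot(self, addr, mask):
        # A byte that is still zero has not been written this frame,
        # so record its address exactly once.
        if self.mem[addr] == 0:
            self.touched.append(addr)
        self.mem[addr] |= mask

    def clear_touched(self):
        """Zero only the bytes that were written; return how many."""
        for addr in self.touched:
            self.mem[addr] = 0
        count = len(self.touched)
        self.touched.clear()
        return count
```

The trade-off is an extra test and store in the plot path in exchange for a clear that scales with what was drawn rather than with the bitmap size.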

Kroc commented 5 years ago

It's not 8'000 bytes: because the viewport is only 18 rows, not 25 (and 256 pixels wide), it turns out to be 4'608 bytes (18 x 32 x 8). Even erasing that many bytes in a loop would still be logically faster than going through all the math and management of redrawing all the lines on the screen, just to erase them! Not only that, but having to XOR the lines to both draw and undraw them places a penalty on all line-drawing, which also makes a straight 4.5K erase more efficient in light of a potentially lower per-pixel draw cost.

Personally, I think a combined char-bitmap and dirty-char approach will work best. Erasing can be handled by using a single blank char across the charmap screen stripe and then a dirty cache can be used to clear each char before it's written into its fixed location in the charmap-stripe.

dyme6510 commented 5 years ago

Clearing the whole viewport would cost about 25k cycles. The cobra on the intro screen costs between 20k and 40k for each xor pass. Although that could still be optimized when switching to char mode, clearing the whole screen is still the much better worst case. You could probably also do a rough bounding-box check to clear far fewer char cells in the other cases.

mrdudz commented 5 years ago

one good way is to not actually clear the bitmap, but use the videoram instead (set both colors to black). then first time you draw to a "character" you clear it first and restore videoram

Kroc commented 5 years ago

That can't be done buffered though unless you take a copy of the color RAM for the second buffer(?) A dirty-tile cache might take an awful lot of RAM though (720 bytes x 2 = 1'440 bytes) unless a 1-bit approach is taken (1'440 bits = 180 bytes)
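To make the 1-bit arithmetic concrete, here is a small Python sketch of a bit-packed dirty-tile cache. It assumes the 720 cells are 18 rows of 40 columns; the names `DirtyBits` and `test_and_set` are made up for the example.

```python
COLS, ROWS = 40, 18
CELLS = COLS * ROWS  # 720 character cells per buffer


class DirtyBits:
    """One dirty flag per character cell, packed 8 flags to a byte."""

    def __init__(self):
        self.bits = bytearray(CELLS // 8)  # 720 bits = 90 bytes

    def test_and_set(self, cell):
        """Mark `cell` dirty; return True if it was already dirty."""
        byte, mask = cell >> 3, 1 << (cell & 7)
        was_dirty = bool(self.bits[byte] & mask)
        self.bits[byte] |= mask
        return was_dirty


# One cache per buffer: 2 x 90 bytes = 180 bytes in total.
caches = [DirtyBits(), DirtyBits()]
```

A draw routine would call `test_and_set` for the cell it is about to write and clear that cell's 8 bytes first whenever the result is `False`.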

mrdudz commented 5 years ago

color ram? do you want to make it use more than 2 colors?

Kroc commented 5 years ago

Oh silly me, I read you as saying colour RAM ($D800), not the screen RAM ($400 each), my bad!

mrdudz commented 5 years ago

with some clever arranging, you can also put both bitmaps into one videobank (with a rastersplit for the bottom half) so you only need one vram and then can maybe afford an unrolled clear routine for it.

dyme6510 commented 5 years ago

I think a used char block map would slow down the line drawing pixel loop quite a bit, that's why I would start with a char-grid aligned bounding box with easy min-max checking per line for each object.
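A hedged sketch of what such a char-grid aligned bounding box could look like, written in Python for clarity. The class `CellBounds` and its methods are hypothetical; the point is that the min/max check happens once per line (on its end points), not once per pixel.

```python
class CellBounds:
    """Bounding box of an object, in 8x8 character cells."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.min_col = self.min_row = 255
        self.max_col = self.max_row = -1

    def add_line(self, x1, y1, x2, y2):
        # Only the end points are needed: a straight line never leaves
        # the rectangle that they span.
        self.min_col = min(self.min_col, min(x1, x2) >> 3)
        self.max_col = max(self.max_col, max(x1, x2) >> 3)
        self.min_row = min(self.min_row, min(y1, y2) >> 3)
        self.max_row = max(self.max_row, max(y1, y2) >> 3)

    def cells(self):
        """List the (col, row) cells that need clearing."""
        return [(col, row)
                for row in range(self.min_row, self.max_row + 1)
                for col in range(self.min_col, self.max_col + 1)]
```

An empty box (no lines added) yields no cells, so objects that drew nothing cost nothing to clear.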

Regarding the character mode, two options come to mind. The simplest would be row-major (horizontally ascending) charmaps with 8 character rows per charset, i.e. two complete charsets and a quarter of a third. The position on the screen is so easily calculated that we can even drop the screen-offset tables. While using only 4.5k per screen, thus easily fitting everything in one VIC bank and using barely more memory than the current approach, it leaves no characters for the border, which would have to be done with sprites. These cost 7 cycles per raster line (4 when the target reticule sprite is active anyway), so roughly 800 cycles. Since memory is scarce at the moment, this would be my first approach for looking into how to change the graphics calls.

The other possible mode would be to take only 7 rows of each charset, but make them column-first ascending maps, so that changing y during the inner pixel loop seldom overflows; stepping to the next x-block is a multiplication by 56. Since changing the y-cell still only affects the hi-byte (changing the complete charset), the y-part of the inner pixel loop becomes simpler AND is less frequently used, although we need screen-position tables again. The drawback is that it uses more memory, and all the border-tile chars have to be present in both charsets. But the last charset uses 4 rows and still some chars for the border graphics, so it cannot be shared between the two buffers. On the upside, we only need one text screen for both the flight graphics buffers.
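To illustrate the two layouts, here is a small Python sketch of the address arithmetic as I understand it. It assumes a 256 x 144 viewport (32 x 18 cells) and charsets laid out back to back; the function names are invented for the example and the exact memory placement is not something the thread specifies.

```python
def row_major_offset(x, y):
    """Layout 1: cells ascend left-to-right, 8 cell rows per charset.

    With 32 cells per row, one cell row is 32 * 8 = 256 bytes, so the
    hi-byte is just the cell row and no lookup table is needed.
    """
    hi = y >> 3                 # cell row (0..17)
    lo = (x & 0xF8) | (y & 7)   # cell column * 8 + line within the cell
    return (hi << 8) | lo


def column_major_address(x, y):
    """Layout 2: cells ascend top-to-bottom, 7 cell rows per charset.

    Each column is 7 * 8 = 56 consecutive bytes, so y can step through
    56 pixel rows before the charset (hi-byte) has to change.
    Returns (charset index, byte offset within that charset).
    """
    charset, line = divmod(y, 56)
    return charset, (x >> 3) * 56 + line
```

In the first layout the full viewport spans offsets 0 to 4607 (the 4.5k mentioned above); in the second, each charset uses 32 x 7 = 224 chars, leaving 32 free per charset.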

Kroc commented 5 years ago

I'll add a build option to remove the sound engine & music, so that'll give a few KB free for any play-around.

Kroc commented 5 years ago

OK, done, I've applied a no-sound option to the fastlines build, that gives you 6K more to work with.

I'm working on the overall project architecture, solving some issues related to using VIC-bank 3, with the aim to produce a cartridge build that should solve space problems for good.

Kroc commented 4 years ago

Something that's actually quite obvious when pointed out is that the sun, despite possibly filling the screen, doesn't flicker! This is because the sun is erased and redrawn scanline by scanline for which a buffer of scanline widths is maintained (I've been documenting the circle-drawing methodology recently)

I'm not yet sure if the lines for all other objects are erased and redrawn one line at a time, or by whole objects (ships, space-stations, etc.). We may be able to reduce flickering by re-ordering how lines are erased/redrawn; redrawing lines one at a time, instead of erasing an entire object in one go, may drastically reduce flicker (if this is not the case already?). This would require a buffer of all lines (to be drawn) for each frame so that we can erase each line, without having to do any calculation.
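A toy Python model of the two orderings, to show why the interleaved version should flicker less. This is only an illustration under the assumption that erase and draw are the same XOR operation; the function names are made up, and lines are treated as opaque values rather than actual pixels.

```python
from itertools import zip_longest


def redraw_whole_object(old_lines, new_lines, xor_line):
    """Erase every old line first, then draw every new line."""
    for old in old_lines:
        xor_line(old)
    for new in new_lines:
        xor_line(new)


def redraw_interleaved(old_lines, new_lines, xor_line):
    """Erase one old line, then immediately draw its replacement."""
    for old, new in zip_longest(old_lines, new_lines):
        if old is not None:
            xor_line(old)
        if new is not None:
            xor_line(new)


def fewest_visible(redraw, old_lines, new_lines):
    """Smallest number of lines on screen at any point during a redraw."""
    visible = set(old_lines)
    fewest = len(visible)

    def xor_line(line):
        nonlocal fewest
        visible.symmetric_difference_update({line})  # XOR toggles the line
        fewest = min(fewest, len(visible))

    redraw(old_lines, new_lines, xor_line)
    return fewest
```

For a three-line object, the whole-object ordering passes through a moment with nothing on screen, while the interleaved ordering is never missing more than one line.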

Kroc commented 2 years ago

With great thanks to Mark Moxon's BBC disassembly and C64 flicker-free patch, as well as help implementing the code into elite-harmless, line flicker has been massively reduced! It still affects circles, which are drawn using a different process, but that can potentially be improved too. I may close this issue and open another regarding circle optimising / flicker-fixing.