MEGA65 / mega65-core

MEGA65 FPGA core

Enhanced DMA Byte Munging Options #795

Open gardners opened 3 months ago

gardners commented 3 months ago

There is value in being able to munge values as they are copied via DMA, e.g. with a pipeline like the following:

ADD value, then AND, then EOR, then OR

By default the ADD value would be $00, the AND mask $FF, and the EOR and OR values $00, so an unconfigured pipeline copies bytes unchanged.
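
As a rough illustration of the pipeline above, here is a minimal C model of the per-byte operation. The struct and function names are hypothetical, and the assumption that the ADD step wraps modulo 256 is mine rather than something the proposal states.

```c
#include <stdint.h>

/* Hypothetical parameter block for the proposed pipeline.
 * Field names are illustrative only, not actual DMA option names. */
typedef struct {
    uint8_t add_value; /* proposed default: $00 */
    uint8_t and_mask;  /* proposed default: $FF */
    uint8_t eor_mask;  /* proposed default: $00 */
    uint8_t or_mask;   /* proposed default: $00 */
} dma_munge_t;

/* Defaults as proposed: the pipeline passes bytes through unchanged. */
static const dma_munge_t DMA_MUNGE_DEFAULT = { 0x00, 0xFF, 0x00, 0x00 };

/* Apply ADD, then AND, then EOR, then OR to one source byte. */
static uint8_t dma_munge(uint8_t src, const dma_munge_t *m)
{
    /* Assumption: the addition wraps modulo 256. */
    uint8_t v = (uint8_t)(src + m->add_value);
    v &= m->and_mask;
    v ^= m->eor_mask;
    v |= m->or_mask;
    return v;
}
```

With the default parameters every byte is copied as-is; any other combination is applied in the fixed order shown.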

bjotos commented 3 months ago

As discussed on Discord, a relevant feature would be support of 4-bit data (NCM mode).

  1. DMA shall be able to read data in 4-bit-per-pixel mode, supporting selection of either NCM nibble order or 4bpp Sprite nibble order.
  2. Transparency, if enabled, shall test for a (nibble) value of zero and stop pixel output before any other step (i.e. before adding a value).
  3. DMA shall either expand the data to 8bpp (NCM), allowing values to be ADDed as above. This is meant to "colorize" using 16-color chunks of the palette, similar to how NCM mode supports 16 palettes, but with the extra flexibility of sub-palettes that are not aligned to 16 entries,
  4. or keep the data as 4bpp (NCM) and write it out nibble-wise (in NCM nibble order). This is meant to support "3D drawing" like OutRun, Lotus, Star Fox, Stunt Car Racer using up to 16 colors, while keeping enough RRB cycles for overlays and RRB Sprites to enhance the visuals.
  5. For constant source value mode, the two nibbles shall be used individually, still allowing addition (to achieve values > 15 for FCM output) and NCM output, along with the nibble-level transparency check described above. Use cases are "shadows" as done in Star Fox and other early 3D games (simulated by alternating color and transparency in a checkerboard pattern) and dithered colors (expanding the number of possible shades).

We should use a new NCM_MODE (or 4BPP_MODE) field with three bits for this: 4BPP_ENABLE (enables this mode), SPRITE_ORDER_ENABLE (enables Sprite nibble order for reads), and 4BPP_OUTPUT_ENABLE (enables writes as NCM-order nibbles). The default for this mode should be $00, i.e. disabled.
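
To make the nibble handling concrete, the following C sketch models the three proposed mode bits and the read, transparency, expand, and pack steps. Everything here is an assumption for illustration: the bit positions, the `NCM_` prefix (a C identifier cannot start with a digit), the helper names, and in particular which nibble comes first in each order (low nibble first for NCM order, high nibble first for Sprite order).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments for the proposed 3-bit mode field. */
#define NCM_4BPP_ENABLE         0x01u /* enable 4bpp handling */
#define NCM_SPRITE_ORDER_ENABLE 0x02u /* read in Sprite nibble order */
#define NCM_4BPP_OUTPUT_ENABLE  0x04u /* write NCM-order nibbles */

/* Split one source byte into two pixels, first pixel in px[0].
 * Assumption: NCM order is low nibble first, Sprite order is
 * high nibble first. */
static void ncm_split(uint8_t src, uint8_t mode, uint8_t px[2])
{
    uint8_t lo = src & 0x0F;
    uint8_t hi = (uint8_t)(src >> 4);
    if (mode & NCM_SPRITE_ORDER_ENABLE) {
        px[0] = hi;
        px[1] = lo;
    } else {
        px[0] = lo;
        px[1] = hi;
    }
}

/* Expand one nibble to an 8bpp value. Returns false when the pixel
 * is transparent and must not be written; the transparency test
 * happens before the ADD, as item 2 requires. */
static bool ncm_expand(uint8_t nibble, bool transparency,
                       uint8_t add_value, uint8_t *out)
{
    if (transparency && nibble == 0) {
        return false;
    }
    *out = (uint8_t)(nibble + add_value);
    return true;
}

/* Pack two pixels back into one byte for 4bpp output, using the
 * assumed NCM order (first pixel in the low nibble). */
static uint8_t ncm_pack(const uint8_t px[2])
{
    return (uint8_t)((px[0] & 0x0F) | ((px[1] & 0x0F) << 4));
}
```

Because the add value is a full byte, a sub-palette can start at any palette index, for example an add value of $25 maps nibble 3 to palette entry $28.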

bjotos commented 3 months ago

Two more modes, which are POSSIBLY "out of scope" because they do not "feel" like 8/16-bit functionality:

  1. Lookup-Table (LUT) indirection using a 256-byte table located in memory, allowing manipulation like DOOM does for lighting/darkening of sectors or shading based on distance.
  2. "2D Tile Wraparound" mode, which allows specifying an NxM tile size (both powers of two, with 8x8 as the minimum), where source addressing wraps around when the X or Y "value" would run over the edge. Since the DMA works without explicit X/Y coordinates, just a single address, this mode shall assume that a "2D tile" starts where the low address bits are zero. In the 8x8-byte case (the same size as an FCM or NCM char, by the way), each "2D tile" starts at an address that is a multiple of 64 bytes, i.e. the lowest 6 bits can be masked: the X address is the lower 3 bits and the Y address is the upper 3 bits. The allowed range for each dimension would be 8-256 in powers of two (6 combinations, i.e. 4 bits per dimension). A "disabled" value of zero could be added as the default; it would still fit into the 4 bits and would make wrapping in only one dimension possible, which could be useful. The idea is to combine this with (source) Line-DMA for textured floors (like DOOM does, where the texture is 64x64).
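
Both ideas reduce to small address or value transforms, sketched below in C. The function names are hypothetical, the row-major layout (`y * w + x`) inside a tile is an assumption generalized from the 8x8 description, and the "disabled" per-dimension setting is not modeled.

```c
#include <stdint.h>

/* LUT indirection: the source byte indexes a 256-byte table. */
static uint8_t lut_apply(const uint8_t lut[256], uint8_t src)
{
    return lut[src];
}

/* Step a source address by (dx, dy) inside a w x h byte tile,
 * wrapping at the tile edges. w and h must be powers of two and
 * the tile must start at a multiple of w * h. */
static uint32_t tile_wrap_step(uint32_t addr, int32_t dx, int32_t dy,
                               uint32_t w, uint32_t h)
{
    uint32_t size = w * h;
    uint32_t base = addr & ~(size - 1u); /* tile start address */
    uint32_t off  = addr & (size - 1u);  /* offset inside the tile */
    uint32_t x = off & (w - 1u);         /* low bits: X */
    uint32_t y = off / w;                /* high bits: Y */

    /* Unsigned arithmetic plus masking also wraps negative steps. */
    x = (x + (uint32_t)dx) & (w - 1u);
    y = (y + (uint32_t)dy) & (h - 1u);

    return base | (y * w) | x;
}
```

For an 8x8 tile at $1000, stepping right from $1007 wraps back to $1000, and stepping down from $1038 also lands on $1000, which is the behavior a textured-floor line DMA would rely on.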