commanderx16 / x16-rom

Fast VLOAD #337

Closed ZeroByteOrg closed 2 years ago

ZeroByteOrg commented 2 years ago

This PR speeds up VLOAD by having the LOAD routine use MACPTR to block-read data directly into VRAM. This is over ten times faster than the previous byte-by-byte loading.

This requires a potentially breaking change for software that calls MACPTR directly. MACPTR now uses the C (carry) flag to select between the normal loading behavior and a new copy loop that does not increment the memory destination pointer as data is read from the buffer.

I have implemented the new behavior by adding a second copy loop to fat32_read. A new loop is required because the existing copy loop counts its index down to zero, which would reverse the order of the bytes sent into VRAM. Modifying the existing loop would also have required checking on every iteration whether to increment the dst pointer, which would be slower, so I opted for a purpose-built copy loop instead.
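The ordering problem can be illustrated with a small Python model. This is only a sketch, not the fat32_read code: `AutoIncrementPort`, `copy_count_down`, `copy_in_order`, and `load` are hypothetical names, and the port is assumed to behave like a single fixed address that stores each written byte at the next VRAM location.

```python
class AutoIncrementPort:
    """Toy stand-in for a data port at one fixed address.

    Assumption for this sketch: every write lands at the next
    target address, so arrival order decides the final layout.
    """

    def __init__(self, size):
        self.vram = bytearray(size)
        self.addr = 0

    def write(self, value):
        self.vram[self.addr] = value
        self.addr += 1


def copy_count_down(buf, port):
    """Model of a count-down-to-zero loop: the index starts at the end
    of the buffer, so the last byte reaches the port first."""
    i = len(buf)
    while i > 0:
        i -= 1
        port.write(buf[i])


def copy_in_order(buf, port):
    """Model of a purpose-built loop: bytes reach the port first to last."""
    for value in buf:
        port.write(value)


def load(copy_loop, data):
    """Run one of the copy loops against a fresh port and return its VRAM."""
    port = AutoIncrementPort(len(data))
    copy_loop(data, port)
    return bytes(port.vram)
```

In this model, `load(copy_count_down, b"ABCD")` yields `b"DCBA"`, while `load(copy_in_order, b"ABCD")` yields `b"ABCD"`. When the destination is indexed memory, the count-down loop is harmless because each byte is stored at its own index; the reversal only appears once the destination address stays fixed.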

The Kernal's own calls to MACPTR are patched to work with the new behavior, but end-user code that calls MACPTR must ensure the carry flag is clear before the call. Otherwise MACPTR will not increment the destination pointer, and the data will not be loaded correctly.
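To make the risk for existing callers concrete, here is a rough Python model of the carry-selected behavior. `macptr_model` is a hypothetical name and the `carry` argument stands in for the CPU's C flag. Unlike the real change, which uses a separate loop, the model tests the flag on every byte purely for brevity.

```python
def macptr_model(data, memory, dest, carry):
    """Hypothetical model of the carry-selected copy behavior.

    carry False: normal load, the destination advances after each byte.
    carry True:  the destination stays fixed for every byte.
    Returns the final destination index.
    """
    for value in data:
        memory[dest] = value
        if not carry:  # checked per byte here only to keep the sketch short
            dest += 1
    return dest


def load_to_ram(data, carry, size=8, dest=2):
    """Simulate a caller loading into ordinary RAM with a given carry state."""
    memory = bytearray(size)
    macptr_model(data, memory, dest, carry)
    return bytes(memory)
```

With the carry clear, `load_to_ram(b"ABCD", carry=False)` places the four bytes at consecutive addresses. With a stale carry left set, `load_to_ram(b"ABCD", carry=True)` writes every byte to the same address, so only the last byte survives in RAM.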

ZeroByteOrg commented 2 years ago

Accompanying emulator patch is PR#432

mist64 commented 2 years ago

Once this is merged:

mist64 commented 2 years ago

I cleaned up the label magic in fce973e79f0930db770ef96cf74ff14061737f1e.