deltabeard / Peanut-GB

A Game Boy (DMG) emulator single header library written in C99. Performance is prioritised over accuracy.
https://projects.deltabeard.com/peanutgb/

gb: add savestate support #96

Open chestwood96 opened 10 months ago

chestwood96 commented 10 months ago

How wrong is it to just store a copy of the gb struct (together with the corresponding RAM) to do save states? Experimentally it seems to work quite well, but I may be missing something.

On load I basically copy the RAM and the struct back, re-assign the function pointers, and make sure it's the right ROM.

I thought this would be harder so I kind of feel I am missing something important.
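
For illustration, the approach described above could look roughly like the following. This is a minimal sketch, not Peanut-GB code: `struct emu_ctx`, `struct snapshot`, `snapshot_save` and `snapshot_load` are hypothetical stand-ins, and the buffer sizes are arbitrary. The point it shows is that the raw copy carries stale pointers, so the live callback and private pointer are kept across the restore, and the state is rejected if it belongs to a different ROM.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the emulator context; NOT the real gb struct. */
struct emu_ctx {
    uint8_t (*rom_read)(struct emu_ctx *, uint32_t addr); /* frontend callback */
    void *priv;            /* frontend-owned pointer */
    uint16_t pc;           /* example of plain emulator state */
    uint8_t wram[0x2000];
};

/* Hypothetical snapshot layout: raw context copy plus cartridge RAM. */
struct snapshot {
    uint32_t rom_checksum;     /* identifies the ROM the state belongs to */
    struct emu_ctx ctx;        /* raw copy; its pointers are stale on load */
    uint8_t cart_ram[0x8000];
};

void snapshot_save(struct snapshot *out, const struct emu_ctx *ctx,
                   const uint8_t *cart_ram, uint32_t rom_checksum)
{
    assert(out && ctx && cart_ram);
    out->rom_checksum = rom_checksum;
    out->ctx = *ctx;
    memcpy(out->cart_ram, cart_ram, sizeof out->cart_ram);
}

/* Returns 0 on success, -1 if the snapshot was made with a different ROM. */
int snapshot_load(const struct snapshot *in, struct emu_ctx *ctx,
                  uint8_t *cart_ram, uint32_t rom_checksum)
{
    assert(in && ctx && cart_ram);
    if (in->rom_checksum != rom_checksum)
        return -1;

    /* Remember the live pointers before they are overwritten. */
    uint8_t (*rom_read)(struct emu_ctx *, uint32_t) = ctx->rom_read;
    void *priv = ctx->priv;

    *ctx = in->ctx;            /* restore the raw state */
    ctx->rom_read = rom_read;  /* re-bind to the current firmware */
    ctx->priv = priv;

    memcpy(cart_ram, in->cart_ram, sizeof in->cart_ram);
    return 0;
}
```

A snapshot like this is only valid for the exact build that produced it, since any change to the struct layout changes the meaning of the raw bytes.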

deltabeard commented 10 months ago

Storing a copy of gb struct in that way might work for you, but there are multiple issues that will stop it from working:

To resolve these issues, it would be appropriate to add a savestate API that would allow a frontend application to create and apply savestates to Peanut-GB. Would you like something like that?

chestwood96 commented 10 months ago

Thanks for answering so quickly.

My use case here is a somewhat modified version of pico-gb, where the save states are mostly there so I can turn the device off, turn it back on, and be right where I was. All those issues are therefore relatively acceptable (I am not intending to share states between devices or platforms).

Point 3 is why I was so surprised it worked so well.

Having a dedicated API for savestates would be quite nice, and it is what I expected I'd need to implement when I started this project, but it would likely be a lot of effort and also quite a bit slower, with higher overhead than the current solution.

deltabeard commented 10 months ago

So that works on the RP2040 microcontroller because the function pointers in the gb struct will be pointing to the correct functions in memory. But if you update the firmware or make any change to the gb struct, then the save state will no longer work. I think with some simple checks, it will be possible to allow save states to work despite changes in the location of various functions and changes to the placements of variables in the gb struct.
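
One possible shape for such checks is a small header written in front of the raw state. This is only a sketch under assumptions: `struct state_header`, the `STATE_MAGIC` and `STATE_VERSION` values, and the function names are hypothetical and not part of Peanut-GB. It only detects an incompatible state (wrong file, changed layout, wrong ROM); it does not migrate an old state to a new layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATE_MAGIC   0x53424750u /* arbitrary tag, hypothetical */
#define STATE_VERSION 1u          /* bump whenever the struct layout changes */

/* Hypothetical header stored in front of the raw state bytes. */
struct state_header {
    uint32_t magic;
    uint32_t version;
    uint32_t ctx_size;     /* sizeof the context struct at save time */
    uint32_t rom_checksum; /* which ROM the state belongs to */
};

enum state_status {
    STATE_OK,
    STATE_BAD_MAGIC,
    STATE_BAD_VERSION,
    STATE_BAD_SIZE,
    STATE_WRONG_ROM
};

struct state_header state_header_make(size_t ctx_size, uint32_t rom_checksum)
{
    struct state_header h;
    assert(ctx_size <= UINT32_MAX);
    h.magic = STATE_MAGIC;
    h.version = STATE_VERSION;
    h.ctx_size = (uint32_t)ctx_size;
    h.rom_checksum = rom_checksum;
    return h;
}

/* Compare a stored header against what the running firmware expects. */
enum state_status state_header_check(const struct state_header *h,
                                     size_t ctx_size, uint32_t rom_checksum)
{
    assert(h);
    if (h->magic != STATE_MAGIC)
        return STATE_BAD_MAGIC;
    if (h->version != STATE_VERSION)
        return STATE_BAD_VERSION;
    if (h->ctx_size != ctx_size)
        return STATE_BAD_SIZE;
    if (h->rom_checksum != rom_checksum)
        return STATE_WRONG_ROM;
    return STATE_OK;
}
```

The size check catches most accidental layout changes for free, while the explicit version covers changes that keep the size the same.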

I can't imagine that the overhead will be much higher than the current solution, especially if we don't care about cross-platform compatibility. Do you have a time limit for how long it should take for a new save state to be recorded?

chestwood96 commented 10 months ago

I am actually redoing the function pointers on load so that isn't a problem (or at least I think that was what you meant, the read ram function and all that right?)

Since I found even my current solution takes too long for seamless background auto-saves (a 170ms stutter will definitely be noticeable) there isn't a big time limit there.

The speeds I currently get are 170ms save and 50ms load with SD, and 730ms save and 2ms load for flash (and I have to turn off the second core while saving, so SD is probably better). I may look into PSRAM or something for a potentially faster and definitely more durable solution, but so far it seems to work relatively well.

chestwood96 commented 10 months ago

On further thought, externalizing VRAM and WRAM (so you can pass them to gb as a pointer) might help too, especially with the cgb mode (which has at least booted the few roms I tested, still need to implement something to convert the colors for the lcd). It would allow you to have smaller savestates for games running in gb only mode since you'd only need to copy parts of those in that case.
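
A rough sketch of that idea, assuming the emulator held pointers to frontend-owned buffers rather than embedding the arrays: `struct emu_mem` and both functions are hypothetical, not Peanut-GB API. The sizes are the standard hardware ones (8 KiB VRAM and 8 KiB WRAM on DMG, 16 KiB VRAM and 32 KiB WRAM on CGB), so a DMG-mode state only needs to copy the first part of each buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DMG_VRAM_SIZE 0x2000u /* 8 KiB */
#define DMG_WRAM_SIZE 0x2000u /* 8 KiB */
#define CGB_VRAM_SIZE 0x4000u /* 2 banks of 8 KiB */
#define CGB_WRAM_SIZE 0x8000u /* 8 banks of 4 KiB */

/* Hypothetical view of externally owned memory handed to the emulator. */
struct emu_mem {
    uint8_t *vram;  /* at least CGB_VRAM_SIZE bytes if cgb_mode is set */
    uint8_t *wram;  /* at least CGB_WRAM_SIZE bytes if cgb_mode is set */
    int cgb_mode;   /* non-zero when the game runs in CGB mode */
};

/* Number of VRAM + WRAM bytes a save state needs in the current mode. */
size_t emu_mem_state_size(const struct emu_mem *m)
{
    assert(m);
    return m->cgb_mode ? (size_t)CGB_VRAM_SIZE + CGB_WRAM_SIZE
                       : (size_t)DMG_VRAM_SIZE + DMG_WRAM_SIZE;
}

/* Copy only the used part of each buffer. Returns the number of bytes
 * written, or 0 if the output buffer is too small. */
size_t emu_mem_save(const struct emu_mem *m, uint8_t *out, size_t cap)
{
    assert(m && m->vram && m->wram && out);
    size_t vram_len = m->cgb_mode ? CGB_VRAM_SIZE : DMG_VRAM_SIZE;
    size_t wram_len = m->cgb_mode ? CGB_WRAM_SIZE : DMG_WRAM_SIZE;

    if (cap < vram_len + wram_len)
        return 0;
    memcpy(out, m->vram, vram_len);
    memcpy(out + vram_len, m->wram, wram_len);
    return vram_len + wram_len;
}
```

With these sizes a DMG-mode state carries 16 KiB of VRAM and WRAM instead of 48 KiB.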

deltabeard commented 10 months ago

> I am actually redoing the function pointers on load so that isn't a problem (or at least I think that was what you meant, the read ram function and all that right?)

That's good at least.

> Since I found even my current solution takes too long for seamless background auto-saves (a 170ms stutter will definitely be noticeable) there isn't a big time limit there.

> The speeds I currently get are 170ms save and 50ms load with SD, and 730ms save and 2ms load for flash (and I have to turn off the second core while saving, so SD is probably better). I may look into PSRAM or something for a potentially faster and definitely more durable solution, but so far it seems to work relatively well.

The save states are being created continuously; could that not wear down the SD card quickly? Anyway, a potential solution would be to use DMA to create a copy of the gb struct. The RP2040 DMA can copy 4 bytes in one clock cycle I think, so copying the gb struct should be very fast. The second core could then handle the copying of that shadow gb struct to the SD card. If the SD card copying is taking a long time, I wonder if DMA could also be utilised to perform that copy?

Another option would be to copy small chunks of the save state to the SD card using the second CPU core to allow the LCD and the audio time to process, rather than copying the full save state to the SD card in one go.
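
The chunked option could be sketched as a small resumable writer that is stepped once per frame or from the second core's loop. All names here (`struct chunk_writer`, `sink_write_fn`, `mem_sink_write`) are hypothetical, and the RAM-backed sink only stands in for an SD card write routine. The sketch also assumes the source buffer does not change while a write is in progress, which in practice means writing from a stable copy or pausing emulation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical storage callback: returns 0 on success. */
typedef int (*sink_write_fn)(void *user, const uint8_t *data, size_t len);

struct chunk_writer {
    const uint8_t *src;   /* must stay unchanged until the write finishes */
    size_t total;
    size_t offset;
    size_t chunk;         /* maximum bytes written per step */
    sink_write_fn write;
    void *user;
};

void chunk_writer_begin(struct chunk_writer *w, const uint8_t *src,
                        size_t total, size_t chunk,
                        sink_write_fn write, void *user)
{
    assert(w && src && write && chunk > 0);
    w->src = src;
    w->total = total;
    w->offset = 0;
    w->chunk = chunk;
    w->write = write;
    w->user = user;
}

/* Write at most one chunk. Returns 1 while data remains, 0 when the
 * whole state has been written, -1 if the sink reported an error. */
int chunk_writer_step(struct chunk_writer *w)
{
    if (w->offset >= w->total)
        return 0;
    size_t n = w->total - w->offset;
    if (n > w->chunk)
        n = w->chunk;
    if (w->write(w->user, w->src + w->offset, n) != 0)
        return -1;
    w->offset += n;
    return w->offset < w->total ? 1 : 0;
}

/* RAM-backed sink for demonstration; a real one would write to storage. */
struct mem_sink {
    uint8_t *buf;
    size_t cap;
    size_t used;
};

int mem_sink_write(void *user, const uint8_t *data, size_t len)
{
    struct mem_sink *s = user;
    if (len > s->cap - s->used)
        return -1;
    memcpy(s->buf + s->used, data, len);
    s->used += len;
    return 0;
}
```

Calling `chunk_writer_step` between frames bounds each stall to the time needed for one chunk, at the cost of a longer total save time.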

Otherwise, it might be appropriate to incorporate some power management into your system to allow the RP2040 to continue working for a short while after the user powers it off.

deltabeard commented 10 months ago

I think it would be nice to have a save state function in Peanut-GB, keeping in mind Peanut-GB's target of being fast. So I will change this issue to track that feature.

chestwood96 commented 10 months ago

> The save states are being created continuously, could that not wear down the SD card quickly?

That is something I am very aware of, but it is a sacrifice I am mostly willing to make, and thanks to the small size of the saves the wear can be spread somewhat over the card. I was looking into F-RAM, battery-backed SRAM and the like as alternatives, but what I found so far was low capacity and extremely expensive, whereas SD cards are high capacity, cheap and easily replaceable. The SPI flash on the RP2040 also has quite a lot of endurance, but writing to it is too slow and requires turning off the second core.
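
One simple way to spread writes, sketched here under assumptions, is to rotate through a fixed set of save slots and tag each with an increasing sequence number, so the newest state can be found again after power-up. `slot_find_latest` and `slot_pick_next` are hypothetical helpers, a sequence number of 0 is assumed to mean an empty slot, and wrap-around of the counter is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_EMPTY 0u /* assumed marker for a slot that was never written */

/* Returns the index of the slot with the highest sequence number,
 * or -1 if every slot is empty. */
int slot_find_latest(const uint32_t *seq, size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (seq[i] == SLOT_EMPTY)
            continue;
        if (best < 0 || seq[i] > seq[best])
            best = (int)i;
    }
    return best;
}

/* Picks the slot to overwrite next (the one after the newest) and the
 * sequence number to store with it. */
size_t slot_pick_next(const uint32_t *seq, size_t count, uint32_t *next_seq)
{
    assert(seq && next_seq && count > 0);
    int latest = slot_find_latest(seq, count);
    if (latest < 0) {
        *next_seq = 1;
        return 0;
    }
    *next_seq = seq[latest] + 1;
    return ((size_t)latest + 1) % count;
}
```

With N slots, each slot location is rewritten only once every N saves, and a save interrupted by power loss leaves the previous state intact.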

> Anyway, a potential solution would be to use DMA to create a copy of the gb struct. The RP2040 DMA can copy 4 bytes in one clock cycle I think, so copying the gb struct should be very fast. The second core could then handle the copying of that shadow gb struct to the SD card. If the SD card copying is taking a long time, I wonder if DMA could also be utilised to perform that copy?

I initially went down that line of thinking too, but there just isn't enough RAM to keep a second copy of the gb object (not to mention the RAM array, but that I could probably deal with by just blocking writes to it while the save is writing), especially not with the bigger GBC buffers. I don't even have enough RAM to support cartridges with 128 KB of RAM, but from my research there were very few games actually using that.

> Another option would be to copy small chunks of the save state to the SD card using the second CPU core to allow the LCD and the audio time to process, rather than copying the full save state to the SD card in one go.

Chunked background copy was my initial plan, but that does require having a copy in RAM, so that's out (I was planning to make the actual pixel data writing non-blocking and do the copy in the meantime).

> Otherwise, it might be appropriate to incorporate some power management into your system to allow the RP2040 to continue working for a short while after the user powers it off.

The current plan is a soft power-off, so the MCU can write a final savestate and then actually power off. That part should be relatively easy (famous last words, with how well my plans have been working out in this little project so far).

Auto-save wise, I'll probably just make them configurable; in some games having a 170ms (or maybe 400-500ms with the GBC buffers) stutter every couple of minutes isn't a problem, while in others it very much is.

chestwood96 commented 10 months ago

I did a bit of worst-case testing: Pokemon Crystal JP (one of the only ROMs with 64 KB of RAM that I found; I don't have enough RAM for the 128 KB ones, which are even rarer), running the GBC version of Peanut-GB (I ended up using a lookup table for the colors, since flash is one of the few things I have plenty of), got me 330ms to save and 80ms to load.

I was able to reduce that further once I found out how to turn up the SPI speeds (250ms save, 60ms load).