ITotalJustice / notorious_beeg

gba emulator written in c++23
https://notorious-beeg.netlify.app/
GNU General Public License v3.0
41 stars 4 forks source link

[GBA] waitloop detection #103

Closed ITotalJustice closed 2 years ago

ITotalJustice commented 2 years ago

intro

its pretty trivial to detect waitloops. what's slightly more difficult to detect is what the loop is exactly doing.

often, a loop will read values from ram and do a compare then jump back if it doesn't get the result it wants, like:

while (ram[addr] != 159);

detecting a loop

everytime a conditional branch that jumps backwards happens, keep track of the new pc. if another conditional branch happens, check if the new_pc == previous_jump_pc. if true, we have found a loop!


detecting read only loop

detecting a read only loop is more involved. you can use elimination by the loop size. if (current_pc - new_pc == 0x8) then only 3 instructions executed. you know the last instruction is the conditional branch, so the first instruction has to be a load and second instruction is going to be a compare.

of course the loop can be much bigger in size. the bigger the loop, the more instructions to analyse, which is more expensive. i have a cutoff off 6 instructions, which is the biggest loops ive found so far. its important that when you come to a conclusion as to whether of not the loop can be skipped, save this result. this makes your loop analyse function not really matter that much in terms of performance (within reason, dont be silly about it).

i also only check for loops if the executing code is within rom. i have not yet encountered a rom that wait loops from ram (in thumb), and i couldnt consider it safe to do otherwise. if wait looping in ram, the game dma the ram and overwrite it or that code can be replaced later on.

i have only checked for loops when a cond branch happens in thumb. i have not bothered with arm, and havent needed to.


ways of changing data in a read only loop

when in a read only loop (doesn't write), the only way the data can change is via:

  1. irq handler
  2. dma
  3. io register changes (if polling io reg)
  4. open bus (if polling open bus 😄 )

that's it. knowing this, some safe and big speed-ups can be made.


io waitloops

common io registers to poll are:

  1. vcount
  2. dispstat (check the mode)
  3. dma control registers (check if dma has finished)

therefore, reading any of these registers should store the pc. this will make waitloop detection (and its type) far simpler.


other waitloops

however, many games prefer to have their vblank irq handler set a variable in ram. these games will then poll that variable in ram. pokemon emerald does exactly this, so does firered, leafgreen, kirby, etc...

this is slightly more involved however can easily be checked, just ensure that the loop is read only. as long as the loop is read only, then we can safely assume that it waiting for one of 1-4 to occur (see above).


known loops

i spent some time detecting loops in various games, sadly i did not document the loops early on so i cant know if theyre polling io or waiting on ram.

i'd recommend trying to implement a waitloop detection system to try and find all these loops.

u32 pokeemerald_idle_loop = 0x080008C6;
u32 pokefire_idle_loop = 0x080008BE;
u32 kirby_idle_loop = 0x08000FAA;
u32 mario_advance2_loop = 0x0800052E;
u32 mario_advance3_loop = 0x08002C40;
u32 mario_advance4_loop = 0x0800072A;
u32 crash_of_the_titans = 0x080258BC;
u32 defender_of_the_crown = 0x080007F4;
u32 yugioh_sacred_cards = 0x08003B9A;
u32 yugioh_reshef_of_destruction = 0x08008248;
u32 yugioh_destiny_board_traveler = 0x0801918C;
u32 yugioh_dungeon_dice_monsters = 0x0802CDE2;
u32 yugioh_the_enternal_duelist_soul = 0x08075D8E;
u32 yugioh_ultimate_duelist = 0x080F4BB6;
u32 warioware = 0x08000FA6;
u32 sonic_advance = 0x0800096C;
u32 simpsons_road_rage = 0x08043950; // only used at startup
u32 prehistorik_man = 0x0801E406;
u32 link_to_the_past = 0x080683A8; // todo: check 4 swords
u32 pokemon_mystery_dungeon = 0x0800BA88;
u32 prince_of_persia[] = { 0x08090846, 0x08090850, 0x08090858, 0x080900EE, 0x080938F8, 0x0808FFF2 };
u32 spiderman_move[] = { 0x08009A36, 0x0803624A, 0x080099D6, 0x08036270, 0x08043214, 0x08023E1E, 0x08023E08, 0x08023DF2, 0x0802FDB2, 0x08030856, 0x0802D112, 0x0802D126, 0x0802D13A, 0x08035A9A, 0x0802D27A, 0x0802D2A2, 0x0802D28E, 0x08009C26, 0x08026B10, 0x08007E2C, 0x08026272, 0x08007E92, 0x0802784A };

u32 donkey_kong_country_1 = 0x08032904;
u32 donkey_kong_country_2 = 0x080003D0;
u32 donkey_kong_country_3 = 0x080AF458; // 0x080003A4, 0x080AF458

u32 harry_potter_and_the_sorcerers_stone = 0x0801AB0E;

// dispstat
u32 kirby_and_the_amazing_mirror = 0x08151DAE;
// dipstat
u32 Mario_Luigi_Super_Star_Saga_1 = 0x080185D6;
// mario bros mini game
u32 Mario_Luigi_Super_Star_Saga_2 = 0x08F5078E; // idk
u32 Mario_Luigi_Super_Star_Saga_3 = 0x08F6E38C; // vcount
u32 Mario_Luigi_Super_Star_Saga_4 = 0x08F53CB2; // vcount ram

u32 mario_party_advance = 0x0806FC90; // startup
u32 mario_tennis_1 = 0x0825135E; // startup
u32 mario_tennis_2 = 0x0801389A; // startup
u32 mario_tennis_3 = 0x082525BC; // startup

u32 naruto_ninja = 0x0800247C; // vcount, startup
u32 need_for_speed_underground_1 = 0x08144EA0; // vcount, startup
u32 need_for_speed_underground_2 = 0x0814628C; // vcount, startup

u32 pokemon_ruby = 0x081DE420; // vcount, startup

u32 pokemon_pinball_1 = 0x080542B8; // vcount, startup
u32 pokemon_pinball_2 = 0x08000A3C; // dispstat

u32 sims_2 = 0x08024FD4;

u32 sonic_advance2_1 = 0x0809580C; // vcount, startup
u32 sonic_advance2_2 = 0x0800198C; // dispstat, startup

u32 sonic_advance3_1 = 0x080BA6D4; // vcount, startup
u32 sonic_advance3_2 = 0x080BBE5E; // dispstat, startup

u32 sonic_the_hedgehog_1 = 0x080753A4; // vcount, startup
u32 sonic_the_hedgehog_2 = 0x0804648A; // vcount
u32 sonic_the_hedgehog_3 = 0x08071EE4; // dispstat

u32 fzero_1 = 0x08040604; // vcount, startup
u32 fzero_2 = 0x08000C26; // vcount

u32 froggers_adventures = 0x080008EA; // ram

u32 mario_vs_donkey_kong_1 = 0x08033D44; // startup
u32 mario_vs_donkey_kong_2 = 0x08033EE8;

u32 megaman_battle_network4_1 = 0x08112D28; // vcount
u32 megaman_battle_network4_2 = 0x080003A2; // dispstat
u32 megaman_battle_network4_3 = 0x08000372; // dispstat

u32 spyro_season_of_ice_1 = 0x0805C29C; // vcount
u32 spyro_season_of_ice_2 = 0x08012CCA; // vcount in game when break blue thing
u32 spyro_season_of_ice_3 = 0x08012D06; // vcount in game when break blue thing

u32 yugioh_stairway_to_the_destined_duel = 0x0808978A;

unknown loops

some games clearly must be waitlooping, but i have not detected it with my set-up. if you do detect these loops, please tell me know and where the loop is!

  1. advance wars
  2. castlevania
  3. demikids
  4. driver 2 (0x0803A998 ?)
  5. super mario advance 1
ITotalJustice commented 2 years ago

quick follow-up for what loops should be skipped (imo)

which loops to skip

once you have detected it is a read-only waitloop, next you need to check the address that it is polling. if the address is ram, then the only way for that value can change is via dma or irq handler. otherwise, it's polling IO.

when polling IO, there's lots of edge cases to be aware of. for example, if the game is polling the timer data, then you need to further check what the value it expects is. this can be more complex the bigger the loop as it may involve shifts and adds to the read value, all of this needs to be calculated. finally, when this is done, you can schedule an event at that specific point when the timer will be that value. however there's an edge case. maybe the game fires a dma which disables the timer and then manually writes the reload value. maybe the game writes to the timer from irq handler. maybe the game is polling the timer that is not yet enabled!

as you can see, lots of edge case that add a lot of complexity, and i haven't even covered them all for the timer. this was just 1 IO register as well!

because of this, i think its best to keep the list of loops to skip very very small. this will make analysing the loop itself faster and simple. ideally, you shouldn't need to calculate the value the game expects, you should only calculate the address it is polling. this again saves on time spent alayiseng the loop.

here is my current list of easy addresses to skip:

  1. ram (ewram, iwram, pram, vram, oam etc). can only be changed via dma or irq handler.
  2. vcount, can only be changed via the ppu advancing to a newline
  3. dispstat, can only be changed when going from draw->blank->draw etc

and thats it. very small list, but it's what the vast majority of games poll.

if you think about it, polling a timer is pointless, they can simply set the timer to 0x10000-timeout, set the timer irq handler and then wait until the timer irq handler is fired.


Note: i haven't yet compiled a list of common IO polls. I plan to test ~50 games and compile a list of all IO polls and put that data into a table. if it turns out that 90% of games do poll X io register, then we know it is worth optimising for.

ITotalJustice commented 2 years ago

potential bugs

if the value the game is polling changes in the middle of the poll loop after the load, then skipping the loop has undesirable effects.

here's an example for if the game is polling vcount.

loop:
ld r2, [VCOUNT] ;@ vcount == 158
cmp r2, #159 ;@ vcount == 159
bne loop ;@ i now skip this loop until vcount changes

this would result in skipping an entire frame because vcount will next change to 160, which this loop will still continue until its finally back on line 159. the intended result is actually to have the loop run twice like this:

ld r2, [VCOUNT] ;@ vcount == 158
cmp r2, #159 ;@ vcount == 159
bne loop ;@ no equal so jumps back

ld r2, [VCOUNT] ;@ vcount == 159
cmp r2, #159 ;@ vcount == 159
bne loop ;@ is equal so loop exits

workaround

there's 2 potential workarounds:

  1. keep track of the pc when one of the values changes. then when the loop happens, check if the saved pc is within the loop, if it is, reset that value and don't skip that loop.
  2. set a flag when any of the events fire. so then on loop evaluation, check if the loop is set, if true, then dont skip the loop and unset the flag. this will almost always result in 1 extra iteration of the loop, but is far more simple than the first workaround.

thanks to @calc84maniac for pointing this out to me!

ITotalJustice commented 2 years ago

potential bugs 2

when evaluating the loop, it can be quite expensive to check every single instruction and ensure that it only is reading a location in memory and nothing else. you'd have to ensure that any add, sub, mov is just part of checking if the read back value is what's desired.

an example can be found in yugioh the scared cards where it starts reading from an address in memory, then increments the register that is used to index memory, then it does a compare before branching. i assume this code is trying to find the first value that equals the wanted value.

workaround

evaluate the loop in a 2-pass. on the first pass, do everything like normal, but save a copy of the cpu registers. on the second pass, simply do a memcmp on the current and stored registers. if they're equal, then it's reasonable to assume that the loop is read only.

ITotalJustice commented 2 years ago

fixed in https://github.com/ITotalJustice/notorious_beeg/commit/383462853534b412450654b2a96dbc1cea6fe9d1