Address core memory swizzling (somehow)

CasualPokePlayer commented 10 months ago

Sega Saturn has a particular issue where memory appears to be byteswapped, and pointers are 4-bytes large, resulting in difficulties for creating achievements.

This is rooted in a core issue of libretro memory handling, which cheevo sets ends up expecting. libretro will just obtain pointers to core memory. Memory will proceed to be read, byte by byte, to construct memory values.

In theory, this all should work ok. In practice, some cores utilize a trick known as swizzling. This is so an emulated system with an endianess opposite of the host system can directly do common read/writes (assuming the width for common read/writes is greater than 1 byte) under the host system's endianess. When it comes time to do a 1 byte read, the address will be xored with a constant before being used to index a memory buffer (for 2 byte swizzling, this constant is 1, for 4 byte swizzling, this constant is 3).

This knowledge is even worse when you consider this is only going to possibly occur when the host system's endianess is opposite of the emulate system's endianess. If you change the host system so something which matches the emulated system's endianess, you will now read memory "correctly" as it's no longer being swizzled. Of course, this ends up resulting in a mismatch compared to the swizzled memory, and sets would only be made one way or the other.

In practice of course, this trick is limited to emulated big endian systems, as most host systems will be little endian. Not all emulated big endian systems will utilize this trick anyways, some like the Jaguar and GameCube/Wii have big endian memory, but emulators for those systems do not swizzle memory.

The selection of consoles which this trick actually occurs is also very limited. Sega Mega Drive / CD / 32X all utilize this (and of course those are more or less all under the same emulators), Sega Saturn also utilizes this. N64 utilizes this, but instead of simple byteswapping, it does the swizzling by 4 bytes (so ^ 3 instead of ^ 1 for 1 byte reads). (There could be some other systems I'm missing here)

Also in practice, the issues of a big endian host system typically won't actually occur. This will mean the user is running RetroArch on a Wii, Wii U, or old PowerPC Mac, and achievement hunting, and probably doing so with the Genesis Plus GX core. A combination of factors so unlikely it apparently has never actually resulted in some ticket being filed, as I'm told.

As far as I can tell, the libretro API does not offer any way to know if memory is being swizzled or not. It seems to have some kind of flags for retro_memory_map's, so perhaps it'd need a flag added?

Even if this is "fixed" you may still end up needing to fix a ton of existing sets, which all rely on the existing memory behavior, and users will be forced to update their cores / clients, which would create a ton of updating pain.

BizHawk actually normally fixes swizzling for its memory APIs, but for cheevo purposes it has to swizzle it back, due to the mentioned existing sets relying on memory swizzling.

redwizard42 commented 10 months ago

This swizzling has definitely annoyed me in other Sega consoles too. Definitely saw it in Mega Drive when debugging with external tools and say that each pair of bytes was swapped in libretro/RA map space compared to the system's addressing. Shadow of the Beast's Score address had to be math'd together, etc.

redwizard42 commented 10 months ago

I do wonder now if there is a way to specify swizzled vs unswizzled for a set, so if an update path moves to unswizzled as a standard, all the existing sets don't immediately break, and can get updated in time. Code notes on site would need to be adapted as well. Headache.

Jamiras commented 10 months ago

It seems like the easiest solution would be to add a 32-bit Middle-Endian size that would "de-swizzle" the value. Am I missing something?

The host's endianness should affect the emulated memory's layout. Emulated memory should have the same byte order as the original system. The fact that we read multiple bytes at the same time for use in calculations on the host system means we need to be responsible for making whatever translation is necessary.

The only way I can see the host architecture affecting the emulated memory is in systems that recompile to the native architecture. In that case, I could see the recompiled code writing stuff to RAM in a different order than it would appear in the emulated system, but I could also see the systems being emulated doing something like occasionally validating byte order to ensure they're not being emulated.

redwizard42 commented 10 months ago

It seems like the easiest solution would be to add a 32-bit Middle-Endian size that would "de-swizzle" the value. Am I missing something?

The host's endianness should affect the emulated memory's layout. Emulated memory should have the same byte order as the original system. The fact that we read multiple bytes at the same time for use in calculations on the host system means we need to be responsible for making whatever translation is necessary.

The only way I can see the host architecture affecting the emulated memory is in systems that recompile to the native architecture. In that case, I could see the recompiled code writing stuff to RAM in a different order than it would appear in the emulated system, but I could also see the systems being emulated doing something like occasionally validating byte order to ensure they're not being emulated.

A Middle-Endian that handles the 2143 caused by the swizzling would solve the pointer issue for Saturn at least. (Ideally we should be able to filter on it aligned).

Weird things happen when 16-bit and 32-bit data isn't aligned. My 16-bit health in Princess Crown looks like this because it started at an odd byte address:

High Low

Jamiras commented 10 months ago

The 16-bit and 32-bit views in the toolkit are for visual purposes only. When the system reads 16-bits from an odd address, it reads the odd byte, then the byte immediately following it and combines them. The gap is only present because the view is swapping things around.

I really those views had never been added. People rely on them like a crutch and then complain when things like this happen.

redwizard42 commented 10 months ago

The 16-bit and 32-bit views in the toolkit are for visual purposes only. When the system reads 16-bits from an odd address, it reads the odd byte, then the byte immediately following it and combines them. The gap is only present because the view is swapping things around.

I really those views had never been added. People rely on them like a crutch and then complain when things like this happen.

Oh, I'm not talking about the views (I always steer people away from them as they tend to mislead junior devs). That's the 8-bit view I am giving an example of. If 16-bit data is unaligned in the hardware address space,, the resulting two bytes of the data are separated with two other bytes in between. Another fun example is Shadow of the Beast (Mega drive)'s score is split up weird when probably it could have been a 24-bit value if not swizzled. ;)

CasualPokePlayer commented 10 months ago

The host's endianness should affect the emulated memory's layout. Emulated memory should have the same byte order as the original system.

These statements are fundamentally incompatible with each other and the latter is not what happens in practice.

If host endianness is affecting the emulated memory's layout, this itself means that if you were you read memory, byte by byte (where endianess otherwise should not apply), you would get different results depending on the host system (which would break existing cheevos, although as detailed the cases where this happens is so limited it practically never happens).

(For arguments sake, we are only talking about memory layout in terms of the byte by byte order. Whatever the host system has as a u16/u32 doesn't matter here, especially as with current code, such multiple byte reads into u16/u32 will be constructed with byte by byte reads with shifts/ors)

If emulated memory has the same byte order as the original system, this will always result in a consistent byte order, as the original system has only one endianess and thus only one byte order, and thus too the host endianess should have no effect over this.

(For arguments sake, I'm ignoring emulated systems which do actually have mixed-endian CPUs. This is NOT what is being discussed here, and I'm fairly sure none of rcheevo emulators actually emulate CPUs with this kind of ability, so for this issue we can assume they don't exist and when they do that's a separate issue)

However, of course, this is not what happens in practice. The host endianess is deciding the byte order, as a speed trick emulators sometimes used, commonly for emulated big endian systems on host little endian systems. Internally, emulators would be applying XOR's to correct this swizzling when reading byte by byte. For libretro however, these XORs are not present, resulting in this emulator speed trick changing the expected memory layout for tools like rcheevos.

CasualPokePlayer commented 10 months ago

The only way I can see the host architecture affecting the emulated memory is in systems that recompile to the native architecture.

It's actually much more simple than that to end up having host byte order being used for the emulated memory byte order. https://github.com/ekeeke/Genesis-Plus-GX/blob/d4ca576c073cfa0ac347bc25b3986201f1667608/core/mem68k.c#L46-L62 https://github.com/ekeeke/Genesis-Plus-GX/blob/d4ca576c073cfa0ac347bc25b3986201f1667608/core/macros.h#L6 https://github.com/ekeeke/Genesis-Plus-GX/blob/d4ca576c073cfa0ac347bc25b3986201f1667608/core/macros.h#L27

(well, not the best examples in the code, but more shows how it just does a uint16_t* cast to do a direct uint16_t write, and for byte by byte reads, XOR'ing by 1 if little endian, and reading directly if big endian)

Jamiras commented 10 months ago

With a rare exception, every piece of addressable memory is 8 bits in size. When you read 16 bits from an address, you also read the 8 bits from the next address of memory. How those two addresses of memory are combined in the register is system dependent, but the memory itself will always have the same layout.

On an x86:

mov eax, [ebx]
inc eax
mov [ebx], eax

will perform a little endian read starting at [ebx], add one to it, and write it back in little endian order.

mov al, [ebx+1]

I'm not sure that's 100% the correct syntax, but the intent is to read the second byte from the data that was already stored in little endian order. If the emulator decides that it should store it in big endian order because the host is big endian, then the wrong value will be read from memory and the emulator won't behave as expected.

The RetroAchievements interface is more-or-less designed to directly read values from the RAM chip. It doesn't care about the system's endianness. We're not pretending to be the CPU reading from the RAM chip, or through the bus. Therefore we don't [directly] care about the system's endianness. We can, however, leverage that knowledge to specifically say we want to read four bytes from the RAM chip as a big-endian 32-bit number, regardless of whether or not the system is bigendian.

CasualPokePlayer commented 10 months ago

A better case perhaps could be made with how the games would be writing to memory.

Let's say Game A writes 0xAA 0xBB 0xCC 0xDD to addresses 0 1 2 3 in emulated memory respectively, byte by byte.

We could agree then that rcheevos should expect to see 0xAA 0xBB 0xCC 0xDD in addresses 0 1 2 3 in emulated memory.

However, with memory swizzling in place, this no longer applies.

If we assume that the swizzling is swapping by 2 bytes, we would actually end up seeing 0xBB 0xAA 0xDD 0xCC, as rcheevos is just reading the memory byte by byte, yet the emulator is mutating the emulated addresses Game A is writing to with a XOR 1.

The proper fix is to also apply a XOR 1 when reading out the memory buffer (well, arguably, this should be done by the client, as to eliminate emulator specific details).

Jamiras commented 10 months ago

The examples you provide show the translation from the emulated memory to native values (presumably to simulate the machine's registers for emulating the CPU instructions).

We handle this on our side through the peek memory APIs. When reading a singular piece of memory (up to four bytes), we specify that it should be returned as a uint32 read in little-endian order: https://github.com/RetroAchievements/rcheevos/blob/4e5a863269771f3f0ff9206d0d360a61ef861c7f/include/rc_runtime.h#L31-L36

When reading chunks of memory, we do the translation to a little-endian uint32 ourself: https://github.com/RetroAchievements/rcheevos/blob/4e5a863269771f3f0ff9206d0d360a61ef861c7f/src/rc_client.c#L4402-L4481

Jamiras commented 10 months ago

Let's say Game A writes 0xAA 0xBB 0xCC 0xDD to addresses 0 1 2 3 in emulated memory respectively, byte by byte.

If we assume that the swizzling is swapping by 2 bytes, we would actually end up seeing 0xBB 0xAA 0xDD 0xCC, as rcheevos is just reading the memory byte by byte, yet the emulator is mutating the emulated addresses Game A is writing to with a XOR 1.

We don't care what the emulator does with its internal structure. All we care is that the memory contains 0xAA 0xBB 0xCC 0xDD.

As long as every emulator exposes the memory with 0xBB in address 1, the achievement code will function properly. The achievement writer can read that as a 32-bit little endian value: 0xDDCCBBAA, a 32-bit big endian value: 0xAABBCCDD, a float: -1.84407149017e+18, or a number of other formats, including sizes using less than 32-bits.

CasualPokePlayer commented 10 months ago

As long as every emulator exposes the memory with 0xBB in address 1, the achievement code will function properly.

And this does not occur, that's the entire point of this issue.

And in fact, it's rather the opposite, if the emulator exposes the memory with 0xBB in address 1 while core is swizzling memory internally, achievement code will break, as existing achievements rely on this broken assumption for Genesis (and its addons), Saturn, N64, and maybe others I forgot about (and of course, this assumption itself will break on any host big endian system).

Maybe then arguably this is the wrong place for this issue, but the fact is if clients correctly expose memory, existing sets will break, forcing them to retain broken code unless some plan is put out so this breakage does not occur.

Jamiras commented 10 months ago

I guess I don't see how the emulator can rearrange the memory structure without breaking compatibility. I know I've done things like:

if (*(uint32_t*)(string_ptr) == 0x54534554) { // TEST in ASCII

Assuming little endian memory reads, and that would completely break in an emulator that swizzled the memory.

CasualPokePlayer commented 10 months ago

In terms of what the game sees, it cannot know of this kind of swizzling. It only sees whatever is the emulated system's endian, due to the mutation of the address when reading out byte by byte of whatever memory buffer is holding emulated memory.

Libretro however just exposes pointers to these memory buffers and all rc_libretro does here is do a memcpy / just read memory out byte by byte, which is then constructed to "little endian" read for rcheevos (and arguably it is correct to construct little endian reads with byte by byte reads).

However, the emulator specific detail of memory swizzling means byte by byte reads will not actually read the same values as if you were to do a byte by byte read as the game would do, as rc_libretro isn't applying the same mutation of the address as the emulator would do for the game (and currently it can't know if this would be needed to the lack of flags for this on libretro's part).

Additionally, this memory swizzling is host endian dependent, so these byte by byte reads will different between machines of different endianness, something which rcheevos expects to not happen (and kind of means big endian systems do not have working cheevos for systems with memory swizzling, of course this is so limited by the lack of big endian systems people would actually use and really would be limited to Genesis Plus GX if anything).

And of course, since this is emulator dependant, this means that any kind of potential future emulator could decide to forgo memory swizzling, and if this is some libretro core, it would be locked out of being able to be used for cheevos, due to this flaw in libretro memory exposure. If it isn't some libretro core, it could still support cheevos, but would be forced to apply a hacky read callback for rcheevos.

Arguably, again, the correct way to fix this is for some flag to be added for libretro memory mappings and that is used to account for memory swizzling. But that will break existing cheevos...

CasualPokePlayer commented 10 months ago

Implementing some way to handle the weird byte swapping for pointers on Saturn specifically would "fix" that issue for deving, but it's also ultimately applying a bandaid for this kind of issue and only subsquently making breakage worse if a "proper" fix is done.

redwizard42 commented 10 months ago

Implementing some way to handle the weird byte swapping for pointers on Saturn specifically would "fix" that issue for deving, but it's also ultimately applying a bandaid for this kind of issue and only subsquently making breakage worse if a "proper" fix is done.

Maybe it's a bandaid, but it would help out a lot right now and I doubt it'd make things harder to implement a better way in the future.

leiradel commented 10 months ago

I thought all memory accesses were done via rc_runtime_peek_t? If that's the case, the implementation can be core-dependent and and do the unswizzling if necessary.

invertego commented 6 months ago

It seems like the easiest solution would be to add a 32-bit Middle-Endian size that would "de-swizzle" the value. Am I missing something?

This would not address the fact that even single byte accesses are affected by this issue. Here is a real-world example.

MOV     #$40,R0
MOV.L   @($0174,PC),R14 ; R14 = $060FE77C
MOV.B   R0,@R14

This snippet of code from the US release of the Saturn title Dark Savior performs a single byte write to upper work RAM.

To read this address for an achievement you would need to use the address 0x1FE77D. That is to say, within the upper work RAM of the Saturn, a single byte write to 0xFE77C is observed as a write to 0xFE77D in RetroAchievements.

This is a consequence of the Mednafen and Yabause libretro cores, as an internal implementation detail, storing memory pre- byte swapped by 16 bits to optimize accesses of that length. 8- and 32- bit accesses are specially handled by these emulators so that this detail is not observable to the emulated title; accesses of all lengths behave as they would on original hardware.

Unfortunately, this implementation detail is now leaking into achievements written for this system and others that are similarly affected. The problem compounds over time as more achievements are created. If this were to be fixed, all existing achievements would need to be updated which, although trivial for 8- or (aligned) 16- bit accesses, would require careful auditing for 32-bit accesses.

RetroAchievements / rcheevos

Address core memory swizzling (somehow) #302