Pyrdacor / Ambermoon

Resources for the incredible Amiga game Ambermoon
91 stars 8 forks source link

[Documentation] Fixing the "out of memory handles" crash #22

Closed Pyrdacor closed 2 years ago

Pyrdacor commented 2 years ago

The quick fix

Here is the final quick fix with a hex editor. This example uses english 1.07 AM2_BLIT.

Search for hex 10 28 00 10. You will find two occurences. Start with the first one. It should be at 00006E7E.

Cut out the 4 bytes and insert them before the preceeding 12 bytes which are 4E B9 00 02 7F 22 4E B9 00 02 13 8A. Or cut out the 12 preceeding bytes and insert them after the mentioned 4 bytes. Do as you wish.

Before:
    4E B9 00 02 7F 22 4E B9 00 02 13 8A 10 28 00 10
After:
    10 28 00 10 4E B9 00 02 7F 22 4E B9 00 02 13 8A

Now go to the second occurence of 10 28 00 10. Should be at 0000BDF8. Cut the 4 bytes but this time insert before the preceeding 6 bytes (4E B9 00 02 7F 22).

Before:
    4E B9 00 02 7F 22 10 28 00 10
After:
    10 28 00 10 4E B9 00 02 7F 22

Now search for the first occurence of 4E F9 in the file. It should be at 00000038 in AM2_BLIT. A search for 4E F9 00 00 00 2A should also work and also return a single hit. This marks the beginning of the first code hunk btw. ;)

Now take a calculator. Use the initial found offsets:

Subtract the offset you found just a second ago:

Now subtract 0A and 04 from the first and 04 from the second value:

Now search for those 3 values. There should be only 1 occurence if you include the zero bytes into the search! The first two values should be found right next to each other: 00 00 6E 3C 00 00 6E 42.

Replace each long (4-bytes value) with the value + 4:

Before:
    00 00 6E 3C
After:
    00 00 6E 40

Before:
    00 00 6E 42
After:
    00 00 6E 46

Before:
    00 00 BD BC
After:
    00 00 BD C0

This is it.


Old stuff following

The fix

Open AM2_CPU and AM2_BLIT into a hex editor.

Search for the hex sequence "10 28 00 10 4E B9 00". If you find two occurences, go to the last one. Cut out the 6 bytes just in front of the found sequence and paste it back in after the "10 28 00 10".

Then search for the hex sequence "00 00 BD BC 00 00 BD C6". If you find more than one, pick the first one. Replace the "BC" with a "C0" and you are done.

This will work for english and german.

Explanation

The first change fixes the order of the first two instructions:

image

The function FUN_FreeMemoryHandleById releases a memory handle by its id. The id must be in D0 which is performed by the move instruction. Unfortunately there is a function call just above which clears (0x10,A0) internally. This way the move instruction will then only move the value 0 to D0 and thus FUN_FreeMemoryHandleById tries to free the memory handle with id 0.

Valid memory handle ids are 200 to 224 (there are 25 slots). With the above bug, the slot is not freed and can no longer be used. If this is called often enough, all 25 slots will be occupied and the error is raised. Moreover FUN_FreeMemoryHandleById will clear some long-word at the address 800 bytes before the memory handle slots as the function just subtracts 200 from the given id (for 0 this will be -200), multiplies this by 4 and adds it to the start of the memory handle section. Then it clears the long-word there. So this will be -800 bytes relative to the memory handle section.

The above bugged code is called by the "Call Eagle" spell function. This is why calling the eagle will mess up the memory handles.

\ The second change adjusts the relocation table as we moved a function call which uses a global 32 bit address. As we moved the move instruction before the function call and the instruction has 4 bytes, the offset inside the RELOC32 hunk must be increased by 4. This is the BC to C0 part.

\ Btw the function we moved (FUN_00246f22) looks exactly like this, so you see that it clears our memorized memory handle id.

image

EDIT 04-07-2022:

There is a second bug. The above bug avoided freeing the eagle sprite memory handle when calling the eagle. But when unmounting the eagle, there is similar code with the exact same issue. To fix it, again search for 10 28 00 10. This time take the other spot. Move those bytes before the function call (4e b9 00 02 7f 22) again. This will fix the whole issue once and for all!

Hexaae commented 2 years ago

When calling the eagle/unmounting I still have some random crashes even with 1.13 patch applied to the WHDLoad installation of the game (already double-checked the files) 😔 Not easy to reproduce because I have to play for at least 10min... but still happens unfortunately. WHD crash dump with readable output: https://1drv.ms/u/s!ApMUGr0cuN39gotVpE_rCGXGIKQXsA?e=IMmIjj

P.S. I use some very old save games (2001, Ambermoon 1.08)... can it make this patch useless in this case?

Pyrdacor commented 2 years ago

Can you please try to just mount/unmount the eagle for about 15 times in a row?

I am not that familiar with WHDLoad slaves but don't they change the code inside AM2_CPU and AM2_BLIT? Did you replace those files?

Hexaae commented 2 years ago

Yes, I replaced the exe files too. No problem after 15 times, but going on it will still happen. For my test I usually try to fly with the eagle 🦅 over the border of the map when 0, 0 turns to 600, 600 (or so, can't remember max X, Y) and fly for 2-3 days, landing & sleeping, and flying again... and well, after I try to do so for a while and after 20-25 mount/unmount the moment I call the giant eagle again it will pop-up the usual "A majestic eagle blah blah..." then black screen to Workbench with mem error I guess). Will now test just mount/unmount in the same place for 30 times, let's see if I can reproduce it just increasing this numbers...

Hexaae commented 2 years ago

Confirmed it happened after 25-26 times without moving (so it's not a map border issue or a consequence of sleep day/night at least... we can exclude them)! 26 seems the exact number. P.S. In the explanation above I've read you were talking about 25 slots indeed for mem handles... In the meanwhile I'll try also without WHDLoad but I doubt the installer changes something in the EXE files in memory.

Hexaae commented 2 years ago

Ok, happened also running the patched game without WHDLoad:

An error has occurred :

There are not enough memory handles available.

This is an internal error. Our most sincere apologies.

You can try to play on simply by rebooting your computer and restarting the game. The error may not appear every time.

Please press RETURN.

a1exh commented 2 years ago

I am not that familiar with WHDLoad slaves but don't they change the code inside AM2_CPU and AM2_BLIT?

Yes or sometime replace the exes altogether.

It is unlikely a WHDload slave will work correctly with a modified compressed executable like Ambermoon (but not impossible).

Hexaae commented 2 years ago

As far as I've tested, only "Ambermoon" file has to be Imploder compressed like the original, then you can use modified files for the exes (but AM2_BLIT and AM2_CPU must be decompressed to work with WHDLoad. I use xfddecrunch), data, savegames etc. (indeed I imported my WB game into a WHDLoad installation).

In short, to convert it to WHDLoad you can overwrite the data/amberfiles/ subdir of the installed game with the patch files (PyrdacorFixEnglish1.13.lha), and then just have to decompress Imploder files AM2_BLIT and AM2_CPU. WHDLoad installer will still work fine.

Are the exe files in PyrdacorFixEnglish1.13.lha already patched, right?

Hexaae commented 2 years ago

@a1exh Can you please try to mount/unmount 26times in a row from the eagle to confirm?

a1exh commented 2 years ago

Are the exe files in PyrdacorFixEnglish1.13.lha already patched, right?

They are supposed to be but there is always a possibility that something has gone wrong.

This is the commit

https://github.com/Pyrdacor/Ambermoon/commit/980464bf13a68b7bacf42baa1c654ff45973da31

I'm a little nervous that English AM2_CPU says +0 bytes but AM2_CPU for German version says -4 Bytes.

So does AM2_BLT for both English and German say -4 Bytes.

This may be nothing. If I had time I would binary diff them and the changes.

Pyrdacor commented 2 years ago

It is also possible that there is a second crash issue. I will check that soon. Can you try with 1.14 as well? It has a complete new exe. But please use without WHDLoad and the full version as a clean state. I didn't have the time to release only the patch nor LHA files so only zip and tar.gz at the moment.

Hexaae commented 2 years ago

Still the same auto-quit issue after 26 Eagle mounts, ran from WB no WHDLoad:

An error has occurred :

There are not enough memory handles available.

This is an internal error. Our most sincere apologies.

You can try to play on simply by rebooting your computer and restarting the game. The error may not appear every time.

Please press RETURN.

Could be something in the savegame or it can't be? Here is it anyway (use the MEGAHERO savegame): https://1drv.ms/u/s!ApMUGr0cuN39gowEjNyqUlDiqliexg?e=tXiZMq

P.S. Yep, doesn't work anymore with WHDLoad. I force-downloaded the 1.14 archive of course

a1exh commented 2 years ago

If this is not 100% fixed can we add a bug LABEL?

Hexaae commented 2 years ago

BTW, I have to report to WHDLoad team this new single exe change for a new slave... EDIT: DONE.

a1exh commented 2 years ago

@Pyrdacor maybe we (and by that I mean you) should work towards removing any need for a WHDload slave? The source is included in the WHDload slave. I think they only patch 2 places.

http://whdload.de/games/Ambermoon.lha

Ambermoon Install\src\ambermoon.asm

I'm not sure there is really a need for Ambermoon WHDload but I haven't used real 68060 for a long time.

Hexaae commented 2 years ago

Yes... I prefer the WHDLoad version as I've had some instability in the past after running Ambermoon from the OS. It also reports some strange memory errors at launch on my 060 WinUAE system when I use 8MB CHIP: as I remember well it used some kind of fake fastmem tool, and gets confused if finds enough mem to contain the whole game in CHIPMEM (yep, will load & run the game from CHIP only!). Or maybe Pyracord could just implement the same WHD fixes directly in the new exe.

Pyrdacor commented 2 years ago

I can try to do this tomorrow.

Hexaae commented 2 years ago

News?

Pyrdacor commented 2 years ago

Sorry I will have a look at it asap.

Pyrdacor commented 2 years ago

I'm in contact with Wepl from WHDLoad now. He said he can create a slave for the recent release but I don't know when he will do so.

Hexaae commented 2 years ago

Good. Can he also help us fixing the "26 giant eagle mounts" bug? Whd team is very skilled in fixing original games

Pyrdacor commented 2 years ago

I finally had time to have a look. And unfortunately my fix was no longer present in 1.14. I guess when I moved over to Nico's Ghidra project and re-installed the fixes there, I forgot to add the eagle fix. :( So I fixed it now again in Ghidra and will release a new version in the next days.

Pyrdacor commented 2 years ago

@Hexaae Can you please test with this AM2_CPU?

AM2_CPU.zip

Pyrdacor commented 2 years ago

I released 1.15 with the fix. Hopefully it works. Otherwise feel free to reopen this issue.

Hexaae commented 2 years ago

Still unfixed. I used PyrdacorFixEnglish1.15.lha.

An error has occurred :

There are not enough memory handles available.

This is an internal error. Our most sincere apologies.

You can try to play on simply by rebooting your computer and restarting the game. The error may not appear every time.

Please press RETURN.

Pyrdacor commented 2 years ago

Can you test to use the AM2_CPU from the 7z I provided earlier?

Hexaae commented 2 years ago

You mean the ZIP? It's the same binary in the PyrdacorFixEnglish1.15.lha, where I can reproduce the bug. If you're sure it's fixed maybe is something in my savegames? Please test my savegame above at https://github.com/Pyrdacor/Ambermoon/issues/22#issuecomment-1126294694

Pyrdacor commented 2 years ago

It can't be related to the savegame. But can you ensure you have no AM2_Blit in your installation and that you copy the file "Ambermoon" over in the root directory?

Hexaae commented 2 years ago

Yes. You can test it yourself (boring but will take less than 5 min ;) ). Tried also with "ambermoon_english_1.15_extracted.lha" with everything ready out of the box, where I just copied my savegames.

Pyrdacor commented 2 years ago

Sure I will test this but I am not at my PC until Monday so I hoped you can have a look. I will work on a solution next week. Sorry that it doesn't work.

Pyrdacor commented 2 years ago

Ok I could reproduce the issue. Message to myself: never do stuff in a hurry. The problem seems two-folded.

The first bug was the one I mentioned here. The problem is that acquired memory handles are not freed for the eagle sprite data. This was fixed pre 1.15 already. But as you said it wasn't working, I thought that my fix wasn't present (the order of the two code lines was still wrong). So I swapped the lines without really thinking. So in 1.15 I re-implemented the bug which was already fixed... This is why it now crashes after calling the eagle 13 times. I fixed it again now. This was a lot confusion.

So there has to be a second bug. This seems to occur when the eagle is called 25 times. I think that some kind of index is not reset as well. I have to look at this now.

Pyrdacor commented 2 years ago

Now I get it. What I never noticed was that when you unmount the eagle, there is also an animation where the eagle flies off. This is not handled by the "Call Eagle" spell function of course. I am sure the same bug described here was done there too. This is why it crashes at the 13th eagle call without the fix because mounting and unmounting 12 times acquire all 24 memory handles together and the 13th call has no handle left. With my fix the handles are only not freed for the unmounting part. This is why the crash happens only at the 25th eagle call. I will find the spot now and fix it as well.

Pyrdacor commented 2 years ago

As I thought the same issue is inside the "Release Eagle" function! Fixed it now.

Pyrdacor commented 2 years ago

AM2_CPU.zip

Ok I tested this in WinUAE and it seems to work. No crash after 28 eagle mounts and unmounts. @Hexaae Can you please verify with the attached AM2_CPU?

Hexaae commented 2 years ago

Works fine, confirmed! :)

Hexaae commented 2 years ago

Can I still use old AM2_CPU exe 1.13 compatible with WHDLoad and manually apply the fix above with an hex-editor, while waiting for a WHDLoad update from Wepl? Or the new exe fixed also other things?

a1exh commented 2 years ago

I'd still like to understand what WHDLoad could add to make it more stable?

I looked at the Ambermoon WHDload code patches, most are required to make Ambermoon work with WHDLoad not actually bug fixes in the code.

Pyrdacor commented 2 years ago

@Hexaae The problem is that 1.14 is a completely different thing. It loads item data and in-game texts from external file instead of from data hunks inside the exe. But if you use the old loader (file Ambermoon) and the old AM2_CPU it should work. Note that for some CPUs or configurations the AM2_Blit is used instead (if the blitter is faster than the CPU). So you have to fix it in the right exe or in both.

Pyrdacor commented 2 years ago

I'd still like to understand what WHDLoad could add to make it more stable?

I looked at the Ambermoon WHDload code patches, most are required to make Ambermoon work with WHDLoad not actually bug fixes in the code.

Currently all my time is used to find and fix the end crash. After that I might look into the WHDLoad thing.

Hexaae commented 2 years ago
V1.1 (01-03-2018)
- fixed access fault due vbr access
- fixed access fault during inventory access (only DE 1.01)
- changed to new kickemu style
- due Slave limitations the install uses always blitter rendering (maybe
  changed in later release)
- updated install script
- uses now "data" directory for all game files (create and move manually to
  avoid reinstallation)
- added more icons and some docs
V2.0 (18-03-2018)
- reworked for fast memory support
- cpu caches are now enabled
- for the german version the keymap will be loaded now, to get the needed
  files either reinstall or copy RUN, SETMAP and DEVS:KEYMAPS/D from the first
  disk to the data installation directory
- all the files to update the german version to 1.05 are now included in the
  installation package, to update manually copy all files from data-de to the
  installation data directory
- some insufficient interrupt acknowledgements fixed
- mouse button handling in the extro fixed
- IOCACHE adjusted for minimum switches during save  

Apart some WHDLoad related fixes, some access faults and more...

Pyrdacor commented 2 years ago

The question is if those access faults were caused by the slave/code changes or were present beforehand. I never encountered a inventory crash for example.

Hexaae commented 2 years ago

@Hexaae The problem is that 1.14 is a completely different thing. It loads item data and in-game texts from external file instead of from data hunks inside the exe. But if you use the old loader (file Ambermoon) and the old AM2_CPU it should work. Note that for some CPUs or configurations the AM2_Blit is used instead (if the blitter is faster than the CPU). So you have to fix it in the right exe or in both.

Will try... It's not that clear the last fix.... In general, could you please simplify the how-to fix descriptions and just tell what byte sequence XXXXXXXXXXXXXXXXXXXXXXX at offest XX should become (classic before/after)? 😇

a1exh commented 2 years ago

Apart some WHDLoad related fixes, some access faults and more...

I read the description. And I read the source code for the patch. I read the documentation about the macros/functions used in the patch.

The "fixed access fault due vbr access" these are not bugs, they are an incompatibility between WHDload and Ambermoon.

some insufficient interrupt acknowledgements fixed

This appears to be the only "real" bug fix but I don't understand what the cause is.

a1exh commented 2 years ago

I never encountered a inventory crash for example.

This was only in DE v1.01 which we've never really used.

Pyrdacor commented 2 years ago

Apart some WHDLoad related fixes, some access faults and more...

I read the description. And I read the source code for the patch. I read the documentation about the macros/functions used in the patch.

The "fixed access fault due vbr access" these are not bugs, they are an incompatibility between WHDload and Ambermoon.

some insufficient interrupt acknowledgements fixed

This appears to be the only "real" bug fix but I don't understand what the cause is.

That's what I thought too.

Hexaae commented 2 years ago

As mentioned before the original behaves strangely with CHIPmem allocation causing random not enough mem issues at launch or quitting the game on 060 and advanced systems... (Imploder, used for some files, also has no reliable compatibility with 060 AFAICR, but in this case an xfddecrunch can decompress them and should solve the problems...). The WHDLoad edition workarounds also all these annoyances...

Pyrdacor commented 2 years ago

The memory allocation in Ambermoon is a bit strange and can easily cause an Out of Memory error. I just recently decoded it.

Pyrdacor commented 2 years ago

It works like this:

For each memory type it does 10 iterations. Each iteration asks the OS for the biggest available chunk of memory. If it is at least 5120 bytes, it is allocated. So this is very greedy. It will take all the memory if it can. But there is a value which states how much memory should remain free after all allocations. So after the 10 iterations Ambermoon again asks the OS how much free space there is. If there is not enough left (10240 bytes is the value in Ambermoon for both mem types), the allocated blocks are freed until enough memory is available for the OS. This can also be a partial free (which is a bad thing but works).

First this is done for chip mem. Then for fast mem. 280000 bytes is minimum for chip, and 75000 for fast. If not enough chip the error is raised. If not enough fast, chip is checked for the total amount (355000). If not enough the error is also raised.

Note that all memory blocks below 5120 are not considered so even if you have enough free memory but in a bad distribution, you will get the error.

Hexaae commented 2 years ago

The problem is that, with the WB version, sometimes on quitting back to WB a nice pop-up tells me the "Ram Disk is full" 🤔 I suppose it's too fast to quit and return to WB while RAM is not fully released.

image

Hexaae commented 2 years ago

It's unusable on my system the non-WHDLoad version: it's a jackpot running it from WB. It works most of the times, but sometimes freezes with black screen, sometimes I have strange Ram Disk full warning after I quit to WB, some others it refuses to load like the picture below...

image

I tried to manually apply the fixes to AM2_CPU and BLT as described (to run it with WHDLoad), but 1.13 binaries (decompressed with XFDDecrunch) look different in contents.... For example there is no "4e b9 00 24 6f 22" for the second patch in the 1.12/1.13 Eng exes: image

Pyrdacor commented 2 years ago

I just released 1.16. First try this please. There was a bug which corrupted the seglist so that the game crashed at quit. Maybe it also caused your RAM disk issues? Give it a try.