BartmanAbyss / vscode-amiga-debug

One-stop Visual Studio Code Extension to compile, debug and profile Amiga C/C++ programs compiled by the bundled gcc 12.2 with the bundled WinUAE/FS-UAE.
GNU General Public License v3.0
303 stars 38 forks source link

fs-uae hang on macos sonoma #243

Closed terriblefire closed 5 months ago

terriblefire commented 7 months ago

Clean MacOS install on Sonoma. Installed the extension and try to run a clean newly "inited" project. Debugger hangs forever before any screen activity is shown.

If i compile a clean fs-uae and ln -s it in place it runs but quits because it cannot connect to the debugger. are the source level changes available somewhere for fs-uae so i can see if its just a binary thats no longer compatible?

BartmanAbyss commented 7 months ago

yep, the modified FS-UAE github repo is linked from the README file

terriblefire commented 7 months ago

ok i'll give that a try and report back. thanks for the quick response Sent from my iPhoneOn 2 Dec 2023, at 10:37, BartmanAbyss @.***> wrote: yep, the modified FS-UAE github repo is linked from the README file

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>

terriblefire commented 7 months ago

Ok looks like deadlocks are happening randomly on the FS-UAE on macos. I'm sure people must be seeing this randomly and having to kill the fs-uae app and restart. My best guess is that there is a race condition between the debug features and some other part of the code. There are so many differences between base FS-UAE and that repo its tough to trace. I spent a couple of days trying to figure out where the hang occured but it seems random. I can move the hang around by adding delays. So my guess is a timer triggers causing it.

I raised an issue on @grahambates's repo but he's since disabled issue tracker for the repo. So i guess that means its not getting fixed. TBH i'd remove MacOS support if there is no intention to fix this.

grahambates commented 7 months ago

I haven't deliberately disable the issue tracker, I think that just happens by default on forks. I will look into this. Please don't remove support!

grahambates commented 7 months ago

For context, the reason for the large divergence from the base FS-UAE is because I backported changes to bring it in line with current WinUAE. This is necessary because the profiler relies on internal data structures, which have changed significantly since WinUAE 4.2, which is where the official FS-UAE build is at.

FWIW I use this extension frequently on MacOS Ventura without encountering this issue. I'll try to get to the bottom of it.

terriblefire commented 7 months ago

OK it just seemed odd the issues tab vanished after i raised an issue so i made an assumption.

I will help you figure the issue out and if Bartman is happy we can use this thread to track the issue.

FWIW i didnt see the issue on Monteray and i only see it intermittently on my iMac 2020. I see it every time on my macbook pro. I never used Ventura i jumped from Monteray to Sonoma.

I dont know if its relevant but i have a "normal" FS-UAE installed as an App.

grahambates commented 7 months ago

Looks like you opened the issue on the upstream repo https://github.com/FrodeSolheim/fs-uae/issues/339. I've enabled issue tracking on my fork now, so you can open it there or just continue the discussion here.

terriblefire commented 7 months ago

Looks like you opened the issue on the upstream repo FrodeSolheim/fs-uae#339. I've enabled issue tracking on my fork now, so you can open it there or just continue the discussion here.

Opps. Happy to continue here as if users see the issue they are more likely to look here.

grahambates commented 7 months ago

I'm going to try to get hold of a machine running Sonoma. Unfortunately my Mac is a work machine with OS updates locked down so I can't go past 13.6.1.

terriblefire commented 7 months ago

if you have issues getting Sonoma i can do a zoom call and demonstrate the issue at some point 

grahambates commented 7 months ago

Thanks, I might take you up on that. I'll let you know.

terriblefire commented 7 months ago

Digging a bit more...

psynch_cvwait(0x60000185AB50, 0x8D6601008D6700, 0x0) = -1 Err#316 psynch_cvwait(0x7FCCEEA2FB78, 0x1D4101001D4200, 0x1D4100) = 0 0

I get a bunch of these when it hangs. I suspect a deadlock of some kind.

terriblefire commented 7 months ago

@grahambates Ok i cannot yet tell you why this is happening but i know what is happening..

Basically after a lot of tracing i found that we're getting stuck in an infinite loop here..

https://github.com/grahambates/fs-uae/blob/remote_debugger_barto/src/vm.cpp#L180

Log except...

[UAE] [000] VM: trying 0x1d62ac000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62ad000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62ae000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62af000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62b0000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62b1000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62b2000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62b3000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62b4000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62b5000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62b6000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62b7000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62b8000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62b9000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62ba000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62bb000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62bc000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62bd000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62be000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62bf000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62c0000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62c1000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62c2000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62c3000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62c4000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62c5000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62c6000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62c7000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62c8000 step is 0x1000 = 0x25034e000 [UAE] [000] VM: trying 0x1d62c9000 step is 0x1000 = 0x25034e000

terriblefire commented 7 months ago

@grahambates i've figured it out. Your code assumes a 32bit address space. If macos allocates an address above this area your code discards the address.

else if (((uintptr_t) address) + size > (uintptr_t) 0xffffffff) { munmap(address, size); address = NULL; }

You need a better strategy here if you need an address below 4G.

grahambates commented 7 months ago

Awesome, thanks so much for getting to the bottom of this. I'll make a start on a fix.

terriblefire commented 7 months ago

Not gonna lie that was a nasty one to find.

terriblefire commented 7 months ago

DIgging a bit more I suspect -no-pie is probably no longer working in Sonoma. Do not know that for sure though its a guess.

grahambates commented 7 months ago

Yeah I think it must be something like that, because I hadn't actually touched that part of the code. It's present in FS-UAE since v2.7.

If the official version works fine then it could be my build setup. I did have to use -std=c++17 in order to include Bartman's debugger code, and there's a chance the official build uses GCC rather than Clang.

terriblefire commented 7 months ago

I doubt this its your direct build setup because if i switch between master and your branch i see the issue come and go.

I can see that you have made changes to the vm.cpp file. Is there any functional change there

vm.patch

However if i just patch this into your branch it has the same issue. So perhaps some other part of your build has changed.

grahambates commented 7 months ago

The changes will be upstream stuff from WinUAE. I basically just took a diff of WinUAE 4.2 to current and applied it to FS-UAE, manually fixing problems as I went. Tbh I wish I'd done this incrementally through minor versions, rather than as a single massive patch as it might be easier to track down where problems have been introduced. I did start redoing it that way and I think this would be worthwhile to complete.

grahambates commented 7 months ago

Just trying to get my head around the logic:

My understanding is that it only follows this code path if the UAE_VM_32BIT flag is set, to deliberately prevent the VM from allocating memory outside of a 32 bit address space.

It looks like this flag should be getting unset in mman.cpp when not using the Jit compiler (which is off by default), so I'm not too sure why it's set.

    if (!g_fs_uae_jit_compiler) {
        /* Not using the JIT compiler, so we do not need "32-bit memory". */
        vm_flags &= ~UAE_VM_32BIT;
    }

It keeps trying to allocate until it gets an address within range. One thing I do notice is the code is trying to prevent an infinite loop:

        uae_u8 *p_end = natmem_reserved - size;
// ...
        int step = uae_vm_page_size();
// ...
        while (address == NULL) {
            if (p > p_end) {
                break;
            }
// ...
            p += step;
        }

I wonder why this isn't breaking...

grahambates commented 7 months ago

It's not breaking because natmem_reserved is zero...

terriblefire commented 7 months ago

when mmap() allocates an address above 4Gb, the code unmaps it. the next mmap() just gets the same address or higher... it hangs forever.

grahambates commented 7 months ago

Yeah I can see that it will never succeed. The code seems to be designed to break after a specified number of steps though, but that behaviour is broken because natmem_reserved is zero and therefore p_end is negative. This is the case on master too, so this isn't what's changed.

terriblefire commented 7 months ago

looking at your branch only win32 code sets natmem_reserved. Maybe a hangover from WinUAE?

grahambates commented 7 months ago

Yeah I'm pretty sure the natmem_reserved thing is an existing bug in FS-UAE, and is just causing this to fail more spectacularly than it would otherwise when the allocation inevitably fails.

I'm more confused how the 32 bit allocation could ever work at all, than why it doesn't work on my fork!

From what I can see, when JIT is not enabled, the only place that allocates with the UAE_VM_32BIT flag is set is init_fpucw_x87(). I'm sure if/why it's needed, but the emulator still runs if I remove the flag. I could just be that I'm getting lucky with where it get allocated though! I'm wondering if this could be a quick fix to just disable it.

Do you think you could give this a try in src/od-win32/fpp_native_msvc_80bit.cpp?

    x87_fldcw_code = (uae_u8 *)uae_vm_alloc(uae_vm_page_size(), 0, UAE_VM_READ_WRITE_EXECUTE);

Be aware that some files in od-win32 are used on all platforms because... who knows why?!

terriblefire commented 7 months ago

Your patch didnt help but i've since determined that a call to alloc vm memory happens in you branch before preinit_shm() is called (and therefore natmem_reserved is set). I think this is the issue.

Yes indeed. The master branch calls preinit_shm (); in real_main2 and sets up these variables. This is the root cause.

Update: Simply putting preinit_shm() back causes a hard crash. So i guess this was removed for this reason without all the consequences being explored?

grahambates commented 7 months ago

Yeah either that or it got lost resolving a conflict in the WinUAE patch where there's FSUAE specific code. In current WinUAE the call to preinit_shm in main is commented out, and there's a different call in od-win32/win32.cpp.

terriblefire commented 7 months ago

At this point you should be able to assert that natmem_reserved != NULL in vm.cpp as a means to test if this is working or not.

grahambates commented 7 months ago

Can you let me know if this works for you? patch.txt

terriblefire commented 7 months ago

This fixes the hang in the sense the application exits in the cases where it would have hung before.

Whereas the standard FS boots every time.

EDIT: Also turning on console logging with this patch causes a hard crash.

Ah the crash comes from turning on "TRACK_ALLOCATIONS".

grahambates commented 7 months ago

Right, so we've fixed a problem with the error handling, but not the prevented the error. It's progress I guess!

terriblefire commented 7 months ago

i actually think this is dumb luck that standard FSUAE doesnt hit the problem. The first allocation size it makes is 4096, yours is 256. This code is pretty broken IMHO in that it will always get the same address if it hits an issue.

grahambates commented 7 months ago

Yeah it seems to be relying on the fact that setting the addr arg on mmap will result in a low address, but there's really no guarantee of this from what I can see:

If addr is not NULL, then the kernel takes it as a hint about where to place the mapping; on Linux, the kernel will pick a nearby page boundary (but always above or equal to the value specified by /proc/sys/vm/mmap_min_addr) and attempt to create the mapping there. If another mapping already exists there, the kernel picks a new address that may or may not depend on the hint.

grahambates commented 7 months ago

This is interesting https://stackoverflow.com/questions/67085329/allocating-on-a-32-bit-address-in-macos

There is a path in the code for native support for this:

#ifdef HAVE_MAP_32BIT
        address = mmap(0, size, mmap_prot, mmap_flags | MAP_32BIT, -1, 0);
        if (address == MAP_FAILED) {
            address = NULL;
        }
#endif

Perhaps the solution is just to set HAVE_MAP_32BIT on Mac if it's now supported. I'll give this a try.

terriblefire commented 7 months ago

I'm a bit confused why we need an address below 4Gb at all, its not like our pointers are 32bit? i just brain damaged that code out and things worked so far.

grahambates commented 7 months ago

Yeah I'm not sure either and that's certainly one option.

Setting HAVE_MAP_32BIT at least doesn't explode on my machine, so it looks like this is worth a try too.

terriblefire commented 7 months ago

it just gives me -1 back with that bit set. I started to see errors with out the 64bit check. What confuses me so much is why fs-uae standard manages to get away with this.

terriblefire commented 7 months ago

Ok perhaps this might be useful.

I have been comparing FS-UAE master vs your branch. In your branch JIT_EXCEPTION_HANDLER is defined. Whereas it is not in master. Practically this means the exception handler allocates VM RAM in your branch and it doesnt in master.

Basically for whatever reason your branch (be it due to libraries etc) is running out of 32bit memory space and when that happens FS-UAE cannot continue. It might be worth considering putting any allocations you can after the emulator is running.

terriblefire commented 7 months ago

For me if i add the flag MAP_FIXED to the mmap call it works. This means the (input) address is not taken as a hint but an absolute.

I've no idea about the consequences of that.

EDIT: Been testing this for an hour. Its working perfectly for me.

grahambates commented 7 months ago

That's great news. I can also see why the JIT exception handler is getting installed by default. The following had been lost in compemu_support.cpp in my branch:

void build_comp(void)
{
#ifdef FSUAE
    if (!g_fs_uae_jit_compiler) {
        jit_log("JIT: JIT compiler is not enabled");
        return;
    }
#endif

I'll add it back in.

Re MAP_FIXED, I wonder whether this needs to be for Mac only, or if it's good for Linux too.

terriblefire commented 7 months ago

Linux manpage says MAP_FIXED should behave the same. TBH its probably the safest option here because if you do end up out of the range of native mem you want to fail anyways.

When you dont have MAP_FIXED you just always get the same location back.

terriblefire commented 7 months ago

looks like this is fixed now. thanks.

grahambates commented 7 months ago

Thanks so much for you help on this. I'll test the Linux build and open a PR on this project if everything looks good.

I did notice that JIT in general is not working on my branch, but that's a separate issue. I'd imagine not many people will be enabling that for dev anyway because of its effect on accuracy, but I should fix it anyway.

terriblefire commented 7 months ago

Not a problem. Only remaining thing i'd like to look at is performance when debugging. Perhaps JIT is related to this ?

BartmanAbyss commented 5 months ago

Closing this. If JIT is still a problem, please open a new issue :)