buu342 / N64-UNFLoader

A cross-platform and universal Nintendo64 flashcart ROM uploader, featuring a USB+debug library for libultra and libdragon.
Do What The F*ck You Want To Public License
91 stars 27 forks source link

Parsing commands leads to system crash when compiled with debug mode disabled #68

Closed tristanpenman closed 1 year ago

tristanpenman commented 3 years ago

This is a continuation of the discussion from #66, so that it isn't lost in a closed PR.

I was thinking about the issue, and figured it would be easy to modify one of the examples to capture the spirit of what I'm trying to do with my code. That is to compile using release libraries and optimisation enabled, while also retaining the ability to communicate with the ROM using UNFLoader's command parser.

I was able to reproduce the issue with example 4, using Crash's SDK: https://github.com/tristanpenman/N64-UNFLoader/commit/9dd913742ba259cf7f23184a13b194ae3002327d

This commit sets DEBUG_MODE = 0 in GNUMakefile, but still includes the code from debug.c in the final build. debug.h has been modified to set OVERWRITE_OSPRINT = 0, since the required symbols are not included in the release libraries.

When running this ROM on an Everdrive X7, sending the rotate command is enough to crash the system.

This seems like a good starting point for continuing to investigate the issue, and I'll continue digging into it when time allows :sweat_smile:

At this stage, I'm not even sure it is safe to assume that this is a memory alignment issue. So @rasky and @buu342, if you have any ideas that are worth exploring, let me know and I'll try to carve away some time to try them out.

buu342 commented 3 years ago

Ok interesting. This could very well be a problem with the command parser, which wouldn't surprise me given that I personally find that I over complicated the parsing loop in debug_thread_usb, but especially the token splitter debug_commands_setup. When I get some free time on Friday, I will see if I'm able to reproduce the issue on the XP SDK (on a 64Drive), and if not, I'll try again on Crash's SDK using Linux. I haven't given this library a proper testing with optimizations enabled, given that it's meant to be used for debugging :sweat_smile:

Does the crash cause it to print the crash data through USB? It wouldn't surprise me if it doesn't (because it can't output if there's USB data to be serviced first), but it never hurts to ask!

tristanpenman commented 3 years ago

When it crashes, there is no debug output through USB.

Here's what it looks like in my terminal when it crashes:

UNFLoader
--------------------------------------------
Cobbled together by Buu342
Compiled on Jul 20 2021

Debug mode enabled.
EverDrive autodetected!
USB connection opened.
ROM successfully uploaded in 0.06 seconds!
Debug mode started. Press ESC to stop.

Debug mode initialized!

Available USB commands
----------------------
1. ping
        Pong!
2. rotate
        Toggles square rotation
3. color R G B
        Changes the background color
4. texture @file@
        Changes the square texture

Uploading data [################] 0%
Error: Everdrive timed out.

USB connection closed.
Press any key to continue...
tristanpenman commented 3 years ago

I spent more time working on this (more than I should have!) and found something interesting. After reading through the assembly that GCC was generating, I noticed that usb_everdrive_readreg and usb_everdrive_readdata were being inlined. When I added __attribute__((noinline)) to usb_everdrive_readreg, that resolved the issue.

I was able to narrow this down to a particular inlining scenario, by first creating a no-inline copy of usb_everdrive_readdata, and then creating a copy of usb_everdrive_readreg that calls that.

Then through trial and error, I found that usb_everdrive_usbbusy is where the inlining causes problems. I don't think usb_everdrive_usbbusy is getting inlined, but investigating that may be the next step...

For reference, here is the commit with the relevant changes: https://github.com/tristanpenman/N64-UNFLoader/commit/27c4e6fc2f4e87fe186e56ac10bd367855716315

buu342 commented 3 years ago

This is the first time I've heard of inline causing code to break :thinking: I'll take a closer look on Friday, testing the library with optimizations on my 64Drive to confirm whether or not it really is something in the ED code, or my shoddy command parser.

tristanpenman commented 3 years ago

It's very unusual. This is probably more of what @rasky suggested, where the 'fix' happens to move things around in just the right way to make it work again. Or even more nefarious, the fix changes timing enough to stop triggering the bug. It is interesting to note that when I used memset to clear buff at the beginning of usb_everdrive_poll, that seemed to fix the issue, but introduced a lot of slow-down.

buu342 commented 3 years ago

That seems like a more likely fix, because most compiler optimization problems are due to the debug versions automatically initializing variables to 0 (which the optimized versions don't), or the compiler thinking certain variables are unnecessary and optimizing them away.

rasky commented 3 years ago

Can you add some asserts / checks that whatever gets to usb_everdrive_readdata is 8-byte aligned, and that the PI address and length are 2-bytes aligned?

tristanpenman commented 3 years ago

That's a great idea. I'll try that over the weekend.

tristanpenman commented 3 years ago

Here's my progress on this...

@rasky I tried your assertion idea, and the addresses were all aligned as expected. That pretty much ruled out any issues related to address alignment.

@buu342 your comment about optimising variables away was what I investigated next. Although inlining itself shouldn't cause issues, there is a case here where it is pretty clear that the compiler could optimise out the u8 value in usb_everdrive_readreg, or at least determine that it's value does not change. So I tried making it volatile and that has worked in all of the tests that I've done. That includes forcing both usb_everdrive_readreg and usb_everdrive_readdata to be inlined, so this feels pretty robust.

Here's the diff: https://github.com/tristanpenman/N64-UNFLoader/compare/master...tristanpenman:possible-alignment-issue?expand=1

Because adding volatile everywhere that usb_everdrive_readreg looked messy, I changed the function to return the value directly, instead of taking a pointer as an argument. It works without those changes though, so don't mind changing it back if you'd prefer to keep the existing style.

If that all looks good to you, I'll raise a PR :smile:

rasky commented 3 years ago

Great progress!

I don't see any reason why the compiler would semantically need that those variables be marked as volatile. Their address is taken and passed to the readreg function so the compiler must keep them stable on the stack and must reload them after the function call.

I don't want to stress you and I understand if you think it's enough so just let me know in that case.

Can you please post a disassembly of both the standard (buggy) version and the version fixed by just adding volatile (without stack changes)?

It might be that you don't need to declare all the variables as volatile but just cast them to vu32* when calling readreg.

tristanpenman commented 3 years ago

No stress at all. This whole investigation has been a good refresher on C knowledge that I haven't used in a long long time.

I think the other possibility is that rather than being 'optimised out', the compiler is deciding that it can use a copy of the u32 value held in a register, or something like that. My understanding of volatile is that it should generally be used when a value can be modified out-of-process or by another thread, as it prevents this optimisation from being applied.

Given that the only case where I saw inlining cause an issue was in usb_everdrive_usbbusy, it's not difficult to imagine the compiler inlining both usb_everdrive_readreg and usb_everdrive_readdata, then deciding that u32 val can be held in a register, since it is passed as a void pointer to osPiDmaStart.

That said, I'm not keen to make any assumptions about what the compiler is actually doing at this stage, so I'll try to get some disassemblies when I next have a chance to look at this.

Edit: meant to write u32 instead of u8.

tristanpenman commented 3 years ago

For reference, here are the volatile changes, without any further stack changes: https://github.com/tristanpenman/N64-UNFLoader/compare/master...tristanpenman:possible-alignment-issue-2?expand=1

I'll let you know when I have a chance to get the disassembled versions.

rasky commented 3 years ago

Which is the u8 variable you are speaking of? I don't seem to see a u8 in the affected functions.

If you want to upload the elf binaries I can disassemble them myself. I'm sorry to ask but I develop libdragon and I don't want to have and use Nintendo's copyrighted code to avoid issues so I can't recompile it myself, though disassembling is fine.

buu342 commented 3 years ago

I finally got the chance to test out sample 4 with a 64Drive, and I can confirm that the command parser does work on the 64Drive with -O3, except for the texture command. Possibly something in the extra logic is breaking when it comes to the file parsing code. So yeah, the issue at hand is specific to the EverDrive.

Although I compiled with optimizations, I had to link the libultra debug libraries because debug_printf relies on libultra's __printfunc function, which is only available in the debug libraries.

Regardless, I do think this is a good opportunity for me to clean up the logic for the command parsing functions...

tristanpenman commented 3 years ago

The reference to u8 was a mistake on my part. I updated my comment to correct that.

Here are the two builds you asked for. The first is without any changes to the usb_everdrive_* functions, and with DEBUG_MODE = 0 set in the Makefile: before.zip

The second is with the changes from my possible-alignment-issue-2 branch: after.zip

Both builds were done with the -g flag, so hopefully the debugging symbols will make it easier to disassemble.

Out of curiosity, what tools do you use for disassembling object files or ROMs like this? For my earlier investigation I was just using the output of gcc -S, but there must be better ways.

And finally, thanks for taking the time to help understand (and hopefully resolve) this issue

tristanpenman commented 3 years ago

I should also mention that those ROMs are built using the libultra release libraries. I set OVERWRITE_OSPRINT to 0 in debug.h to resolve the issue with __printfunc not being included in those libraries.

buu342 commented 3 years ago

Oh yes, I am very silly... __printfunc is the osSyncPrintf function pointer, _Printf is the one that handles the formatting part of printf :stuck_out_tongue:

I'll redo my test first thing tomorrow morning. I also intend on building a copy of the sample that outputs the crash data to the TV screen. I think that ought to be a bit more useful to figure out what's going on, assuming that it's not the flashcart that's locking up because it was fed some bad address or something. That would be harder to diagnose...


On the Windows XP SDK, you can disassemble the ROM by feeding the .out of the ROM to the objdump tool.

buu342 commented 3 years ago

@tristanpenman Alright, I tried with the release libraries with -O3, and exact same results on the 64Drive.

Here's my ROM setup, along with the modifications that prints crash information to the screen instead of through USB. You can test the crash printing by sending the command crash through USB. This should also work for debug_assert, which uses the exact same crash printing function.

A compiled ROM is available inside the out folder if you don't want to compile it yourself :grimacing:

4commands.zip

tristanpenman commented 3 years ago

@buu342 here's a summary of what I found while testing your updated code.

When tested with the Windows SDK (running through Wine), compiled with -O2:

When tested with the modern SDK, also compiled with '-O2' and with my changes removed:

Interestingly, neither case lead to the fault thread showing that an error has occurred, so it is possible that it was simply locking up. To test this, I tried changing global_rotate to TRUE by defualt. My thinking is that running a command would cause rotation to stop if the system froze.

What I found is that rotation continues, so the system is not actually crashing here. I assume this means the USB thread is getting stuck, which would be consistent with what I found, where code such as this:

static void usb_everdrive_usbbusy() 
{
    u32 val __attribute__((aligned(8)));
    do 
    {
        usb_everdrive_readreg(ED_REG_USBCFG, &val);
    } 
    while ((val & ED_USBSTAT_ACT) != 0);
}

could end up in an infinite loop if val is not refreshed due to compiler optimisations. Actually proving this will take me a while, since my knowledge of MIPS assembly is pretty much non-existent.

So other than that, I'm not really sure how to proceed. The volatile changes have worked well in my own code, and appear to cause no harm, so I'm happy to keep using them myself. And when time allows, I can try to dig deeper.

Let me know if you have any ideas on what to try next.

buu342 commented 3 years ago

The N64 OS uses preemptive scheduling, meaning that threads have to manually yield to allow other threads to execute. This implies that, if the USB thread gets stuck in a loop, then it will be unable to yield to other threads, thus the game should freeze. If it's not freezing for you, then that means that it's not actually getting stuck in that loop and, instead, it's the flashcart's USB system that's locking up (likely, it ended up getting fed some invalid value).

I think my thought process is correct... But just to make sure, you can try sticking a

while (1)
    ;

at the end of a function such as usb_everdrive_poll (which will probably throw a warning, as the function will never return). It should cause the rotation to freeze. At least, this is what happens on my 64Drive.

tristanpenman commented 3 years ago

Do you mean cooperative multi-tasking? This is definitely something I'll have to explore.

I'm also working through some MIPS assembly tutorial videos. When this issue is eventually fixed, I'd very much like to know why the fix works, and it seems like that lower level knowledge will be required :)

buu342 commented 3 years ago

Er right, I did just describe cooperative multi-tasking, heh :sweat_smile:

In the N64's case, the RCP can generate interrupts/events when it finishes processing audio or graphics tasks, which can cause the CPU to yield if the threads are higher priority. Hence why it's a preemptive system.

The USB+Debug library, however, I gave them the highest priority possible to prevent possible issues with DMA'ing. Since this is a debug library in the end, it doesn't matter if it ends up locking the system for a little bit when servicing the USB.

buu342 commented 3 years ago

Well, I finally had the patience time to check out that issue regarding the 4th libultra sample breaking when -O3 is enabled. After about 2 hours diagnosing, I figured it out...

It's because I need to increase the 64Drive poll time (that macro in line 96 of usb.c) :weary:

So the library seems to be working fine with optimizations on the 64Drive. The poll time thing is something I've been wanting to look into and fix for a while (https://github.com/buu342/N64-UNFLoader/issues/51) but I need to have a proper chat with Marshall.

buu342 commented 1 year ago

@tristanpenman Heh, it's been a while. A bunch of changes are coming very soon. Of note, PI code has been cleaned up, ED code has been tweaked, and 64Drive polling has been completely fixed (turns out there was an oversight from my part this entire time). It might be very likely that the bug described in this issue is gonna be fixed, so once it merges, I'll probably close it. If this problem returns, feel free to make another or re-open this one!

tristanpenman commented 1 year ago

No worries 😄 I'll keep an eye out for the changes, and give it a go with my old homebrew project.