JHGuitarFreak / UQM-MegaMod

A fork of The Ur-Quan Masters + HD-mod that remasters the HD graphics with a veritable smorgasbord of extra features, options, QoL improvements, and much more...
https://uqm-mods.sourceforge.net
GNU General Public License v2.0
78 stars 22 forks source link

Segfault when trying to buy technology out of order in hard mode. #81

Closed onpon4 closed 2 years ago

onpon4 commented 3 years ago

I decided to give hard mode a try, and the first Melnorme technology I tried to buy was the one on Zeeman. It seems that attempting to purchase technology out of order like this causes a segmentation fault.

Here's a backtrace:

Thread 6 "Starcon2Main" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffda79d700 (LWP 12952)]
0x00005555556a5782 in DoBuy (R=195) at src/uqm/comm/melnorm/melnorm.c:1315
1315                nextTech->price += countTech() * 50;
(gdb) bt full
#0  0x00005555556a5782 in DoBuy (R=195) at src/uqm/comm/melnorm/melnorm.c:1315
        nextTech = 0x555555752140 <tech_sale_catalog+64>
        credit = 1248
        needed_credit = 0
        slot = 255 '\377'
        capacity = 16000
        FuelCost = 10 '\n'
#1  0x0000555555646571 in SelectResponse (pES=0x7fffda79c6b0)
    at src/uqm/comm.c:1143
        response_text = 0x7fffda79c6d0
#2  0x000055555564666c in PlayerResponseInput (pES=0x7fffda79c6b0)
    at src/uqm/comm.c:1187
        response = 0 '\000'
#3  0x0000555555646a28 in DoCommunication (pES=0x7fffda79c6b0)
    at src/uqm/comm.c:1313
No locals.
#4  0x0000555555656a93 in DoInput (pInputState=0x7fffda79c6b0, 
    resetInput=FALSE) at src/uqm/gameinp.c:406
No locals.
#5  0x0000555555647146 in HailAlien () at src/uqm/comm.c:1513
        ES = {InputFunc = 0x55555564697b <DoCommunication>, Initialized = 1, 
          NextTime = 73816, num_responses = 0 '\000', cur_response = 0 '\000', 
          top_response = 0 '\000', response_list = {{response_ref = 195, 
--Type <RET> for more, q to quit, c to continue without paging--c
              response_text = {baseline = {x = 36, y = 545}, pStr = 0x7fff91fddb60 "I would like to buy the technology you described.", align = ALIGN_LEFT, CharCount = 65535}, response_func = 0x5555556a535a <DoBuy>, allocedResponse = 0x0}, {response_ref = 196, response_text = {baseline = {x = 36, y = 582}, pStr = 0x7fff92af8540 "I do not want to purchase any more technology.", align = ALIGN_LEFT, CharCount = 65535}, response_func = 0x5555556a535a <DoBuy>, allocedResponse = 0x0}, {response_ref = 150, response_text = {baseline = {x = 36, y = 619}, pStr = 0x7fff91e8c730 "I wish to buy information.", align = ALIGN_LEFT, CharCount = 65535}, response_func = 0x5555556a535a <DoBuy>, allocedResponse = 0x0}, {response_ref = 134, response_text = {baseline = {x = 36, y = 656}, pStr = 0x7fff91e8c590 "I am done buying, for now.", align = ALIGN_LEFT, CharCount = 65535}, response_func = 0x5555556a5d8d <NatureOfConversation>, allocedResponse = 0x0}, {response_ref = 135, response_text = {baseline = {x = 36, y = 693}, pStr = 0x7fffd05275a0 "I shall be leaving now.", align = ALIGN_LEFT, CharCount = 65535}, response_func = 0x5555556a4d9c <ExitConversation>, allocedResponse = 0x0}, {response_ref = 0, response_text = {baseline = {x = 0, y = 0}, pStr = 0x0, align = ALIGN_LEFT, CharCount = 0}, response_func = 0x0, allocedResponse = 0x0}, {response_ref = 0, response_text = {baseline = {x = 0, y = 0}, pStr = 0x0, align = ALIGN_LEFT, CharCount = 0}, response_func = 0x0, allocedResponse = 0x0}, {response_ref = 0, response_text = {baseline = {x = 0, y = 0}, pStr = 0x0, align = ALIGN_LEFT, CharCount = 0}, response_func = 0x0, allocedResponse = 0x0}}, phrase_buf = "I would like to buy the technology you described.", '\000' <repeats 974 times>}
        PlayerFont = 0x7fff881e79e0
        OldFont = 0x7fffd0508f00
        SongRef = 0x0
        TextBack = {r = 0 '\000', g = 0 '\000', b = 8 '\b', a = 255 '\377'}
#6  0x0000555555647534 in InitCommunication (which_comm=MELNORME_CONVERSATION) at src/uqm/comm.c:1650
        status = 0
        LocDataPtr = 0x55555591a4c0 <melnorme_desc>
#7  0x000055555564798f in RaceCommunication () at src/uqm/comm.c:1782
        i = 12
        status = 0
        hStarShip = 0x7fffd019d450
        FragPtr = 0x7fffd019d450
        hEncounter = 0x0
        RaceComm = {ARILOU_CONVERSATION, CHMMR_CONVERSATION, INVALID_CONVERSATION, ORZ_CONVERSATION, PKUNK_CONVERSATION, SHOFIXTI_CONVERSATION, SPATHI_CONVERSATION, SUPOX_CONVERSATION, THRADD_CONVERSATION, UTWIG_CONVERSATION, VUX_CONVERSATION, YEHAT_CONVERSATION, MELNORME_CONVERSATION, DRUUGE_CONVERSATION, ILWRATH_CONVERSATION, MYCON_CONVERSATION, SLYLANDRO_CONVERSATION, UMGAH_CONVERSATION, URQUAN_CONVERSATION, ZOQFOTPIK_CONVERSATION, INVALID_CONVERSATION, BLACKURQ_CONVERSATION, INVALID_CONVERSATION, INVALID_CONVERSATION, INVALID_CONVERSATION, YEHAT_REBEL_CONVERSATION, URQUAN_DRONE_CONVERSATION}
#8  0x00005555556918b4 in Starcon2Main (threadArg=0x0) at src/uqm/starcon.c:323
No locals.
#9  0x00005555555e1a63 in ThreadHelper (startInfo=0x5555567d1d00) at src/libs/threads/sdl/sdlthreads.c:216
        func = 0x555555691274 <Starcon2Main>
        data = 0x0
        sem = 0x5555567cc1b0
        thread = 0x5555567c1440
        result = 0
#10 0x00007ffff7d306df in ?? () from /lib/x86_64-linux-gnu/libSDL2-2.0.so.0
No symbol table info available.
#11 0x00007ffff7db6eb9 in ?? () from /lib/x86_64-linux-gnu/libSDL2-2.0.so.0
No symbol table info available.
#12 0x00007ffff7329ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140736858806016, -5403718433099793068, 140737488346270, 140737488346271, 140736858804032, 8396800, 5403640348962143572, 5403700330837917012}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#13 0x00007ffff7bb5def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

Here's my save which is right next to the Zeeman Melnorme, in case it helps with debugging:

uqmsave.zip

🕵️

Serosis commented 3 years ago

Is this in v0.8.0.85 or v0.8.1?

And from the info it looks like you're running a Linux distro.
I can't currently test on Linux due to my own hardware failing me at the moment.

I know it doesn't help but this crash does not happen on Windows, even with your save.
Perhaps as an alternative run the Windows version through WINE for now.
I need to get my Virtual Machines back up and that may take awhile.

onpon4 commented 3 years ago

0.8.0.85. I didn't even know there was a 0.8.1 (0.8.0.85 is listed as the latest release on GitHub). And yes, I'm on a Linux distro.

Do you know how to read a backtrace? /gen The text I pasted in the first post is a backtrace from GNU Debugger. It shows the exact line where the segfault happened and some information about what data was involved at the time. Specifically it shows the segfault to be in src/uqm/comm/melnorm/melnorm.c, line 1315 (further indications below indicate from what context DoBuy was called when the segfault occurred). /nm /info

Even if this is not reproducible on Windows, it's a memory access bug. nextTech points to a memory location that it shouldn't (I'm sure that one's the problem because it's the only pointer being dereferenced on that exact line); it may simply be that due to architecture differences it's causing a segfault on Linux but just some sort of unnoticeable buggy behavior on Windows. (If I had to guess as to the cause, I would suppose that it's trying to index an element of an array outside of it, since nextTech is not NULL. Alternatively, if division by zero can possibly happen, that could cause this sort of problem if the division by zero ended up setting that pointer to a random address.) Of course, there's the possibility that this was fixed since the release I was running, too. /neu

🕵️

Serosis commented 3 years ago

v0.8.1 is in development and I'm just waiting on a few graphical additions before putting out the official release.

Do you know how to read a backtrace?

No, not really. I'm a modder, not a programmer.
If Visual Studio doesn't point at it with big angry words then I will never see it. Even everything you've said flies over my head and it sounds like you have a better understanding of what went wrong. Though I seem to remember the Melnorme hard mode code giving me issues before.

Once I get my Linux VM up and running I'll bang my head against it until it cooperates. Debugging without a big angry GUI is going to be an interesting learning experience.

onpon4 commented 3 years ago

That's fair enough, C is a tricky language to work with. Let me try to explain as best I can.

So, the basics: in C, a pointer is a memory address which (hopefully) references a particular location in RAM, allocated by the malloc function and later freed by the free function (or alternatively, allocated on the stack when you use a notation such as char s[256]). However, circumstances like incorrect memory allocation, corruption caused by division by zero, an uninitialized pointer, or simply a NULL pointer, can result in a pointer being for an arbitrary RAM address.

When a pointer is to an arbitrary RAM address (not an address which is specifically included in a malloc call for that pointer, or in the case of memory on the stack, an address included in what was allocated by the stack), and code attempts to actually dereference that pointer (usually either with the * operator, e.g. *ptr, or with the -> operator, e.g. ptr->x), one of two things can happen: the first thing that can happen is that the memory location being pointed to by the pointer is a valid address for the code to access, but just doesn't belong to the pointer. In this case, pretty much any random breakage can happen as you might, for example, be reading some random image data as a number.

The second thing that can happen is that the address pointed to by the pointer is an address the code is not allowed to access. In this case, a segmentation fault occurs. For example, NULL is typically defined as address 0x0, which is typically an address reserved by the kernel, so attempting to deference a pointer set to NULL will cause a segmentation fault.

Hopefully this is making sense so far?

So, the specific bug that we have. GDB explains that there was a segmentation fault with this specific line:

nextTech->price += countTech() * 50;

The only place here where a pointer is dereferenced is nextTech->price, where nextTech is being dereferenced (it's equivalent to (*nextTech).price). (countTech() can be excluded from analysis because if the illegal dereference happened there, then it would have reported the error as occurring within that function.) GDB reports that the value of nextTech on my particular system in that particular run was 0x555555752140. Exact address will vary between runs (the RAM being allocated the exact same way on the same section of RAM twice would be rather unlikely), but on Linux, a NULL pointer would be 0x0, so this tells us that it was some address other than NULL which led to the segfault.

This is the specific code in question with context:

        nextTech = GetNextTechForSale ();
        if (!nextTech)
        {
            NPCPhrase (NEW_TECH_ALL_GONE);
            goto BuyBuyBuy; // No tech left to buy
        }

        NPCPhrase (nextTech->sale_line);

        if (DIF_HARD && countTech() > 0)
        {
            nextTech->price += countTech() * 50;
            NPCPhrase (NEED_MORE_CREDIT0);
            NPCNumber (nextTech->price - TECHPRICE, NULL);
            NPCPhrase (NEED_MORE_CREDIT1);
        }

So the source of whatever value nextTech is gets assigned to it by the function GetNextTechForSale. We know that it's not NULL, so this is the specific code which is setting it:

    BYTE i = 0;
    BYTE j = 0;

    if (DIF_HARD && CurStarDescPtr)
    {
        switch (CurStarDescPtr->Index)
        {
            case MELNORME0_DEFINED:
                i = TECH_MODULE_CANNON;  
                j = i + 1;
                break;
            case MELNORME1_DEFINED:
                i = TECH_MODULE_BLASTER; 
                j = i + 1;
                break;
            case MELNORME2_DEFINED:
                i = TECH_LANDER_SHIELD_BIO; 
                j = i + 2; // TECH_LANDER_CARGO
                break;
            case MELNORME3_DEFINED:
                i = TECH_LANDER_RAPIDFIRE;
                j = i + 2; // TECH_LANDER_SHIELD_QUAKE
                break;
            case MELNORME4_DEFINED:
                i = TECH_MODULE_BIGFUELTANK;    
                j = i + 1; 
                break;
            case MELNORME5_DEFINED:
                i = TECH_LANDER_SPEED;
                j = i + 2; // TECH_MODULE_ANTIMISSILE
                break;
            case MELNORME6_DEFINED:
                i = TECH_MODULE_TRACKING;   
                j = i + 1; 
                break;
            case MELNORME7_DEFINED:
                i = TECH_LANDER_SHIELD_LIGHTNING;   
                j = i + 2; // TECH_LANDER_SHIELD_HEAT 
                break;
            case MELNORME8_DEFINED:
                i = TECH_MODULE_FURNACE;
                j = i + 1; 
                break;
            default:
                i = 0; j = i;
        }

        for (i = i; i < j; ++i)
        {
            if (!HasTech (tech_sale_catalog[i].techId))
                return &tech_sale_catalog[i];
        }
    } 

I don't know what part of this is leading to the segfault though. Memory access bugs in C are notoriously tricky.

🕵️

Serosis commented 3 years ago

I was finally able to test this out my Linux Mint VM and it seems to be fixed already.
I thought it sounded familiar and it turns out commit 49a09a2df9c5077779912913fff6de01d6fdff19, post-v0.8.0.85, fixed the issue.

@line 600 in src\uqm\comm\melnorm\melnrom.c simply remove static const from TechSaleData tech_sale_catalog[] and rebuild.

onpon4 commented 3 years ago

Good to hear! I think it's probable that only the const part is problematic. In any case, I'll apply that quick fix on my end. Thanks!

🕵️

onpon4 commented 3 years ago

Ok, so, that specific change doesn't fix the problem on my end, so I assume that's not exactly what fixed it, but if you've verified that the latest git revision doesn't have the problem I'm okay with that.

🕵️

Serosis commented 3 years ago

Perhaps do a git clone of the most current of both this and the content repository and test on your end and maybe we can work our way back?

If the problem really isn't fixed I'd still like it to be. Just in case.

If you're not playing in HD nothing really changes much except for a few odds and ends.
Though HD is radically changed to how it was before.

onpon4 commented 3 years ago

Oh, I very much am using HD. 😅

When I get the spoons for it I'll give the latest git a try and see if it works right.

🕵️

Serosis commented 3 years ago

If you like how the original is scaled, you'll like the latest revision.

With custom border enabled image

onpon4 commented 3 years ago

Oh! Yeah, that's real nice. I'd forgotten that the HD mod changed the proportions so drastically.

🕵️

Serosis commented 3 years ago

Yeah, HD-mod scaled the UI between 2x and 3x while the rest was scaled 4x. Made for some odd design.

JoshuaPettus commented 3 years ago

The new ship looks awesome! Was never really happy with the old HD design. This is one is a lot closer to the original and looks cooler to boot!

Serosis commented 3 years ago

You can thank Jordan Lo for the new ship and its accompanying modules.

He's a ship creating genius: https://www.deviantart.com/haomaic/gallery/all

Serosis commented 3 years ago

Hey onpon4, have you done any more testing on this?

onpon4 commented 3 years ago

Sorry, forgot to follow up.

So I just tried, and I can't run the latest git code at all. I get a segfault any time I try to do anything (and there's no sound, not that I care about that really). I assume that's because of content incompatibility? I have no idea how to get it to work, tried deleting the config directory, tried installing it to a special place on my home directory… only thing I didn't try is using newer content but I have no idea how to do that.

🕵️

Serosis commented 3 years ago

To get the new content you git pull (or .zip download) the UQM-MegaMod-Content repository and stick its files into the MegaMod's content directory.

onpon4 commented 2 years ago

Sorry I didn't get back to you on this sooner. I just tried with the latest release and the problem seems to be fixed.

🦇