QW-Group / ezquake-source

main ezQuake source code base
https://www.ezquake.com/
GNU General Public License v2.0

BUG: ezQuake locks up when precaching sound file that used to work fine #738

Closed Pulseczar1 closed 1 year ago

Pulseczar1 commented 1 year ago

ezQuake version: 3.6.1 06270807a03fec68bb5adb6f5a72a4d352a787eb

OS/device including version: Ubuntu 20.04.5 LTS

Describe the bug: This version of ezQuake freezes immediately when joining some Team Fortress servers.

To Reproduce Steps to reproduce the behavior:

  1. Join a CustomTF server. Here's one: gamehost2.tastyspleen.net:27507
  2. ezQuake locks up right after connecting to the server.

Additional context: ezQuake version 3.6-dev-alpha10-dev (Linux, x86_64) connects to the same server and plays fine. My own investigation of the problem follows.

Here's the call stack while in the lockup (infinite loop):

(gdb) bt
#0  0x00005555555ab560 in BuffLittleLong (buffer=0x7fffffff638c "") at q_shared.c:812
#1  0x0000555555700319 in S_ParseCueMark (chunk=0x7fffffff6374 "ltxt\024", len=48, cue_point_id=0, sample_length=0x7fffffff67b0) at snd_mem.c:675
#2  0x0000555555700534 in S_FindCuePointSampleLength (sndfile=0x555560a51b40, cue_point_id=0, sample_length=0x7fffffff67b0) at snd_mem.c:715
#3  0x00005555557007e1 in S_LoadSound (s=0x55555ec78720) at snd_mem.c:780
#4  0x00005555556fc0d3 in S_PrecacheSound (name=0x55555926b30c <cl+1456332> "environ/rumble.wav") at snd_main.c:555
#5  0x000055555568cde6 in Sound_NextDownload () at cl_parse.c:790
#6  0x000055555568d536 in CL_RequestNextDownload () at cl_parse.c:1051
#7  0x000055555568d636 in CL_FinishDownload () at cl_parse.c:1087
#8  0x000055555568cfb8 in CL_SendChunkDownloadReq () at cl_parse.c:868
#9  0x0000555555684a69 in CL_SendCmd () at cl_input.c:1006
#10 0x000055555568838f in CL_SendToServer () at cl_main.c:1661
#11 0x000055555568a0e4 in CL_Frame (time=0.016979640000002405) at cl_main.c:2497
#12 0x00005555555a0dae in Host_Frame (time=0.016979640000002405) at host.c:479
#13 0x00005555557b42c3 in main (argc=1, argv=0x7fffffffde08) at sys_posix.c:347

You can see that it hangs while precaching the sound file environ/rumble.wav.

While watching ezQuake run with the debugger, I found that, in S_FindCuePointSampleLength(), it looks for adtl in the file. It finds it, and therefore, it calls S_ParseCueMark(). Here's that function:

static qbool S_ParseCueMark(const byte* chunk, int len, int cue_point_id, int* sample_length)
{
    int pos = 0;

    *sample_length = 0;
    while (pos < len - 8) {
        unsigned int size = BuffLittleLong(chunk + pos + 4);

        // Looking for ltxt chunk with purpose "mark"
        if (size >= 20 && !strncmp(chunk + pos, "ltxt", 4) && !strncmp(chunk + pos + 16, "mark", 4)) {
            // Might be for a different cue point
            if (cue_point_id == BuffLittleLong(chunk + pos + 8)) {
                *sample_length = BuffLittleLong(chunk + pos + 12);
                return true;
            }
        }
        pos += size;
    }
    return false;
}

S_ParseCueMark() starts scanning right after adtl for an ltxt chunk, and it finds one. It then checks 16 bytes past the start of ltxt for the purpose ID mark, but finds rgn there instead. The while loop in S_ParseCueMark() is the loop it hangs in: on the next iteration it reads a size of 0, so pos never advances and the loop condition never becomes false.
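One way the loop could be made to always terminate is sketched below. This is an illustration only and is not taken from the ezQuake source or from any upstream fix. It assumes that the size field counts only the chunk data (so each step must also skip the 8-byte header), that odd-sized chunks are followed by a pad byte as RIFF requires, and that BuffLittleLong() decodes a little-endian 32-bit value. The names read_le32 and find_cue_mark_length are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for ezQuake's BuffLittleLong(); assumed to decode a
 * little-endian 32-bit value. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical variant of S_ParseCueMark(): returns 1 and sets
 * *sample_length when an ltxt sub-chunk with purpose "mark" matches
 * cue_point_id, otherwise returns 0. */
static int find_cue_mark_length(const unsigned char *chunk, size_t len,
                                uint32_t cue_point_id, uint32_t *sample_length)
{
    size_t pos = 0;

    *sample_length = 0;
    /* pos never exceeds len, so len - pos cannot underflow. */
    while (len - pos >= 8) {
        uint32_t size = read_le32(chunk + pos + 4);
        size_t remaining = len - pos - 8;

        /* A sub-chunk that claims more data than is left is malformed. */
        if (size > remaining) {
            return 0;
        }

        /* ltxt layout: id, size, cue point id, sample length, purpose, ... */
        if (size >= 20 && !memcmp(chunk + pos, "ltxt", 4) &&
            !memcmp(chunk + pos + 16, "mark", 4)) {
            if (cue_point_id == read_le32(chunk + pos + 8)) {
                *sample_length = read_le32(chunk + pos + 12);
                return 1;
            }
        }

        /* Skip the 8-byte header as well as the data, so a size of 0
         * still moves pos forward by 8. */
        pos += 8 + (size_t)size;
        if ((size & 1) && pos < len) {
            pos++; /* RIFF pad byte after odd-sized data */
        }
    }
    return 0;
}
```

Because every iteration advances pos by at least 8, a zero size field can no longer stall the loop. For a file like rumble.wav, whose only ltxt chunk has purpose rgn, this sketch simply returns 0.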

I'm not sure what the problem is. Since this sound file likely hasn't ever changed and works with previous versions of ezQuake, I think the problem is probably in the commit that added the code where the infinite loop occurs: 95f4f98203be115994ef82fe12c8eb1cd6a31a5e. My guess is that the code doesn't properly handle all possible RIFF Wave format configurations, such as files where rgn appears in place of mark. I stopped investigating at this point, for now. My next step would be to look up the RIFF Wave format specification and check whether this code handles all possible configurations of a RIFF Wave file.

Here is what the end of rumble.wav looks like:

0000:CF20 | 00 00 00 00  00 00 00 00  00 00 4C 49  53 54 56 00 | ..........LISTV.
0000:CF30 | 00 00 49 4E  46 4F 49 43  52 44 0B 00  00 00 31 39 | ..INFOICRD....19
0000:CF40 | 39 36 2D 31  32 2D 31 36  00 00 49 45  4E 47 0D 00 | 96-12-16..IENG..
0000:CF50 | 00 00 4A 61  6D 65 73 20  47 72 75 6E  6B 65 00 00 | ..James Grunke..
0000:CF60 | 49 53 46 54  20 00 00 00  53 6F 75 6E  64 20 46 6F | ISFT ...Sound Fo
0000:CF70 | 72 67 65 20  34 2E 30 3B  53 6F 75 6E  64 20 46 6F | rge 4.0;Sound Fo
0000:CF80 | 72 67 65 20  34 2E 35 00  63 75 65 20  1C 00 00 00 | rge 4.5.cue ....
0000:CF90 | 01 00 00 00  01 00 00 00  00 00 00 00  64 61 74 61 | ............data
0000:CFA0 | 00 00 00 00  00 00 00 00  00 00 00 00  4C 49 53 54 | ............LIST
0000:CFB0 | 34 00 00 00  61 64 74 6C  6C 74 78 74  14 00 00 00 | 4...adtlltxt....
0000:CFC0 | 01 00 00 00  D1 CE 00 00  72 67 6E 20  00 00 00 00 | ....ÑÎ..rgn ....
0000:CFD0 | 00 00 00 00  6C 61 62 6C  0C 00 00 00  01 00 00 00 | ....labl........
0000:CFE0 | 4D 41 52 4B  38 38 31 00                           | MARK881.        
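The trailing adtl list in this dump can be walked by hand, and the short sketch below does the same thing in code. It is a reading aid rather than ezQuake code: adtl_data copies the 48 bytes that follow adtl (file offsets 0xCFB8 to 0xCFE7, matching len=48 in the backtrace), and read_le32 and walk_subchunks are hypothetical helpers. Passing header_bytes = 0 mimics the quoted loop's `pos += size`; passing 8 also skips each sub-chunk's header.

```c
#include <stddef.h>
#include <stdint.h>

/* The 48 bytes after "adtl" in the dump (file offsets 0xCFB8..0xCFE7). */
static const unsigned char adtl_data[48] = {
    'l','t','x','t', 0x14,0,0,0,   /* ltxt, data size 20            */
    1,0,0,0, 0xD1,0xCE,0,0,        /* cue point id 1, sample length */
    'r','g','n',' ', 0,0,0,0,      /* purpose "rgn ", country/lang  */
    0,0,0,0,                       /* dialect/code page             */
    'l','a','b','l', 0x0C,0,0,0,   /* labl, data size 12            */
    1,0,0,0,                       /* cue point id 1                */
    'M','A','R','K','8','8','1',0  /* label text                    */
};

/* Decode a little-endian 32-bit value. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk sub-chunks and return the offset where the walk stops.
 * A result equal to len means every sub-chunk was consumed cleanly;
 * a smaller result means the walk stalled (zero step) or met a
 * sub-chunk that runs past the end of the data. */
static size_t walk_subchunks(const unsigned char *data, size_t len,
                             size_t header_bytes)
{
    size_t pos = 0;

    while (len - pos >= 8) {
        size_t step = header_bytes + (size_t)read_le32(data + pos + 4);

        if (step == 0 || step > len - pos) {
            break; /* no progress possible, or malformed size */
        }
        pos += step;
    }
    return pos;
}
```

With header_bytes = 0 the walk stops at offset 20, in the middle of the ltxt data, where the next four bytes decode to a size of 0. That matches the backtrace, where BuffLittleLong() is reading at chunk + 24. With header_bytes = 8 the walk visits ltxt at offset 0 and labl at offset 28, then ends exactly at 48.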

I've also attached rumble.wav. GitHub wouldn't accept a wav file. So, I zipped it. --> rumble.zip

ciscon commented 1 year ago

this has been fixed recently, please try the nightly build and see if the issue has been resolved for you: https://builds.quakeworld.nu/ezquake/snapshots/latest/x64/

Pulseczar1 commented 1 year ago

Oh, I see: https://github.com/QW-Group/ezquake-source/pull/723. I searched through Issues before posting this, but didn't think to search through Closed Issues, Pull Requests, or Commits. I compiled the latest commit in master, dbe6b521046605c2c109220d3df8101d8adde10f, and got this result on the first run:

[xcb] Unknown sequence number while processing queue
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
ezquake-linux-x86_64-new-dev-version: ../../src/xcb_io.c:260: poll_for_event: Assertion `!xcb_xlib_threads_sequence_lost' failed.
Received signal 6, exiting...
DOUBLE SIGNAL FAULT: Received signal 11, exiting...

However, after running it a second time, it ran properly. I have a similar issue with another version of ezQuake I've used a good bit: 3.6-dev-alpha10-dev, Linux x86_64. It will crash and give that message, or very similar, about 1 in 4 or 1 in 5 times, at startup.

Once I tried connecting to the same server, as before, I was able to connect properly and it seems to be working fine. This issue seems to have been resolved, like you said. I see that I have the option to close this Issue. I'll let one of you on the team make that determination.

ciscon commented 1 year ago

i've seen that before on newer versions of mesa, when mesa_glthread isn't disabled (you want to do that anyway as it's faster for us), though i'm pretty sure it was fixed somewhere in 22.x

tcsabina commented 1 year ago

Closing, as this is fixed in PR #723