SerenityOS / serenity

The Serenity Operating System 🐞
https://serenityos.org
BSD 2-Clause "Simplified" License
30.25k stars 3.17k forks source link

StackInfo issues on Alpine (musl-based) #16681

Closed ptrcnull closed 3 months ago

ptrcnull commented 1 year ago

continuing from https://github.com/SerenityOS/ladybird/issues/153#issue-1509918847:

it looks like the stack size returned by StackInfo#size_free() is invalid, causing constant [InternalError] Call stack size limit exceeded on even the smallest scripts. i tried to debug this, but sadly didn't get very far; also haven't tested if the same issue happens on gentoo or void with musl

To confirm, this is when running ladybird in a musl-libc based desktop environment?

@ADKaster Correct, I've tried running it on multiple machines and all of them had the same issue.

I tried running the tests myself on 379e4a2432e3db557c4ffb47a57a08df40f5662f (HEAD of master at the moment of posting) and they fail with the following:

serenity/Userland/Utilities/ntpquery.cpp:214:13: error: comparison of integers of different signs: 'unsigned long' and 'long' [-Werror,-Wsign-compare]
    VERIFY(!CMSG_NXTHDR(&msg, cmsg));
    ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/sys/socket.h:358:44: note: expanded from macro 'CMSG_NXTHDR'
        __CMSG_LEN(cmsg) + sizeof(struct cmsghdr) >= __MHDR_END(mhdr) - (unsigned char *)(cmsg) \
                                                  ^
/usr/include/assert.h:8:28: note: expanded from macro 'assert'
#define assert(x) ((void)((x) || (__assert_fail(#x, __FILE__, __LINE__, __func__),0)))
                           ^
1 error generated.
nekopsykose commented 1 year ago

that error is due to some internal stuff in the (musl) headers- it'll always fail with -Wsign-compare for the cmsg socket stuff. you'll have to get rid of werror like always to run those, but the actual warning doesn't apply there

nekopsykose commented 1 year ago

(for reference https://www.openwall.com/lists/musl/2016/10/11/1 . but you already knew to nuke Werror :) )

nekopsykose commented 1 year ago

i remember trying to debug this some time ago (months), and now, and it's not merely a stack size issue- raising the stacksize via the PT_GNU_STACK elf header (which musl understands) or using the pthread_attr_setstack api does not resolve it. the issue is in the actual calculation done in stackinfo (and if you simply comment it all out and make it say there is always space, the javascript pages work (as a quick test only, of course).). i couldn't figure out /why/ it fails at calculating the remaining stack size on musl libc specifically, and sadly lost any notes i had at the time.

linusg commented 3 months ago

Likely relevant: https://github.com/rust-lang/rust/blob/2259028a70d6e1a44ad2cfd81955b577a43e8ef6/library/std/src/sys/pal/unix/stack_overflow.rs#L344-L349

linusg commented 3 months ago

The implementation for musl pthread_getattr_np is here: https://github.com/kraj/musl/blob/2c124e13bd7941fe0b885eecdc5de6f09aacf06a/src/thread/pthread_getattr_np.c#L15-L21 (highlighted part relevant for the main thread)

glibc is vastly more complex and involves reading /proc/self/maps: https://github.com/bminor/glibc/blob/d49cd6a1913da9744b9a0ffbefb3f7958322382e/nptl/pthread_getattr_np.c#L76-L165

ADKaster commented 3 months ago

That's quite unfortunate that we can't calculate "how much stack is left" on musl the same way we do pretty much every other C library (except Wasm32 on emscripten).

Looking at your rust link it looks like the answer is "🤷‍♀️ lmao hope we don't overflow"? Or is there something other language runtimes that we can copy draw inspiration from?

linusg commented 3 months ago

I also looked at Python which is known to be very portable and handles stack overflow - they elaborately hardcode a recursion limit :facepalm:

I'm unsure if Rust deals with this safely on musl in some other way.

(reason why I'm suddenly here: https://donotsta.re/notice/AhnJDfkoKr4E3Fg4oK)

ADKaster commented 3 months ago

Wait I've got it. On musl, we should change LibMain to just call serenity_main in a secondary thread and then block the main thread waiting for it to exit! Genius.

ptrcnull commented 3 months ago

not sure if applicable here, but WebKit does some mildly cursed stuff for calculating the stack size on the main thread: https://github.com/WebKit/WebKit/blob/b64c3ed8059502609a5dba0c79c6d84773362c4a/Source/WTF/wtf/StackBounds.cpp#L139

ADKaster commented 3 months ago

Well... pulling from getrlimit(RLIMIT_STACK) and comparing that to some magic "origin" value doesn't seem like the worst idea. The commit message responsible for that part of wtf mentions that they do something similar on Darwin.

https://github.com/WebKit/WebKit/commit/062a7b1f299ece128242b9ce9e31024f9b104bdc

ADKaster commented 3 months ago

A patch to add a little extra spice to our ifdef soup in StackInfo.cpp along the lines of

#endif // big ifdef chain

#if defined(AK_OS_LINUX) && !defined(AK_OS_ANDROID) && !defined(AK_LIBC_GLIBC)
   // Note: musl libc always gives the initial size of the main thread's stack
   // < massage m_base and m_size  based on RLIMIT_STACK if this thread == main thread>
#endif

    m_top = m_base + m_size;
}

would work for me, if some musl users can confirm it works (and passes test-js, our LibWeb tests, etc).

I believe we have at least one test that relies on hitting the stack limits to work. The self-proxy one at least, that one broke when Andreas made ThrowCompletionOr small enough to get things optimized by gcc-13 :thousandyakstare:

linusg commented 3 months ago

I added https://github.com/SerenityOS/serenity/blob/master/Userland/Libraries/LibJS/Tests/runtime-error-call-stack-size.js in #3995 :)

ptrcnull commented 3 months ago

if some musl users can confirm it works (and passes test-js, our LibWeb tests, etc).

might take a while, there's still some random build failures ( e.g. #21709 was closed with no successor :pensive: )

edit: ...and segfaults in RequestServer; brb, rebuilding with proper debug symbols

ADKaster commented 3 months ago

re: request server, try this: 9d3edc3 (#24273)

ptrcnull commented 3 months ago

funnily enough, i stashed the stack size fix and forgot to reapply it before building, but it.. looks like it works without it(?) i'm still getting some funny errors though, like:

212496.294 WebContent(21449): Unhandled JavaScript exception: [TypeError] undefined is not a function
212496.294 WebContent(21449):     at <unknown>
    at <unknown>
    at parseQuery
    at parseQuery
    at <unknown>
    at <unknown>
ptrcnull commented 3 months ago

nevermind, it might simply be better than it was before, but searching anything in duckduckgo finally makes it trip with Call stack size limit exceeded on just 60-ish calls, no matter the stack size

ptrcnull commented 3 months ago

i tried doing something similar to what WebKit does, but it seems to now go the opposite way and assume the stack is bigger than it actually is:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  JS::Heap::gather_conservative_roots (this=0x7f8212665868, roots=...)
    at /home/patrycja/Downloads/serenity/Userland/Libraries/LibJS/Heap/Heap.cpp:352
352        auto data = *reinterpret_cast<FlatPtr*>(stack_address);
(gdb) bt
#0  JS::Heap::gather_conservative_roots (this=0x7f8212665868, roots=...)
    at /home/patrycja/Downloads/serenity/Userland/Libraries/LibJS/Heap/Heap.cpp:352
#1  0x00007f82356ea17d in JS::Heap::gather_roots (this=0x7f8212665868, roots=...)
    at /home/patrycja/Downloads/serenity/Userland/Libraries/LibJS/Heap/Heap.cpp:292
#2  0x00007f82356eaad6 in JS::Heap::collect_garbage
    (this=0x7f8212665868, collection_type=collection_type@entry=JS::Heap::CollectionType::CollectGarbage, print_report=print_report@entry=false)
    at /home/patrycja/Downloads/serenity/Userland/Libraries/LibJS/Heap/Heap.cpp:282
#3  0x00007f8236c71848 in Web::Page::load_html (this=this@entry=0x7f8213a1e040, html=...)
    at /home/patrycja/Downloads/serenity/Userland/Libraries/LibWeb/Page/Page.cpp:69
#4  0x000055c848516d67 in WebContent::ConnectionFromClient::load_html
    (this=<optimized out>, page_id=<optimized out>, html=...)
    at /home/patrycja/Downloads/serenity/Userland/Services/WebContent/ConnectionFromClient.cpp:156