emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler

emscripten_dlopen fails on Safari #21571

Open Honya2000 opened 7 months ago

Honya2000 commented 7 months ago

Hello,

I recently implemented asynchronous dynamic module support for my WASM project. Loading is quite simple:

  1. I download the module using the fetch API.
  2. I store the downloaded binary in a file on the virtual Emscripten FS.
  3. I asynchronously load and instantiate the wasm .so file using the function emscripten_dlopen.

Since loading is asynchronous, there are usually several modules in flight at once, depending on the assets. The system works fine in Chrome but doesn't work in Safari (both macOS and iOS). It reports unresolved exported symbols in side modules. The problem is that the symbols aren't related to those modules at all; they are located in other side modules.

For example, it throws this error for the DdsImporter.so module: bad export type for '_ZTVN4Tmrw4Data16GltfImporterE': undefined. This symbol is just the virtual function table of the GltfImporter class, which is not part of DdsImporter at all, and DdsImporter does not use it. The GltfImporter class is instantiated in another side module.

Safari seems to mix up the modules at random while resolving symbols, throwing different resolution errors for different modules each time. The behavior is not deterministic.

Here is an example URL to reproduce the issue: http://honya.myftp.org:88/wrooms/index.html?native_debug=true

sbc100 commented 7 months ago

Can you try loading the modules just one at a time to see if that makes the problem go away? i.e. can you wait for the result of one emscripten_dlopen before trying another?

Also, can you share the full set of link flags that you are using?
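
One way to run that one-at-a-time experiment, as a rough sketch under stated assumptions: queue the module paths and start the next open only from the completion callback of the previous one. `SerialOpener` and `AsyncOpenFn` are hypothetical names, not part of Emscripten. In a real build the `AsyncOpenFn` would presumably wrap emscripten_dlopen, mapping its success and error callbacks to `done(true)` and `done(false)`.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical adapter type (an assumption, not an Emscripten API): starts an
// asynchronous open of `path` and later calls `done(ok)` exactly once.
using AsyncOpenFn =
    std::function<void(const std::string& path, std::function<void(bool)> done)>;

// Starts at most one open at a time; the next one starts only when the
// previous one has reported completion.
class SerialOpener {
public:
    explicit SerialOpener(AsyncOpenFn open) : open_(std::move(open)) {}

    // Queue a module path; kick off the chain if nothing is in flight.
    void enqueue(std::string path) {
        pending_.push_back(std::move(path));
        if (!busy_) startNext();
    }

    int succeeded() const { return succeeded_; }
    int failed() const { return failed_; }
    bool idle() const { return !busy_ && pending_.empty(); }

private:
    void startNext() {
        if (pending_.empty()) {
            busy_ = false;
            return;
        }
        busy_ = true;
        std::string path = std::move(pending_.front());
        pending_.pop_front();
        // The completion callback records the result and starts the next open.
        open_(path, [this](bool ok) {
            (ok ? succeeded_ : failed_)++;
            startNext();
        });
    }

    AsyncOpenFn open_;
    std::deque<std::string> pending_;
    bool busy_ = false;
    int succeeded_ = 0;
    int failed_ = 0;
};
```

The opener must outlive every open it has started, because the completion callback captures `this`. If the errors disappear with this in place, that would point at interleaved loads rather than at any single module.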

Honya2000 commented 7 months ago

I'm currently using only the RTLD_NOW flag. But I think I tried all the possible combinations of flags.

Tomorrow I will try to load the modules one by one. But even if this fixes the Safari issue, I doubt I will accept it as a workaround...

Btw, it seems the same issue happens in Firefox on Windows as well.

sbc100 commented 7 months ago

> I'm currently using only the RTLD_NOW flag. But I think I tried all the possible combinations of flags.
>
> Tomorrow I will try to load the modules one by one. But even if this fixes the Safari issue, I doubt I will accept it as a workaround...

Sorry, I didn't mean to suggest that as the final solution, just as an aid to debugging the issue.

i.e. is the root cause of the issue the async nature of the code loading and the interleaving of the async work?

> Btw, it seems the same issue happens in Firefox on Windows as well.

Honya2000 commented 7 months ago

How do I properly wait in the fetch API onsuccess callback?

I tried a blocking wait (a loop that calls emscripten_sleep(1) while g_nModulesInFly > 0, then increments g_nModulesInFly), but it looks like it never exits the loop in this case.

The dlopen code looks like this:

    while (g_nModulesInFly > 0)
    {
        emscripten_sleep(1);
    }
    g_nModulesInFly++;

    emscripten_dlopen(context->filePath.c_str(), RTLD_NOW, context,
        [](void* ctx, void* handle)
        {
            DLContext* context = static_cast<DLContext*>(ctx);
            //const char* err = dlerror();
            context->callback(handle, PreparePluginResult::Ok);
            delete context;

            g_nModulesInFly--;
        },
        [](void* ctx)
        {
            DLContext* context = static_cast<DLContext*>(ctx);
            const char* err = dlerror();
            if (err)
            {
                Error{} << "dlopen error:" << err;
            }
            context->callback(nullptr, PreparePluginResult::DlopenError);
            delete context;

            g_nModulesInFly--;
        }
      );

Honya2000 commented 7 months ago

Well, I commented out emscripten_sleep in the loop, and now it works...

But unfortunately this didn't fix the issue in Safari and Firefox. I get the same random undefined symbols while loading the .so modules.

Honya2000 commented 7 months ago

Probably I have to execute dlopen from the main thread? Currently it is executed from the fetch callback, which is not the main thread?

Honya2000 commented 7 months ago

OK, I moved emscripten_dlopen to the main thread and now the issue is solved in Safari (both desktop and mobile) and Firefox! So dlopen cannot be executed from the fetch callback.

sbc100 commented 7 months ago

Ah, I should have asked initially if you were using threads.

Can you share the full set of emcc link flags you are using?

Can you describe in more detail the sequence you follow in the broken case? E.g. how are you using the fetch API, the thread API, and the dlopen API to trigger the issue?

Honya2000 commented 7 months ago

No, threads are disabled in this specific Emscripten build. Threads are used by the browser when it calls promise callbacks.

sbc100 commented 7 months ago

> Threads are used by the browser when it calls promise callbacks.

Can you explain what you mean by this? My understanding is that browsers are single-threaded and that all callbacks happen on the same thread.

Honya2000 commented 7 months ago

Above I mentioned the loop, which I executed in the fetch onsuccess callback:

        emscripten_fetch_attr_t attr;
        emscripten_fetch_attr_init(&attr);
        strcpy(attr.requestMethod, "GET");
        attr.userData = context;
        attr.attributes = EMSCRIPTEN_FETCH_LOAD_TO_MEMORY;
        attr.onsuccess = [](emscripten_fetch_t* fetch)
        {
            CompDLContext* context = static_cast<CompDLContext*>(fetch->userData);

            //Debug{} << "Downloaded plugin:" << context->pluginPath.c_str() << ", size:" << fetch->numBytes;
            //Debug{} << "Saving plugin to virtual FS:" << context->filePath.c_str();
            FILE* f = fopen(context->filePath.c_str(), "wb");
            if (!f)
            {
                Error{} << "Failed to store plugin file to virtual file-system:" << context->filePath.c_str();

                context->callback(nullptr, CompPreparePluginResult::SaveError, fetch->numBytes, nullptr);
                delete context;
                emscripten_fetch_close(fetch); // release the fetch on the error path too
                return;
            }
            fwrite(fetch->data, fetch->numBytes, 1, f);
            fclose(f);

            emscripten_fetch_close(fetch);

            while (g_nModulesInFly > 0)
            {
                emscripten_sleep(1);
            }
            g_nModulesInFly++;

            emscripten_dlopen(context->filePath.c_str(), RTLD_NOW, context,
                [](void* ctx, void* handle)
                {
                    CompDLContext* context = static_cast<CompDLContext*>(ctx);
                    //const char* err = dlerror();
                    context->callback(handle, CompPreparePluginResult::Ok, context->size, nullptr);
                    delete context;

                    g_nModulesInFly--;
                },
                [](void* ctx)
                {
                    CompDLContext* context = static_cast<CompDLContext*>(ctx);
                    const char* err = dlerror();
                    context->callback(nullptr, CompPreparePluginResult::DlopenError, context->size, err);
                    delete context;

                    g_nModulesInFly--;
                });
        };

I mentioned that the loop never exits. But the point is that it didn't block the app at all. The app loaded and ran fine. That means the main thread wasn't blocked, and the callback executed in a separate thread.

Honya2000 commented 7 months ago

Anyway, the issue was fully solved as soon as I moved emscripten_dlopen out of the callback and processed it in the main app thread.
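
The fix described above amounts to having the fetch callback only record the request and letting the application's main loop issue the actual open. A minimal sketch of that pattern, with the assumptions stated: `DeferredOpens` and its methods are hypothetical names, and the `open` argument passed to `drain` stands in for the real emscripten_dlopen call so that the sketch compiles without Emscripten headers.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: callbacks record work here, the main loop performs it.
class DeferredOpens {
public:
    // Intended to be called from the fetch onsuccess callback once the file
    // has been written to the virtual FS. It only records the path.
    void request(std::string path) { pending_.push_back(std::move(path)); }

    // Intended to be called once per frame from the application's main loop.
    // `open` is an assumption: in a real build it would call emscripten_dlopen
    // for `path`. Returns the number of opens issued during this call.
    std::size_t drain(const std::function<void(const std::string&)>& open) {
        // Swap first so that requests made while draining wait for the next call.
        std::vector<std::string> batch;
        batch.swap(pending_);
        for (const std::string& path : batch) {
            open(path);
        }
        return batch.size();
    }

    bool empty() const { return pending_.empty(); }

private:
    std::vector<std::string> pending_;
};
```

With this shape the onsuccess callback would shrink to writing the file and calling `request(...)`, while the frame function calls `drain(...)`. Whether draining everything in one frame is acceptable, or only one module per frame should be opened, is a design choice the thread does not settle.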