AcademySoftwareFoundation / openexr

The OpenEXR project provides the specification and reference implementation of the EXR file format, the professional-grade image storage format of the motion picture industry.
http://www.openexr.com/
BSD 3-Clause "New" or "Revised" License
1.63k stars 616 forks source link

Impossible to provide custom thread pool provider #1285

Open e4lam opened 2 years ago

e4lam commented 2 years ago

I tried to add a custom thread pool provider for OpenEXR (using the internal DefaultThreadPoolProvider as reference) via IlmThread::ThreadPool::globalThreadPool().setThreadProvider(). It seems that the provider should call task->group()->_data->removeTask(); after executing the task. However, the definition of IlmThread::TaskGroup::Data is not provided, making it impossible to do this.

Is one expected to be able to customize OpenEXR like this by the application? If so, then I think some changes are needed in order to allow this.

For reference, here's the custom thread pool provider that I tried to use but it (expectedly) fails to compile on MSVC with: error C2027: use of undefined type 'IlmThread_3_1::TaskGroup::Data'

Complete source:

class TBBThreadProvider : public IlmThread::ThreadPoolProvider
{
public:
    TBBThreadProvider() = default;
    ~TBBThreadProvider() noexcept override { finish(); }

    int numThreads() const override { return myArena->max_concurrency(); }

    void setNumThreads(int count) override
    {
        finish(); // TODO: Is this strictly needed?
        myArena = std::make_unique<tbb::task_arena>(count);
    }

    void addTask(IlmThread::Task *task) override
    {
        myArena->execute([&]
        {
            myGroup.run([&]
            {
                task->execute();
                task->group()->_data->removeTask();
                delete task;
            });
        });
    }

    void finish () override
    {
        myArena->execute([&] { myGroup.wait(); });
    }

private:
    std::unique_ptr<tbb::task_arena> myArena = UTmakeUnique<tbb::task_arena>();
    tbb::task_group myGroup;
};
kdt3rd commented 1 year ago

Ah, yes. Poor documentation and inconsistent internal usage. You should just call task->group()->finishOneTask() such that the data stays an internal data structure, which if you look, calls data->removeTask for you. Do let me know if that works, and if you are willing, would love to have a contribution that enables exr to optionally use tbb (which is kind of why the abstraction around the provider was made in the first place) - would be great to reduce the number of worker pools around in applications...

lgritz commented 1 year ago

Yes, please. The new OIIO release family also adds the ability to use TBB instead of its native thread pool, and I'd love to have OIIO + OpenEXR + app possible to all share one pool.

e4lam commented 1 year ago

Hi folks, unfortunately, I've mostly run out of time to work on this for the moment. This is what I had so far:

class TBBThreadProvider : public IlmThread::ThreadPoolProvider
{
public:
    TBBThreadProvider() = default;
    ~TBBThreadProvider() noexcept override { finish(); }

    int numThreads() const override { return myArena->maxConcurrency(); }

    void setNumThreads(int count) override
    {
        // TBB will happily continue running enqueued tasks and the internal
        // arena will destroy itself when done but callers might expect that
        // the tasks are done before this call returns.
        finish();
        myArena = std::make_unique<tbb::task_arena>(count);
    }

    void addTask(IlmThread::Task *task) override
    {
        // For some reason, there is a synchronization issue with Task and TBB
        // so we must enqueue() into the arena or else we sporadically hang
        // when the caller tries to wait for the tasks to finish.
        myArena->enqueue([self = this, task]
        {
            self->myGroup.run([task]
            {
                std::unique_ptr<IlmThread::Task> task_holder(task);
                task->execute();
                task->group()->finishOneTask();
            });
        });
    }

    void finish() override
    {
        myArena->execute([&] { myGroup.wait(); });
    }

private:
    std::unique_ptr<tbb::task_arena> myArena = std::make_unique<tbb::task_arena>();
    tbb::task_group myGroup;
};

When I tried enabling the above code, I still ended up getting deadlocks even when using tbb::task_arena::enqueue() (see comment in addTask() above). It's a bit hard to track down as it only happens sporadically during our regression test runs on our CI machines. As far as I can tell, it seems that the task completes but the main thread still ends up waiting on the isEmpty semaphore inside TaskGroup::Data's destructor. Here's the stack on Windows inside Visual Studio when we hang:

    ntdll.dll!NtWaitForSingleObject()   Unknown
    KernelBase.dll!WaitForSingleObjectEx() + 0x8e bytes Unknown
>   IlmThread_sidefx.dll!IlmThread_3_1::Semaphore::wait() Line 81   C++
    IlmThread_sidefx.dll!IlmThread_3_1::TaskGroup::Data::~Data() Line 443   C++
    IlmThread_sidefx.dll!IlmThread_3_1::TaskGroup::~TaskGroup() Line 623    C++
    OpenEXR_sidefx.dll!Imf_3_1_sidefx::ScanLineInputFile::readPixels(int scanLine1, int scanLine2) Line 1719    C++
    OpenEXR_sidefx.dll!Imf_3_1_sidefx::InputFile::readPixels(int scanLine1, int scanLine2) Line 916 C++
    OpenImageIO_sidefx.dll!00007ff84f3a9d18()   Unknown
    OpenImageIO_sidefx.dll!HOIIO_v2_3::ImageInput::read_scanlines() + 0x447 bytes   Unknown