csiro-hydroinformatics / wila

A C++ framework to couple optimisation tools and simulation models

Unit tests can get stuck on Linux and not complete. #6

Open jmp75 opened 6 years ago

jmp75 commented 6 years ago

Likely related to threadpool behavior. This was fixed but there may have been a regression. This issue is mostly to reference from a private project.

Credits to David Kent for diagnosing the following:

Basically we ultimately get a resource unavailable error when creating new threads. The linked commit does now at least fail with an exception rather than hanging.

The underlying issue is that a new threadpool is created for every complex evolution. This means a LOT of threads get created and destroyed. Any thoughts on how this could be rationalised? Could the threadpool be reused rather than started from scratch for each evolution?
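To illustrate the reuse idea raised above, here is a minimal sketch of a fixed-size pool whose workers are started once and then shared across evolutions. It is written against the C++ standard library only, not the threadpool library wila actually uses; `ReusablePool`, `submit`, `wait_all` and `run_evolutions` are hypothetical names introduced for this example, and tasks are assumed not to throw.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: threads are created once, in the constructor,
// and reused for every batch of tasks until the pool is destroyed.
class ReusablePool {
public:
    explicit ReusablePool(std::size_t n_workers) {
        assert(n_workers > 0);
        for (std::size_t i = 0; i < n_workers; ++i)
            workers_.emplace_back([this] { work(); });
    }
    ~ReusablePool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        task_ready_.notify_all();
        for (auto& w : workers_) w.join();
    }
    ReusablePool(const ReusablePool&) = delete;
    ReusablePool& operator=(const ReusablePool&) = delete;

    // Queue a task; assumption: the task does not throw.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
            ++pending_;
        }
        task_ready_.notify_one();
    }
    // Block until every task submitted so far has finished.
    void wait_all() {
        std::unique_lock<std::mutex> lock(m_);
        all_done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void work() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                task_ready_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
            {
                std::lock_guard<std::mutex> lock(m_);
                --pending_;
            }
            all_done_.notify_all();
        }
    }

    std::mutex m_;
    std::condition_variable task_ready_;
    std::condition_variable all_done_;
    std::queue<std::function<void()>> tasks_;
    std::size_t pending_ = 0;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Runs several "evolutions" against the same pool and returns the number of
// tasks executed. No thread is created or destroyed between evolutions.
int run_evolutions(ReusablePool& pool, int evolutions, int tasks_per_evolution) {
    std::atomic<int> executed{0};
    for (int e = 0; e < evolutions; ++e) {
        for (int t = 0; t < tasks_per_evolution; ++t)
            pool.submit([&executed] { ++executed; });
        pool.wait_all();  // barrier between evolutions
    }
    return executed.load();
}
```

With this shape, the number of threads created is bounded by the pool size no matter how many evolutions run, which is the property the question above is after.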

jmp75 commented 6 years ago

If I use resize rather than creating the threadpool with a size, the following loop can return false when the pool is resized.

        while(m_worker_count < m_target_worker_count)
        {
          try
          {
            worker_thread<pool_type>::create_and_attach(lockedThis->shared_from_this());
            m_worker_count++;
            m_active_worker_count++;
          }
          catch(thread_resource_error)
          {
            // Thread creation failed (resource unavailable): report failure to the caller.
            return false;
          }
        }

Note that this does not always occur, at least when running in debug mode from VS Code. I hit the 'return false' once, and the call stack showed what I assume were 22 paused threads, not all of them with a call stack shown. It seems that disposing of threads may take some time, and if new threads are created faster than old ones are disposed of (especially with fast-running threaded tasks in unit tests), a limit may be hit. Still, this is not a huge number of threads as far as I understand the limits, so I still find it curious.
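For comparison, the same failure mode can be sketched with plain `std::thread`, which reports thread exhaustion as a `std::system_error` carrying `std::errc::resource_unavailable_try_again`. This is not the threadpool library's code; `spawn_up_to` is a hypothetical helper that stops at the first such failure and joins every thread it started before returning, so that nothing is still being torn down when the next batch is created.

```cpp
#include <cstddef>
#include <system_error>
#include <thread>
#include <vector>

// Hypothetical helper: try to start `target` threads, stopping at the first
// resource failure. Returns how many threads were actually started.
// All started threads are joined before the function returns.
std::size_t spawn_up_to(std::size_t target) {
    std::vector<std::thread> threads;
    threads.reserve(target);
    try {
        while (threads.size() < target)
            threads.emplace_back([] { /* trivial task */ });
    } catch (const std::system_error& e) {
        // Anything other than "resource unavailable" is unexpected: clean up and rethrow.
        if (e.code() != std::errc::resource_unavailable_try_again) {
            for (auto& t : threads) t.join();
            throw;
        }
    }
    const std::size_t started = threads.size();
    for (auto& t : threads) t.join();  // wait for disposal before returning
    return started;
}
```

A caller can compare the return value with `target` to detect a partial start, much as the loop above signals failure by returning false.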

So the intended fix that was re-commented out actually has problems of its own. I will have to try to tackle this another way, probably by keeping the creation of threadpools to a minimum. The CrossThreadException class is handy and has a clear purpose, but given this issue on Linux it probably needs to be refactored to reduce the rate of creation of new threads.