SuperElastix / elastix

Official elastix repository
http://elastix.dev
Apache License 2.0
481 stars 116 forks source link

elastix 4.9 static libary version not thread safe #174

Open jiangliMED opened 5 years ago

jiangliMED commented 5 years ago

I was using elastix 4.9 static library version. I created two threads to run registrations simultaneously, one with 3d registration and the other with 2d registration. And the result was either elastix crashes the program or it produced wrong registered results. Howerver, if i run the two threads sequentially, the output result was ok. I guess the issue might be resulted from some static varibles used internally by elastix. Can someone investigate this issue?

mstaring commented 5 years ago

This is indeed a known issue. I was under the impression it was (partially) fixed in the develop branch? @N-Dekker do you know?

ntatsisk commented 4 years ago

@mstaring @N-Dekker Is the issue resolved for elastix 4.9? Because I am using elastix 5.0 as static library and it seems not to be thread-safe. I have tried both the master branch (Release/Debug) and the develop branch (Release) but the issue remains. I am on Windows 10 with VS2017 if it is relevant. Thanks in advance.

ntatsisk commented 3 years ago

It seems that there are two (at least) groups of things that are making elastix not thread safe. After certain modifications in the current develop branch, I was able to run, finally, registrations in parallel at user level. Probably, the two groups are known but I add them here for the discussion:

The static ComponentDatabasePointer & ComponentLoaderPointer (link). I assume they are static so that the components are not loaded/installed if the user provides more than one parameterMaps at this loop. Is there maybe another reason that they are static? I turned them to normal members of the ElastixMain class to make it work. Of course, a mutex at init could also work if they need to be static.

The second group is more-or-less anything related with the logger and I guess a proper solution would be quite involved. What I did was to simply deactivate any function that was performing write operation. Of course, this means that the logger was not working (not a problem for the application I am working on though). Could that be a possible temporary option? When the user has set LogToConsoleOff() and LogToFileOff() then the logger is deactivated and then running in parallel can work.

I am looking forward to hearing your thoughts. By the way, please let me know if there is any other datarace/thread-safety issue that I haven't spotted. Thanks!

N-Dekker commented 3 years ago

Interesting suggestions @ntatsisk So a possibility could be to have multiple ElastixMain objects running in parallel, each having their own (non-static) ComponentDatabase. Do I understand correctly that each ElastixMain object would then internally have a ComponentDatabase with exactly the same content?

ntatsisk commented 3 years ago

Thanks for the reply, @N-Dekker! Yes, in the naive option that I followed each ElastixMain object has its own ComponentDatabase with exactly the same content. However, even in the application that I am working on which involves 600 rigid registrations using 24 threads required to run in under 10 sec, I didn't notice any extreme memory consumption and the speed goal was also achieved.

Now, maybe a better solution would be to keep both ComponentDatabase and ComponentLoader static but add a mutex during their initialization (loading of the components) because that is the point that I noticed there are data races. After that loading I think both objects are only read so no issue there. Maybe a good starting point for the mutex would be this one but I haven't tried that.

N-Dekker commented 3 years ago

@ntatsisk If the naive option of having a ComponentDatabase for each ElastixMain object does not really affect the performance, it may be worth considering. But then I'm still wondering if it would break any code, to no longer being able to call ElastixMain ::GetComponentDatabase() as a static function. What d you think?

If both ComponentDatabase and ComponentLoader should still remain static anyway, maybe we could make use of the C++11 feature "magic statics". From C++11, the initialization of a local static variable is guaranteed thread-safe (no need to use a mutex) 😃 http://www.nuonsoft.com/blog/2017/08/10/implementing-a-thread-safe-singleton-with-c11-using-magic-statics

ntatsisk commented 3 years ago

@N-Dekker, from a quick search in the elastix codebase I see that ElastixMain::GetComponentDatabase() (as well as its Set counterpart) are called limited times and always through an ElastixMain object. So, I don't see why they were static in the first place since a non-static function can return a static member variable. Of course, I have limited experience with the elastix source code and maybe I am missing something ;)

However I think we should still consider keeping the ComponentDatabase and ComponentLoader static combined with a singleton (not just a mutex that I mentioned previously). Btw, I didn't know about the magic statics, great feature! How could this be implemented? A vague idea is using only one singleton for the ComponentLoader which should initialize the ComponentDatabase and do the ComponentLoader::LoadComponents functionality in its constructor. A bonus is that it would also remove the need for LoadComponents and UnloadComponents functions which is against the RAII principle. I could give that a try. Any different ideas?

N-Dekker commented 3 years ago

Thanks @ntatsisk I was just considering to call ElastixMain::GetComponentDatabase() myself, in a context where I may not have an ElastixMain object. But I have to have another look.

A bonus is that it would also remove the need for LoadComponents and UnloadComponents functions which is against the RAII principle.

While I'm a big fan of the RAII principle, I don't care so much about ComponentLoader::UnloadComponents() being called properly, as it has been empty for more than a decade now 😸 From commit c09357417b0feb33d86812035c255e26473dfd38 by Stefan (@stefanklein), 12 Feb 2008: https://github.com/SuperElastix/elastix/blob/c09357417b0feb33d86812035c255e26473dfd38/src/Core/Install/elxComponentLoader.cxx#L153-L162

Do you agree that it's OK not to worry about ComponentLoader::UnloadComponents()?

stefanklein commented 3 years ago

Wow, this feels like a time machine :)

From: Niels Dekker notifications@github.com Sent: Tuesday, December 22, 2020 12:58 PM To: SuperElastix/elastix elastix@noreply.github.com Cc: S. Klein s.klein@erasmusmc.nl; Mention mention@noreply.github.com Subject: Re: [SuperElastix/elastix] elastix 4.9 static libary version not thread safe (#174)

Thanks @ntatsiskhttps://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fntatsisk&data=04%7C01%7Cs.klein%40erasmusmc.nl%7C5ebdb9c33db04e0b485008d8a670d30d%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C637442350815728248%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=IEmi%2BAba1%2Ftq8PBFtJxZ8VUOdb0TU8dPA2PrYsfMmFU%3D&reserved=0 I was just considering to call ElastixMain::GetComponentDatabase() myself, in a context where I may not have an ElastixMain object. But I have to have another look.

A bonus is that it would also remove the need for LoadComponents and UnloadComponents functions which is against the RAII principle.

While I'm a big fan of the RAII principle, I don't care so much about ComponentLoader::UnloadComponents() being called properly, as it has been empty for more than a decade now 😸 From commit c093574https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FSuperElastix%2Felastix%2Fcommit%2Fc09357417b0feb33d86812035c255e26473dfd38&data=04%7C01%7Cs.klein%40erasmusmc.nl%7C5ebdb9c33db04e0b485008d8a670d30d%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C637442350815728248%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=TdQ6g%2FYIMRuNLNW5EX8ITz%2F7Q2SCD2rNgLWDjWHPhq8%3D&reserved=0 by Stefan (@stefankleinhttps://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fstefanklein&data=04%7C01%7Cs.klein%40erasmusmc.nl%7C5ebdb9c33db04e0b485008d8a670d30d%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C637442350815738242%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=OIv5Kgl0vuYnvn8bGCbDYZb2sRL3rs%2FVLVi020QFXdk%3D&reserved=0), 12 Feb 2008: https://github.com/SuperElastix/elastix/blob/c09357417b0feb33d86812035c255e26473dfd38/src/Core/Install/elxComponentLoader.cxx#L153-L162https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FSuperElastix%2Felastix%2Fblob%2Fc09357417b0feb33d86812035c255e26473dfd38%2Fsrc%2FCore%2FInstall%2FelxComponentLoader.cxx%23L153-L162&data=04%7C01%7Cs.klein%40erasmusmc.nl%7C5ebdb9c33db04e0b485008d8a670d30d%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C637442350815738242%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=uJqqfsrkBM9ZW2MCThkhEM393n7zafChxMM2ZOZ6ufM%3D&reserved=0

Do you agree that it's OK not to worry about ComponentLoader::UnloadComponents()?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FSuperElastix%2Felastix%2Fissues%2F174%23issuecomment-749505067&data=04%7C01%7Cs.klein%40erasmusmc.nl%7C5ebdb9c33db04e0b485008d8a670d30d%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C637442350815748237%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=sP7L89uApoaEwVNuifIW6X7AqzR%2FrzpmENDFZeLfpm0%3D&reserved=0, or unsubscribehttps://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAAF2LNIHPD2Z4PCY2HXMNJTSWCCUDANCNFSM4IMFJPEA&data=04%7C01%7Cs.klein%40erasmusmc.nl%7C5ebdb9c33db04e0b485008d8a670d30d%7C526638ba6af34b0fa532a1a511f4ac80%7C0%7C0%7C637442350815748237%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Ofv%2BBHKQMBrsLHSOcoq6qbjCPlLfX7Hz2rX9HXp4oxo%3D&reserved=0.

ntatsisk commented 3 years ago

@stefanklein Great flashback from the elastix past indeed! ;)

@N-Dekker Sorry, I was not explicit. I was mainly worried about the ElastixMain::UnloadComponents:

https://github.com/SuperElastix/elastix/blob/3acd58e5efbe820305f4ba696bdaab4658a39861/Core/Kernel/elxElastixMain.cxx#L760-L776

But I think that this is a slightly different discussion since it could be fixed in either case (static or not). I think that discussion goes in favor of keeping ComponentLoader and ComponentDatabase static and working around that, right? That way you will also have the option to call ElastixMain::GetComponentDatabase() on its own without an object.

N-Dekker commented 3 years ago

@ntatsisk Thanks for your explanation regarding UnloadComponents(). In general I'm reluctant to introduce static data or singletons. I'm aware of the Singleton being a controversial design pattern, or "anti-pattern": https://en.wikipedia.org/wiki/Singleton_pattern#Anti-Pattern_considerations But now that s_CDB and s_ComponentLoader are static already, maybe we should first try if we can just fix the thread-safety issue, while keeping them static... albeit "magic static" 😸

N-Dekker commented 3 years ago

@ntatsisk I just did a first attempt (draft) to fix the issue, at https://github.com/SuperElastix/elastix/tree/ElastixMain-LoadComponents-thread-safe Is it similar to what you have in mind? Or did you already prepare a fix as well?

My first attempt did use C++11 "magic statics" to ensure that the components are loaded exactly once (in a thread safe manner). But then they are only unloaded at the end of the program (automatically). I hope that's OK!

Also there are still some test failures at the CI! https://github.com/SuperElastix/elastix/runs/1678230364 and https://dev.azure.com/kaspermarstal/Elastix/_build/results?buildId=1526&view=logs&j=34a65711-f108-577e-6707-abb292f048dd&t=8c37f5e7-baf8-54ce-925d-207c0d622869

@stefanklein @mstaring Is it important to be able to explicitly unload the components any time elastix is running? If so, for what particular use case?

N-Dekker commented 3 years ago

@ntatsisk The component database should now be thread-safe, with "develop" branch commit e5b36582b6d9f6a281dd3a52c2ac70d37b973772. I made a first attempt to ensure that the management of output streams (opening and closing of streams) is also thread-safe, at a new branch, Share-xoutManager. If you have the time, please have a look. The idea is that threads may share the "output manager", so that only the first thread opens the streams, while only the last thread closes them.

ntatsisk commented 3 years ago

Hi @N-Dekker, thank you for your work so far on this issue and now continuing with the logger! I have started looking at the implementation details of the current logger as well as the changes that you proposed in the new branch. However, I have a question (maybe basic): The resetGuard seems to belong the local scope of the GetSharedManager function. Doesn't that mean that the returned shared pointer will always be reset when the guard goes out of scope (aka the function always returns nullptr)? What is the idea behind the ResetGuard?

N-Dekker commented 3 years ago

I have started looking at the implementation details of the current logger as well as the changes that you proposed in the new branch. However, I have a question (maybe basic): The resetGuard seems to belong the local scope of the GetSharedManager function. Doesn't that mean that the returned shared pointer will always be reset when the guard goes out of scope (aka the function always returns nullptr)? What is the idea behind the ResetGuard?

Thanks for having a look at my possible "shared xout manager" solution!

First of all I should say that this may not be the final solution. Marius (@mstaring) suggested that each elastix-filter could have its own xout-object, instead of having a global xout-object that is shared between the filter instances.

Anyway, the intention of xoutManager::GetSharedManager is that all filter instances would share the same "manager object". But then, the manager object should be destroyed automatically, when all filters are finished. If GetSharedManager would just create a local static shared_ptr<xoutManager> object by new xoutManager(...) and return the shared pointer, it would never be destructed anymore! The reference count of the shared pointer would never get back to zero!

This is what I try to avoid by doing a reset() when the function goes out of scope. So that only the callers (the filters) keep the manager alive.

Anyway, even if this is a proper solution for the lifetime (creation and destruction) of xout, it does not yet solve concurrent access to the output streams. To be continued...

N-Dekker commented 3 years ago

@ntatsisk Would a solution that requires switching off output to TransformParameters.txt be acceptable (or useful) to you? By configuration option (WriteFinalTransformParameters "false")

mstaring commented 3 years ago

also, please check that the output directory has been set to different directories for each instance of elastix.

ntatsisk commented 3 years ago

@N-Dekker Indeed, that would be useful. We only use the transformation parameter values internally in the code and we don't output anything to file/console (LogToConsoleOff() & LogToFileOff()).

N-Dekker commented 3 years ago

@ntatsisk FYI I just finished my task to ensure that all parameters that are written to TransformParameter.txt files are also available in memory, in transform parameter maps. I guess that might be helpful to you as well 😃 (Now merged onto the develop branch: pull request #402, #395, #385)