LArbys / LArCV

Liquid Argon Computer Vision
11 stars 9 forks source link

ThreadDatumFiller threading #53

Closed vgenty closed 8 years ago

vgenty commented 8 years ago

I ran into an annoying snag running multiple inference in parallel with pycaffe. I like the threading feature implemented in ThreadDatumFiller but I think 1 millisecond delay usleep(1000) in batch_process located here

https://github.com/LArbys/LArCV/blob/develop/app/APICaffe/ThreadDatumFiller.cxx#L246

could be improved. If I launch multiple ThreadDatumFillers at once, each with their own threads (for example multiple inferences in parallel), the thread may not allocate and _batch_process_ may not begin execution in time for heproot.cpp.

This means that the ProcessDriver instance may not be initialized for the first time, and _processing may be false when heproot.cpp queries ThreadDatumFiller for the data dimensions, as they may not be available yet. Yikes! Maybe replacing usleep(1000) with

 while (!_processing)                                                                                                                                              
        usleep(1000);                                                                                                                               

is better. This would ensure that the first thread creation time doesn't race against the first memcpy here

https://github.com/LArbys/caffe/blob/master/src/caffe/util/heproot.cpp#L38

vgenty commented 8 years ago

Hmm, I continue to having trouble with the race between querying dim()

https://github.com/LArbys/LArCV/blob/develop/app/APICaffe/ThreadDatumFiller.cxx#L150

in heproot despite the change noted above. Sometimes _thread_is_running=True (throwing larbys) and _processing=True...

Of course all of this is solved with UseThread: false in ProcessDriver configuration.

vgenty commented 8 years ago

In heproot.cpp we ask the filler if thread_running(), this may be false if the thread itself hasn't set _thread_running=true inside the threa (overhead time > 1ms on a busy CPU). The subsequent call to dim() after the while loop will error!

vgenty commented 8 years ago

Waiting for the thread to come alive

 while (!_thread_running)                                                                                                                                              
        usleep(1000); 

after the std::move seems to prevent a race to the thread_running() query in heproot. Though this may defeat the purpose of having the thread... let me know what you think

drinkingkazu commented 8 years ago

Sorry for taking long to respond + thanks A LOT for your nice debugging work.

A solution would be to keep track of full states of the thread. Currently we track only 2 out of 3 states via boolean. I will implement 3 following states via enum:

) thread is not running ) thread is instantiating *) thread is running

drinkingkazu commented 8 years ago

I made a feature branch feature/thread_bugfix where I pushed a fix https://github.com/LArbys/LArCV/commit/a9c371c01fc9b246c89e4917687a5aef4011db24

@vgenty can you test?

drinkingkazu commented 8 years ago

@vgenty sorry I forgot to merge into apicaffe_cleanup branch which you are likely using. Let me do that.

drinkingkazu commented 8 years ago

OK now merged into apicaffe_cleanup https://github.com/LArbys/LArCV/commit/0ea81d4fefa794952e7bf845aa9c7651f57855b8

drinkingkazu commented 8 years ago

@vgenty I believe this is confirmed to be fixed. If you agree would you close it?

vgenty commented 8 years ago

Yes, sublime tabbing looks removed too! 💃