Closed vgenty closed 8 years ago
Hmm, I continue to having trouble with the race between querying dim()
https://github.com/LArbys/LArCV/blob/develop/app/APICaffe/ThreadDatumFiller.cxx#L150
in heproot
despite the change noted above. Sometimes _thread_is_running=True
(throwing larbys
) and _processing=True
...
Of course all of this is solved with UseThread: false
in ProcessDriver
configuration.
In heproot.cpp
we ask the filler if thread_running()
, this may be false
if the thread itself hasn't set _thread_running=true
inside the threa (overhead time > 1ms on a busy CPU). The subsequent call to dim()
after the while
loop will error!
Waiting for the thread to come alive
while (!_thread_running)
usleep(1000);
after the std::move
seems to prevent a race to the thread_running()
query in heproot
. Though this may defeat the purpose of having the thread... let me know what you think
Sorry for taking long to respond + thanks A LOT for your nice debugging work.
A solution would be to keep track of full states of the thread. Currently we track only 2 out of 3 states via boolean. I will implement 3 following states via enum:
) thread is not running ) thread is instantiating *) thread is running
I made a feature branch feature/thread_bugfix where I pushed a fix https://github.com/LArbys/LArCV/commit/a9c371c01fc9b246c89e4917687a5aef4011db24
@vgenty can you test?
@vgenty sorry I forgot to merge into apicaffe_cleanup branch which you are likely using. Let me do that.
OK now merged into apicaffe_cleanup https://github.com/LArbys/LArCV/commit/0ea81d4fefa794952e7bf845aa9c7651f57855b8
@vgenty I believe this is confirmed to be fixed. If you agree would you close it?
Yes, sublime tabbing looks removed too! 💃
I ran into an annoying snag running multiple inference in parallel with
pycaffe
. I like the threading feature implemented inThreadDatumFiller
but I think 1 millisecond delayusleep(1000)
inbatch_process
located herehttps://github.com/LArbys/LArCV/blob/develop/app/APICaffe/ThreadDatumFiller.cxx#L246
could be improved. If I launch multiple
ThreadDatumFiller
s at once, each with their own threads (for example multiple inferences in parallel), the thread may not allocate and_batch_process_
may not begin execution in time forheproot.cpp
.This means that the
ProcessDriver
instance may not be initialized for the first time, and_processing
may befalse
whenheproot.cpp
queriesThreadDatumFiller
for the data dimensions, as they may not be available yet. Yikes! Maybe replacingusleep(1000)
withis better. This would ensure that the first thread creation time doesn't race against the first
memcpy
herehttps://github.com/LArbys/caffe/blob/master/src/caffe/util/heproot.cpp#L38