CapPow / HerbASAP

An effort to automate image post processing for natural history collections
GNU General Public License v3.0
8 stars 3 forks source link

Threading for tasks in postProcessing.py #5

Closed CapPow closed 5 years ago

CapPow commented 5 years ago

Conditions & Reproduction

Expected Outcome

j-h-m commented 5 years ago

Adding a link here for myself: https://github.com/pyqt/examples

j-h-m commented 5 years ago

Which functions need to wait on others? I believe I can use a boss/worker threading model to handle this. I just need to know the order of all of the functions if you want everything to run after one event fires (button).

CapPow commented 5 years ago

In postProcessing.py's processImage() flow:

white_balance_image() needs to be called on the results of eq_worker and after the non-threaded colorchipDetect.predict_color_chip_location().

save_output_images() needs to be called after, well everything we have so far. While testing using the angled barcodes in exampleImages/UCHT_test_set/ it would sometimes be called before a barcode was retrieved from bc_worker.

CapPow commented 5 years ago

The most recent commit: https://github.com/CapPow/autoPostProcessing/commit/2e39c87fa2859241e2506e5acd5c57b63cdc0364 added folder monitoring which handles triggering processImage() and saving the outputs. The save_output_images() was temporarily added into the alert_bc_finished() function. The triggered image saving functions are called after the finished signal is emitted from bcRead. It is not always true that bcRead takes longer than eqRead or ccRead. This solution is only temporary to testing the workflow while asynchronous threading is solved. Commenting it here so we document the need appropriately place save_output_images() in the future.

j-h-m commented 5 years ago

An easy/temporary way to fix this, if I'm understanding the problem correctly, might be to create 3 global boolean variables that are toggled when each thread is started and finished respectively. Use those three variables in one function that each worker thread's finished signal would connect to. When all that are expected to be true are true, then continue to save_output_images(). I've tried to implement this but I'm having trouble consolidating all of the other global data that each thread/function needs.

I'm still researching how to properly do a boss/worker thread model with PyQt, I'm trying to find a way to use QEvents, rather than an infinite loop, to have the boss wait for requests and send responses, but it is taking some time.

jbest commented 5 years ago

I'm not familiar with all the challenges of multi-threading, but I did happen across the Python queue class which might be of use: https://docs.python.org/3.7/library/queue.html

j-h-m commented 5 years ago

I'm about to dump out a bunch of info, basically my thought process, so bear with me. I would appreciate any feedback or suggestions anyone has if you make it through this, lol.

@jbest I think the biggest issue that arises when mutli-threading is when multiple threads need to access the same data at the same time. Which is where mutexes, semaphores, and race conditions come from. In our case, it seems that we don't have this issue because either the thread's operation and result are unique, or a thread relies on a predecessor's results before it can run.

I am by no means a multi-threading expert, nor am I a Python export, but from what I have read multithreading is difficult and problematic in Python because of the GIL. I'm hoping that using Qt's thread implementation will get around this. I was actually extremely excited when I saw they had their own threading implementation and we weren't using the 'threading' module. I used it once with a Tello drone project that need to send commands / receive video / and other stuff with separate threads using the TelloPy module, but it wasn't very stable or predictable.

My idea was to create a boss-worker model. The boss would start running when the application itself starts. I was hoping to subclass QEvent (or something like that) to have an event-driven boss thread that could do requests/responses for the UI thread. (That way we don't spawn a boss that is in an infinite "listen" loop that is wasting resources.) The boss would take a request and spawn a worker thread and manage the worker's signals, and then report back to the UI thread when finished. This would allow us to move the control of the threads to a separate thread. That way we're not creating another block if we're waiting for a separate thread on the main UI thread.

We may be able to get around creating a separate boss thread though. If we only ever wire signals up to another function. For example, instead of waiting to start a thread, or run a function, until all threads finish, we should write functions that are wired up to a thread's finished signal and then run whatever function or thread needs to run from the function.

Achieving either of the above methods will require significantly less complexity if we create classes (or models) of the data needed for each thread/worker. This will allow us to create individual instances to send to each worker which will completely remove a need for mutexes, and my personal least favorite of all programming features, global variables. Which I've found to be notoriously difficult to debug, as simple as they appear to be!

CapPow commented 5 years ago

So, we created a problem by spawning threaded workers? The solution must be MORE Threaded workers! Based on comments here I created an additional thread which monitors boolean class attributes to see if they're done yet. I assume this is not the most efficient solution but it should only be called at a time where we're waiting on stragglers to wrap up.

j-h-m commented 5 years ago

I'm going to try to show what I mean here in a branch. I think it will be helpful if it works like I think it will.

https://github.com/CapPow/autoPostProcessing/tree/threading-dev