keithmoss closed this 10 years ago
Since we played with multiprocessing for #22 the focus for this would now be to give it a whole directory and have it process the assets therein.
You'd probably want to make it aware of its progress through the directory to make recovering from errors easier (e.g. store progress, start at last unsuccessful asset upload).
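A rough sketch of the "store progress" idea, assuming a JSON file that lists the assets already uploaded. The names `load_done`, `process_directory`, and the `upload` callable are hypothetical and not part of Hodor; the progress file is assumed to live outside the asset directory so it isn't picked up as an asset itself.

```python
import json
from pathlib import Path


def load_done(progress_file):
    """Return the set of asset names already uploaded successfully."""
    path = Path(progress_file)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def process_directory(asset_dir, progress_file, upload):
    """Upload every file in asset_dir, skipping those recorded as done.

    `upload` is a hypothetical callable that takes a Path and raises on failure.
    """
    done = load_done(progress_file)
    for asset in sorted(Path(asset_dir).iterdir()):
        if not asset.is_file() or asset.name in done:
            continue
        upload(asset)  # an exception here leaves the progress file untouched
        done.add(asset.name)
        # Persist after each success so a crash loses at most one asset.
        Path(progress_file).write_text(json.dumps(sorted(done)))
    return done
```

Re-running `process_directory` with the same progress file after a failure would pick up at the first asset that has not been recorded as done.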
Supported by changes to the config loader and - partially - by multithreaded file uploads for assets.
Being able to point Hodor at a folder of assets and tell it to go process them would be neat.
We could be lazy and just spawn whole new subprocesses, but instead let's learn a new Python thing and play with multithreading!
We'll spawn multiple threads (using the multiprocessing module to avoid the GIL?) and handle uploads and processing therein.
Q: Can we pre-emptively return something once upload is done, and before polling begins, to signal that we're clear to start uploading the next asset?
If not, we could be lazy and create the asset placeholder outside of the thread and poll there (as well as inside the thread) to determine when the next thread can be spun up.
Or, we could make polling a separate action and spawn a second thread after the first one to handle that as well as any other additional tasks (e.g. adding the raster to a mosaic). This might be a cleaner approach, actually - separation of concerns and all that.
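One possible answer to the "signal once upload is done" question is a `threading.Event` that the worker sets between the upload and the polling. This is only a sketch: `handle_asset`, `process_assets`, and the `upload` and `poll` callables are made-up placeholders, not Hodor functions, and it assumes only one upload should run at a time.

```python
import threading


def handle_asset(name, upload, poll, upload_done, results):
    """Upload one asset, signal that the upload slot is free, then poll."""
    try:
        upload(name)
    finally:
        # Set even on failure so the main thread is never left waiting.
        upload_done.set()
    results[name] = poll(name)


def process_assets(names, upload, poll):
    """Start one thread per asset, serialising uploads but not polling."""
    results = {}
    workers = []
    for name in names:
        upload_done = threading.Event()
        worker = threading.Thread(
            target=handle_asset,
            args=(name, upload, poll, upload_done, results),
        )
        worker.start()
        workers.append(worker)
        # Blocks until this upload ends, not until its polling ends.
        upload_done.wait()
    for worker in workers:
        worker.join()
    return results
```

The main thread only waits for each upload, so polling for earlier assets carries on in the background while the next upload starts.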
We still need to answer the next question about how to handle later threads spawning before the first one, though...
Q: If not, can Python's multithreading handle spawning a few threads to begin with, waiting via t.join() or similar (thus avoiding thrashing the CPU), and then spawning some more once one finishes?
t.join() blocks the calling thread, though, so if threads started after the first one finish before it, we wouldn't know until the first thread finished and the main thread moved on to check them.
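A possible way around the join-ordering problem, assuming Python 3 (or the `futures` backport), is `concurrent.futures`: `ThreadPoolExecutor` caps how many threads run at once and `as_completed` hands back results in the order they finish rather than the order they were started. In this sketch `process_all` and `handle` are hypothetical names; `handle` stands in for whatever uploads and processes a single asset.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_all(names, handle, max_workers=4):
    """Run handle(name) for each asset, at most max_workers at a time.

    Returns (name, result_or_exception) pairs in completion order.
    """
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its asset name for reporting.
        futures = {pool.submit(handle, name): name for name in names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                finished.append((name, future.result()))
            except Exception as exc:
                # Keep going; record the failure against this asset.
                finished.append((name, exc))
    return finished
```

The pool starts the next queued asset as soon as any worker frees up, so there is no need to join threads in a fixed order. Since uploads are mostly waiting on the network, and blocking I/O releases the GIL, plain threads may well be enough here without reaching for multiprocessing.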
See Python's Hardest Problem and Python multithreading for dummies (StackOverflow).