User story
As a user, I want to see results as soon as possible. Right now, videos are processed sequentially, and each one takes many hours. This can result in a delay of a day or more between running an experiment and seeing the results.
Acceptance criteria
[ ] Allow encoding subprocesses to run concurrently instead of in sequence.
[ ] Make sequential or parallel encoding an option.
[ ] Handle the output of each subprocess so that output from different subprocesses is not interleaved in the log.
[ ] Ensure that any encoding error causes the entire job to fail, preventing silent errors.
Sprint Ready Checklist
[x] 1. Acceptance criteria defined
[x] 2. Team understands acceptance criteria
[ ] 3. Team has defined solution / steps to satisfy acceptance criteria
[ ] 4. Acceptance criteria are verifiable / testable
[ ] 5. External / 3rd Party dependencies identified
[ ] 6. Ticket is prioritized and sized
Notes
I think there are plenty of ways to do this with the stdlib instead of taking on a heavy dependency like dask. One idea is to use something like concurrent.futures.ProcessPoolExecutor to launch and monitor the subprocesses.
One easy place to introduce this parallelism is the transform directory function. It could build a list of transform specifications (file names, ffmpeg args) while crawling the input directory, and then dispatch those transforms in parallel at the end if the job requests parallel encoding.
I also think doing this on a single node would be a fine place to start.
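As a rough illustration of these notes, here is a minimal single-node sketch using only the stdlib. It is not the project's actual code: `TransformSpec`, `run_transform`, and `run_all` are hypothetical names, and the commands are stand-in Python one-liners rather than real ffmpeg invocations. It also swaps in `ThreadPoolExecutor` instead of `ProcessPoolExecutor`, on the assumption that each worker only waits on a child process, so threads should be enough; the same structure should work with a process pool. Each subprocess's output is captured and logged as one block, and any failure makes the whole job raise.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class TransformSpec:
    """Hypothetical record of one encode: a label for the log and the command to run."""
    name: str
    command: list


def run_transform(spec):
    """Run one command, capturing stdout and stderr together as a single string."""
    result = subprocess.run(
        spec.command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"{spec.name} failed with exit code {result.returncode}:\n{result.stdout}"
        )
    return result.stdout


def run_all(specs, parallel=True, max_workers=None):
    """Run every spec and return {name: output}; raise if any spec failed."""
    # A single worker gives sequential behaviour through the same code path.
    workers = max_workers if parallel else 1
    outputs = {}
    errors = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {spec.name: pool.submit(run_transform, spec) for spec in specs}
        for name, future in futures.items():
            try:
                outputs[name] = future.result()
                # Log each subprocess's output as one block so logs do not interleave.
                print(f"--- {name} ---\n{outputs[name]}", end="")
            except RuntimeError as exc:
                errors.append(exc)
    if errors:
        # Fail the whole job loudly instead of letting one bad encode pass silently.
        raise RuntimeError(
            f"{len(errors)} of {len(specs)} transforms failed"
        ) from errors[0]
    return outputs


if __name__ == "__main__":
    # Stand-in commands; a real job would build ffmpeg commands while crawling the input directory.
    specs = [
        TransformSpec(f"video_{i}", [sys.executable, "-c", f"print('encoded {i}')"])
        for i in range(3)
    ]
    run_all(specs, parallel=True, max_workers=2)
```

In this sketch, `parallel=False` reuses the same pool with one worker, so the sequential and parallel modes share error handling and logging. A small `max_workers` is probably a sensible default, since each encoder may already use several cores on its own.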