Open tjk584 opened 2 years ago
Bash commands can be executed in the background in a separate process with &
at the end:
command1 &
command2 &
To make the main process wait for background processes to finish, use the wait
command (with no arguments, it waits for all of them):
command1 &
command2 &
wait
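If the main process also needs to know whether each background command succeeded, the PID of each job can be saved from `$!` and passed to `wait`, which then returns that job's exit status. A minimal sketch, where `job_ok` and `job_fail` are hypothetical stand-ins for command1 and command2:

```bash
#!/usr/bin/env bash
# Hypothetical placeholder jobs standing in for command1 and command2.
job_ok()   { sleep 1; return 0; }
job_fail() { sleep 1; return 3; }

job_ok &
pid1=$!        # $! holds the PID of the most recently started background job
job_fail &
pid2=$!

# `wait PID` blocks until that process exits and returns its exit status.
wait "$pid1"; status1=$?
wait "$pid2"; status2=$?

echo "job_ok exited with $status1, job_fail exited with $status2"
```

Both jobs run concurrently, so the whole script takes about as long as the slower job rather than the sum of the two.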
This could be used to run different dataflow scripts in parallel (although more logic would have to be implemented to ensure files aren't operated on at the same time by different scripts).
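One possible form of that extra logic is a lock that a script must hold before touching a set of files. The sketch below uses a lock directory, since `mkdir` either creates the directory or fails atomically. The `run_locked` helper, the lock path, and the exit code 75 are all assumptions made for illustration and are not part of the existing dataflow scripts.

```bash
#!/usr/bin/env bash
# Hypothetical lock location. The demo uses a fresh temp directory; real
# scripts would need one fixed path that they all agree on.
lock_base=$(mktemp -d)
lock_dir="$lock_base/files.lock"

# Run the given command only if the lock can be taken. mkdir is atomic,
# so at most one caller succeeds in creating the directory.
run_locked() {
    if mkdir "$lock_dir" 2>/dev/null; then
        "$@"
        local status=$?
        rmdir "$lock_dir"          # release the lock
        return "$status"
    else
        echo "lock held, skipping: $*" >&2
        return 75                  # arbitrary code meaning "try again later"
    fi
}

run_locked echo "first caller holds the lock"
first=$?

mkdir "$lock_dir"                  # simulate another script holding the lock
run_locked echo "this command is skipped"
second=$?
rmdir "$lock_dir"

echo "first=$first second=$second"
```

A script that is killed while holding the lock would leave the directory behind, so a real version would also need cleanup (for example a `trap` on exit).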
When several scripts execute in sequence on the same inotify daemon, each script cannot start until the ones before it finish. If a later script runs for a long time, it prevents the first script from starting on time at the next trigger.
For example, on the site-linux computers, convert_and_restructure runs first, then rsync_to_campus runs. If rsync_to_campus has many files to transfer, it may take longer than 2 hours, so the inotify daemon cannot trigger again when it is due because rsync_to_campus is still running. This blocks convert_and_restructure from executing until the next time inotify triggers (in 2 hours).
The main problem here is that having multiple dataflow scripts executing in succession within a single inotify script prevents them from executing in parallel. This isn't a problem for regular data flow, when all data is operated on in order and the amount of data is tolerable. When processing backlogs of data, however, multiple dataflow scripts could be executing in parallel on different sets of data, and the current setup makes that impossible.
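As a rough sketch of how the blocking described above might be avoided, the inotify script could start each dataflow script in the background and skip any script whose previous run has not finished. The function bodies below are placeholders for the real convert_and_restructure and rsync_to_campus scripts, and `start_if_idle`, `on_trigger`, and `state_dir` are hypothetical names introduced only for this example.

```bash
#!/usr/bin/env bash
# Placeholder bodies standing in for the real dataflow scripts.
convert_and_restructure() { sleep 1; echo "convert done"; }
rsync_to_campus()         { sleep 2; echo "rsync done"; }

state_dir=$(mktemp -d)   # hypothetical; holds one lock directory per script
skipped=0

# Start a script in the background unless its previous run is still active.
start_if_idle() {
    local name=$1
    if mkdir "$state_dir/$name.lock" 2>/dev/null; then
        # The subshell runs the script, then releases its lock.
        ( "$name"; rmdir "$state_dir/$name.lock" ) &
    else
        echo "$name still running, skipping this trigger"
        skipped=$((skipped + 1))
    fi
}

# What a single inotify trigger would do: launch both scripts and return.
on_trigger() {
    start_if_idle convert_and_restructure
    start_if_idle rsync_to_campus
}

on_trigger   # first trigger starts both scripts
on_trigger   # second trigger arrives while both are still running
wait         # only for the demo, so all output appears before exit
```

With this layout a long rsync_to_campus run only causes its own next start to be skipped; convert_and_restructure can still start on the next trigger. It does not by itself stop the two scripts from operating on the same files, which would still need separate handling.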