StefanoBuccelli closed this issue 4 years ago
References #31
To run jobs in parallel on remote workers, the relevant files must be attached to the job. The current doProcess organization for running extraction, filtering, etc. is not well suited to running the code on remote workers. Furthermore, because the data is on disk (and not in memory), attaching it to a job and sending it off for remote manipulation poses an even bigger problem.
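For context, this is roughly what attaching files to a submitted job looks like with the Parallel Computing Toolbox (the cluster profile name and the submitted function are placeholders here):

```matlab
% Everything listed in 'AttachedFiles' is copied out to each worker; for a
% whole repository plus on-disk data, that quickly becomes impractical.
clust = parcluster('HPC_profile');                  % placeholder profile name
j = batch(clust, @doRawExtraction, 0, {blockObj}, ...
    'AttachedFiles', {'+nigeLab'});                 % code the worker would need
```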
On the Isilon cluster at KUMC, the second issue is less of a problem: the data sits in a location that can be manipulated both locally and by the remote cluster workers without being attached to the job (only the locations that reference where the data is need to be updated according to whether the job runs locally or remotely). The first problem remains, however, as it makes it difficult to run anything without essentially attaching the entire repository to the job.
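A sketch of that path remapping (the mount points and variable names are assumptions, not the actual KUMC layout):

```matlab
localRoot  = 'P:\data';        % how the Isilon share is mounted on the client (assumption)
remoteRoot = '/isilon/data';   % how the same share is mounted on the workers (assumption)
% Rewrite a stored location before handing it to a remote worker:
remoteFile = strrep(strrep(localFile, localRoot, remoteRoot), '\', '/');
```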
Need to pare down any function that is to be passed to remote workers: it should run as an independent function that is only passed data from the Block object, not as a method of Block. That way only the function itself needs to be submitted, along with the relevant data/parameters, as sketched below. In particular, I think one thing causing issues is helper functions taking parameters from calls to nigeLab.defaults.Queue or nigeLab.defaults.Notifications.
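A minimal sketch of that factoring (the function name, signature, and parameter fields are hypothetical): all defaults are resolved on the client beforehand, and the worker-side function receives only plain data and a parameter struct.

```matlab
function [flag, out] = hpfChannel(data, fs, pars)
%HPFCHANNEL  Stand-alone worker-side unit: no nigeLab.defaults.* calls;
%  everything it needs arrives through the input arguments.
[b, a] = butter(pars.Order, pars.FPass / (fs/2), 'high');  % example filter design
out  = filtfilt(b, a, double(data));
flag = true;
end
```

The key point is that pars is built once on the client (from the relevant nigeLab defaults file) before submission, so the worker never touches the defaults package at all.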
The proposed workaround is to store the code in a location accessible both locally and remotely. This should be the case anyway: attaching big data files to jobs is impractical, so there should already be a shared folder or some sort of remote storage reachable from both sides. This was addressed in 02a6ab40613e06d0cd68aa02f000563eaf0234a1, where a programmatically generated script is run at the beginning of each doMethod to add the nigeLab repo to the worker path (see the sketch below).
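In essence, the generated script amounts to something like this (the path and file name are illustrative; see the commit for the actual implementation):

```matlab
repoPath   = '/shared/code/nigeLab';   % repo copy visible to the workers (assumption)
scriptFile = fullfile(tempdir, 'addNigeLabPath.m');
fid = fopen(scriptFile, 'w');
fprintf(fid, 'addpath(genpath(''%s''));\n', repoPath);
fclose(fid);
% The tiny generated script (not the repository itself) travels with the
% job and is run at the start of each doMethod on the worker.
```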
Still working on this. Will continue today between rats, using a reduced environment to test new strategies. So far I have tried:
Create a property listener handle in DashBoard, attached to the Communicating Job Server Object's 'Tag' property on 'PostSet' (first sketch after this list).
Create an event listener handle in DashBoard, attached to the Block object (target). Add 'channelCompleteEvent' and 'processCompleteEvent' events to Block, triggered during 'do' methods on completion of the channel loop and at the end of the method, respectively (also in the first sketch below).
Create a parallel.pool.DataQueue object with its 'afterEach' callback set to the 'updateRemoteMonitor' method of the DashBoard class. Add a UserData property (public, hidden) to Block. Store the DataQueue object and a queue index in the Block UserData struct, then pass the queue index and completion percentage as struct fields to the updateRemoteMonitor method via the 'send' function in the 'do' methods (second sketch below).
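Rough shape of the first two attempts (the callback names are placeholders, and whether the job object's 'Tag' property is SetObservable, which 'PostSet' requires, is an open question):

```matlab
% Attempt 1: property listener on the job object's 'Tag' property:
lh = addlistener(jobObj, 'Tag', 'PostSet', @(src, evt) obj.remoteTagCallback(evt));

% Attempt 2: custom events declared in the Block classdef:
%    events
%       channelCompleteEvent
%       processCompleteEvent
%    end
% Fired from inside a 'do' method:
notify(blockObj, 'channelCompleteEvent');   % after each channel finishes
notify(blockObj, 'processCompleteEvent');   % once the whole method completes
% ...and caught in DashBoard:
lh2 = addlistener(blockObj, 'processCompleteEvent', ...
    @(src, evt) obj.refreshBlockStatus(src));
```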
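And a sketch of the third attempt using the stock parallel.pool.DataQueue API (the UserData field names are illustrative):

```matlab
% Client side (DashBoard): the callback registered with afterEach runs on
% the client whenever a worker calls send on the queue.
q = parallel.pool.DataQueue;
afterEach(q, @(info) obj.updateRemoteMonitor(info));
blockObj.UserData = struct('queue', q, 'queueIndex', k);

% Worker side, inside a 'do' method's channel loop:
send(blockObj.UserData.queue, ...
     struct('queueIndex', blockObj.UserData.queueIndex, ...
            'progress',   100 * iCh / numel(blockObj.Channels)));
```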
Next attempts: