barbaLab / nigeLab

Neurophysiological data analyses by engineers, for physiologists.

Status not updated when jobs are deployed on workers #14

Closed · StefanoBuccelli closed this issue 4 years ago

m053m716 commented 5 years ago

Still working on this. Will continue today between rats, using a reduced environment to test new strategies. So far I have tried:

Next attempts:

m053m716 commented 5 years ago

Update

References #31

Problem

To run jobs in parallel on remote workers, the relevant files must be attached to the job. The current doProcess organization for extraction, filtering, etc. is not very conducive to running that code on remote workers. Furthermore, because the data lives on disk (not in memory), attaching it to a job and sending it off for remote manipulation poses an even bigger problem.
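For illustration, here is a minimal sketch of the attachment model in the Parallel Computing Toolbox (the function, file, and block names are hypothetical, not nigeLab code): every file a task depends on, data included, has to travel with the job.

```matlab
c = parcluster();                        % default cluster profile
j = createJob(c, 'AttachedFiles', ...   % everything listed is shipped to workers
    {'doRawExtraction.m', 'helperFcn.m', 'rawBlock.mat'});
createTask(j, @doRawExtraction, 1, {struct('Name', 'R19-001')});
submit(j);
wait(j);
out = fetchOutputs(j);                   % collect results once the job finishes
```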

On the Isilon cluster at KUMC, this second issue is less of a problem: the data sits in a location that can be manipulated both locally and by a remote cluster worker without being attached to the job; only the paths that reference where the data lives need to be updated according to whether the job runs locally or remotely. However, the first problem remains an issue, as it makes it difficult to run anything without essentially attaching the entire repository to the job.
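As a hedged sketch of that path-remapping idea (the helper name and mount points are hypothetical, not actual nigeLab code): with shared storage, only the path strings change with where the job executes; the files themselves never move.

```matlab
function blockPaths = resolveDataPaths(runningRemotely)
% resolveDataPaths  Hypothetical helper: pick the path prefix that matches
% where the job executes; both prefixes point at the same shared volume,
% so no data ever has to be attached to the job.
if runningRemotely
    root = '/isilon/lab/data';    % mount point as the cluster workers see it
else
    root = 'K:\data';             % mount point as the local machine sees it
end
blockPaths.RawData = fullfile(root, 'R19-001', 'RawData');
end
```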

Proposed Solution

Any function that is to be passed to remote workers needs to be pared down: it should run as an independent function that is only passed data from the Block object, rather than needing to be a method of Block. That way only the function itself needs to be submitted, along with the relevant data/parameters. In particular, I think one thing causing issues is helper functions taking parameters from calls to nigeLab.defaults.Queue or nigeLab.defaults.Notifications, which drags the rest of the repository in as a dependency. A sketch of the pared-down pattern follows.
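A minimal sketch of that pattern (filterData, pars.Gain, and the sample input are hypothetical; this assumes nigeLab.defaults.Queue returns a parameter struct): parameters are resolved locally before submission, so the worker never calls into nigeLab.defaults.* itself.

```matlab
% filterData.m -- hypothetical standalone worker function
function y = filterData(data, pars)
% Everything the function needs arrives through its arguments, so it has
% no run-time dependency on the rest of the repository.
y = data * pars.Gain;                    % placeholder computation
end
```

```matlab
% Submission side (runs locally):
pars = nigeLab.defaults.Queue();         % evaluated once, on the local machine
c    = parcluster();
j    = batch(c, @filterData, 1, {rand(1, 1000), pars}, ...
             'AttachedFiles', {'filterData.m'});   % one file, not the repo
wait(j);
out = fetchOutputs(j);
```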

Nabarb commented 5 years ago

The proposed workaround is to store the code in a location accessible both locally and remotely. That should already be the case in practice: attaching big data files to jobs is impractical, so you should already have a shared folder or some other storage reachable from both sides. This was addressed in 02a6ab40613e06d0cd68aa02f000563eaf0234a1, where a programmatically generated script is run at the beginning of each doMethod to add the nigeLab repo to the worker path.
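For reference, a hedged reconstruction of that approach (this is not the commit's actual code; the script name, repo path, and doExample method are hypothetical):

```matlab
% Local side: generate a tiny path-setup script and attach only that.
nigelPath  = '/shared/repos/nigeLab';          % repo on shared storage
scriptFile = fullfile(tempdir, 'configW.m');   % generated worker script
fid = fopen(scriptFile, 'w');
fprintf(fid, 'addpath(genpath(''%s''));\n', nigelPath);
fclose(fid);
j = batch(@doExample, 0, {}, 'AttachedFiles', {scriptFile});
```

```matlab
% doExample.m -- worker side: each doMethod begins by running the script.
function doExample()
configW;   % attached script is already on the worker path; it adds the
           % shared nigeLab checkout via addpath(genpath(...))
% ... the method body can now resolve nigeLab code normally ...
end
```

For comparison, batch also accepts an 'AdditionalPaths' option that adds directories to the worker path directly; the generated-script route keeps the setup logic inside each doMethod.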