dchackett / taxi

Lightweight portable workflow management system for MCMC applications
MIT License

Better taxi name registration/pool management #14

Closed (dchackett closed this issue 6 years ago)

dchackett commented 7 years ago

Currently, taxi names are registered using subfolders in a "taxi-pool" folder. Taxis are named like "{taxi-tag}{number}" (e.g., "mrep0" or "fund-147"). A taxi name is marked "taken" when a folder with the name of that taxi exists in taxi-pool, e.g., if "taxi-pool/mrep0/" exists, then the name "mrep0" is taken.
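For reference, the folder-as-registry behavior described above can be sketched roughly as follows. This is an illustration only, not the actual taxi code: the function name `register_taxi_name` and the `max_taxis` cap are made up for the example. It relies on `os.mkdir` failing when the folder already exists, so creating the folder is what claims the name.

```python
import os
import tempfile


def register_taxi_name(pool_dir, tag, max_taxis=1000):
    """Claim the first free "{tag}{number}" name by creating its subfolder.

    Hypothetical helper: a name counts as taken if pool_dir/<name>/ exists.
    """
    for n in range(max_taxis):
        name = f"{tag}{n}"
        try:
            # mkdir raises FileExistsError if the name is already registered
            os.mkdir(os.path.join(pool_dir, name))
        except FileExistsError:
            continue
        return name
    raise RuntimeError(f"no free taxi name for tag {tag!r}")


# Small demonstration in a throwaway directory standing in for "taxi-pool"
with tempfile.TemporaryDirectory() as pool_dir:
    print(register_taxi_name(pool_dir, "mrep"))  # mrep0
    print(register_taxi_name(pool_dir, "mrep"))  # mrep1
```

Every call leaves a subfolder behind whether or not the taxi ever writes a log, which is where the empty folders come from.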

This is ad hoc, unaesthetic, and results in a proliferation of (frequently empty) subfolders. Further, to avoid name collisions, you must use a single "taxi-pool" folder for each cluster you'd like to run on. This means all log files end up in one centralized location, which accumulates so many subfolders that it is difficult to find anything in it.

A better solution: keep using the "{taxi-tag}{number}" naming scheme, but do not pre-assign the needed taxi names when the dispatch DB is created. Instead, whenever a new taxi name is needed, check the queue to find an available taxi name of the correct format (i.e., the queue is used as the registry, instead of a folder). Respawning taxis retain their names.
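A minimal sketch of the queue-as-registry lookup, assuming the job names currently in the queue have already been collected into a list (how to get them depends on the scheduler and is not shown). The function name `next_free_taxi_name` is hypothetical, and picking the lowest unused number is just one possible policy.

```python
import re


def next_free_taxi_name(tag, queued_names):
    """Return the lowest-numbered "{tag}{number}" name not present in the queue.

    queued_names: names of jobs currently in the queue (assumed to be
    obtained elsewhere, e.g. by parsing the scheduler's queue listing).
    """
    pattern = re.compile(re.escape(tag) + r"(\d+)")
    taken = set()
    for name in queued_names:
        m = pattern.fullmatch(name)
        if m:
            taken.add(int(m.group(1)))
    n = 0
    while n in taken:
        n += 1
    return f"{tag}{n}"


print(next_free_taxi_name("mrep", ["mrep0", "mrep1", "fund-147", "mrep3"]))  # mrep2
```

Names with a different tag (like "fund-147" above) are ignored, so different taxi tags never block each other.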

Aside from being more aesthetically pleasing, this allows for local (versus centralized) storage of log folders.

We could also add a provision that a taxi name, once abandoned, is not reused while working through the same dispatch DB. This would require keeping the folder-registry scheme as a secondary check, except that the names would now be registered in a local (versus centralized) folder. This might make it easier to find the right log file.
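One way the combined check might look, as a sketch under assumptions: the queue contents are passed in as a list of names, the local registry is simply the per-run log folder, and `claim_taxi_name` and `max_taxis` are hypothetical. A name is skipped if it is in the queue or if a local folder for it already exists. A respawning taxi would keep its existing name and not call this at all.

```python
import os
import re
import tempfile


def claim_taxi_name(tag, queued_names, log_dir, max_taxis=10000):
    """Claim a name that is neither queued nor previously used in log_dir."""
    pattern = re.compile(re.escape(tag) + r"(\d+)")
    in_queue = set()
    for name in queued_names:
        m = pattern.fullmatch(name)
        if m:
            in_queue.add(int(m.group(1)))

    os.makedirs(log_dir, exist_ok=True)
    for n in range(max_taxis):
        if n in in_queue:
            continue  # name is live in the queue
        name = f"{tag}{n}"
        try:
            # The local folder doubles as the log folder for this taxi
            os.mkdir(os.path.join(log_dir, name))
        except FileExistsError:
            continue  # name was already used for this dispatch DB
        return name
    raise RuntimeError(f"no free taxi name for tag {tag!r}")


with tempfile.TemporaryDirectory() as log_dir:
    print(claim_taxi_name("mrep", ["mrep0"], log_dir))  # mrep1
    print(claim_taxi_name("mrep", [], log_dir))         # mrep0
    print(claim_taxi_name("mrep", [], log_dir))         # mrep2
```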

Log folders could be subfolders of work folders, which spares the user having to specify another directory in each run-specification script. Logs could also be stored in one large pool folder like work/logs/, instead of in work/mrep0/ or work/logs/mrep0/.

dchackett commented 6 years ago

The separation of Dispatcher and Pool, and the Pool superclass, provide a graceful/flexible/easily modified solution to this problem.