dchackett / taxi

Lightweight portable workflow management system for MCMC applications
MIT License

AuxJobs necessary? #4

dchackett closed this issue 6 years ago

dchackett commented 7 years ago

It is possible to eliminate "HMCAuxJobs" entirely, and replace them with "FileJobs". The functionality that AuxJobs provide can be replicated by convenience functions. Compare "spectro_jobs_for_hmc_jobs" (AuxJob) with "flow_jobs_for_hmc_jobs" (FileJob) for implementations of each scheme.

Is this a good idea, structurally? It reduces the number of classes required, but in an abstract sense, it's nice to have a specific species of task (AuxJob) that abstractly represents "run X on the output of Y in the sensible way".

Alternatively (or additionally), it might make sense to restructure so that AuxJobs are subclasses of FileJobs in some way.

etneil commented 7 years ago

I do like the AuxJob idea of being a job that is linked to a particular HMCJob, but I suspect, as you do, that it should be a subclass of FileJob with minimal variation.

I also wonder about FileJobs themselves. For example, for the Wilson flow, is there such a thing as a FlowJob that isn't a FileFlowJob? Is that layer of abstraction really necessary?

dchackett commented 7 years ago

Given a certain reading of what a "FileJob" should be, there are no FlowJobs that are not FileFlowJobs. I was thinking about a slightly different reading of the terminology. The practical issue here is recovering the parameters of the gauge configuration that we're making a measurement on. For the Wilson flow, this is the volume of the gauge configuration. For spectroscopy, we also need the relevant kappa coupling.

The practical distinction between FileXJobs and AuxXJobs is that FileXJobs parse these parameters out of the filename, while AuxXJobs read these parameters out of the associated gauge-generating task. I didn't like the structure of having AuxXJobs as subclasses of FileXJobs, because these parameter-retrieving schemes are unrelated. The XJob abstract superclasses know how to run the relevant binary, which is common to FileXJobs and AuxXJobs.
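To make the distinction concrete, here is a minimal sketch of the current layout. Everything in it is hypothetical and for illustration only: the class and attribute names, the `spectro` command line, and the `cfg_<ns>_<nt>_k<kappa>_<traj>` filename convention are assumptions, not the actual taxi code.

```python
import re


class HMCJob:
    """Stand-in for a gauge-generating task (hypothetical attributes)."""

    def __init__(self, ns, nt, kappa, saveg):
        self.ns, self.nt, self.kappa = ns, nt, kappa
        self.saveg = saveg  # file the gauge configuration is written to


class SpectroJob:
    """Shared superclass: builds the run command once parameters are set."""

    def command(self):
        # 'spectro' and its flags are placeholders, not a real CLI.
        return (f"spectro --ns {self.ns} --nt {self.nt} "
                f"--kappa {self.kappa} {self.cfg_file}")


class FileSpectroJob(SpectroJob):
    """Recovers parameters by parsing the configuration filename."""

    # Assumed convention: cfg_<ns>_<nt>_k<kappa>_<trajectory>
    _PATTERN = re.compile(
        r"cfg_(?P<ns>\d+)_(?P<nt>\d+)_k(?P<kappa>\d+(?:\.\d+)?)_\d+$")

    def __init__(self, cfg_file):
        match = self._PATTERN.search(cfg_file)
        if match is None:
            raise ValueError(f"cannot parse parameters from {cfg_file!r}")
        self.cfg_file = cfg_file
        self.ns = int(match["ns"])
        self.nt = int(match["nt"])
        self.kappa = float(match["kappa"])


class AuxSpectroJob(SpectroJob):
    """Recovers parameters from the associated gauge-generating task."""

    def __init__(self, hmc_job):
        self.cfg_file = hmc_job.saveg
        self.ns, self.nt, self.kappa = hmc_job.ns, hmc_job.nt, hmc_job.kappa
```

In this sketch the two subclasses share only `command()`; their constructors have nothing in common, which is why neither is a natural subclass of the other.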

The current structure is ungainly, especially considering users will need to make their own Task classes. As a design feature, they should really only have to make one Task class per binary/runner script.

So, how should we recover the parameters needed to run on a given gauge configuration? This interacts with #3 and #21. Ideas:

  1. Everything is mediated by file naming conventions. A gauge configuration is saved with a file name that contains all information relevant to make measurements on it. This would completely remove the need for AuxXJobs, and so all XJobs would be exactly FileXJobs. I am hesitant to rely on this approach, as people might want to use their own file-naming conventions that are not sufficiently informative.

  2. Alongside the GaugeGenerator superclass, we have a GaugeMeasurement superclass. GaugeMeasurement has the functionality of both an AuxJob and a FileJob. GaugeGenerators have some sort of params dict (or subobject?), which contains things like ns, nt, beta, etc. Each GaugeMeasurement subclass has a list of the parameter names it needs to run. A GaugeMeasurement hooked to a GaugeGenerator pulls the relevant parameters out of the GaugeGenerator's params (which requires consistent parameter naming between the two classes). A GaugeMeasurement pointed at a file uses a filename convention to extract parameters (so whatever method implements the file convention must also return a consistently named params dict). The only potential downside is having to name parameters consistently across all Task objects in a project, but that might actually be a feature.
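A rough sketch of what idea 2 could look like. The names `required_params`, `parse_filename`, `FlowMeasurement`, `SpectroMeasurement`, and the `cfg_ns<ns>_nt<nt>_b<beta>_k<kappa>` filename convention are all assumptions made up for this example; only GaugeGenerator and GaugeMeasurement come from the proposal itself.

```python
import re

# Assumed filename convention: cfg_ns<ns>_nt<nt>_b<beta>_k<kappa>
_FILENAME = re.compile(
    r"ns(?P<ns>\d+)_nt(?P<nt>\d+)"
    r"_b(?P<beta>\d+(?:\.\d+)?)_k(?P<kappa>\d+(?:\.\d+)?)")


def parse_filename(fname):
    """Return a params dict using the same key names as GaugeGenerator."""
    match = _FILENAME.search(fname)
    if match is None:
        raise ValueError(f"cannot parse parameters from {fname!r}")
    return {"ns": int(match["ns"]), "nt": int(match["nt"]),
            "beta": float(match["beta"]), "kappa": float(match["kappa"])}


class GaugeGenerator:
    def __init__(self, params, output_file):
        self.params = dict(params)      # e.g. ns, nt, beta, kappa
        self.output_file = output_file  # where the configuration is saved


class GaugeMeasurement:
    required_params = ()  # each subclass lists the names it needs

    def __init__(self, source):
        if isinstance(source, GaugeGenerator):
            # "Aux" mode: read parameters from the generating task.
            available = source.params
            self.cfg_file = source.output_file
        else:
            # "File" mode: read parameters from the filename.
            available = parse_filename(source)
            self.cfg_file = source
        missing = [n for n in self.required_params if n not in available]
        if missing:
            raise KeyError(f"missing parameters: {missing}")
        self.params = {n: available[n] for n in self.required_params}


class FlowMeasurement(GaugeMeasurement):
    required_params = ("ns", "nt")


class SpectroMeasurement(GaugeMeasurement):
    required_params = ("ns", "nt", "kappa")
```

With this layout a user writes one class per binary (here just a `required_params` tuple), and both ways of recovering parameters live in the superclass.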

We should probably implement idea 2, unless I am missing some glaring structural issue.

etneil commented 7 years ago

I like option 2 a lot, and I think as long as we implement things properly, the naming conventions should only be specified at a single point and there should be no possibility of conflict. (The user would have to specify both how to read and how to write a gauge filename, but they can be right next to each other and that's very easy to test.)
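As an illustration of how easy that test could be, here is a sketch with the reader and writer defined side by side and a round-trip check. The function names and the filename convention are invented for this example and are not part of taxi.

```python
import re

# Hypothetical convention: cfg_ns<ns>_nt<nt>_b<beta>_k<kappa>
_PATTERN = re.compile(
    r"^cfg_ns(?P<ns>\d+)_nt(?P<nt>\d+)"
    r"_b(?P<beta>\d+(?:\.\d+)?)_k(?P<kappa>\d+(?:\.\d+)?)$")


def write_gauge_filename(params):
    """Build a filename from a params dict."""
    return "cfg_ns{ns}_nt{nt}_b{beta}_k{kappa}".format(**params)


def read_gauge_filename(fname):
    """Inverse of write_gauge_filename: recover the params dict."""
    match = _PATTERN.match(fname)
    if match is None:
        raise ValueError(f"{fname!r} does not follow the convention")
    return {"ns": int(match["ns"]), "nt": int(match["nt"]),
            "beta": float(match["beta"]), "kappa": float(match["kappa"])}


def test_round_trip():
    params = {"ns": 8, "nt": 16, "beta": 7.2, "kappa": 0.125}
    assert read_gauge_filename(write_gauge_filename(params)) == params
```

One caveat of this particular pattern: a float that Python prints in scientific notation (for example `1e-05`) would not match, so a real convention would need to pin down the number formatting as well.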