adamkewley / jobson

A platform for transforming command-line applications into a job service.
Apache License 2.0

variable dependencies #42

Closed: mtazzari closed 6 years ago

mtazzari commented 6 years ago

I cannot find a way to write a spec that has variable dependencies, i.e. dependencies that are determined at request submission time or at runtime.

Example: I have a script (`script.py`) that I want to apply to a different file each time, say a table `table_source_name.txt`, where `source_name` is the name of the astronomical source.

I have written the spec below, but it doesn't work. I guess this is because `${inputs.source_name}` is expanded only in `execution/arguments` and not in `execution/dependencies`. Indeed, Jobson fails with a fatal error saying that it cannot find the file `/my/dir/tables/table_${inputs.source_name}.txt`.

name: My first spec

description: >
  A demo job spec generated when creating a new Jobson deployment. Try it out!

expectedInputs:
- id: source_name
  type: string
  name: Source name
  description: Name of the astronomical source

execution:
  application: python
  arguments:
  - script.py
  - -v
  dependencies:
  - source: /my/dir/tables/table_${inputs.source_name}.txt
    target: table.txt
  - source: /my/dir/script.py
    target: script.py

expectedOutputs:
[...]

How can this be implemented in Jobson? One solution would be to put the table-copying code inside script.py, but I would rather use Jobson to move the dependencies.
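
For reference, the in-script workaround mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names `table_path` and `stage_table` are hypothetical, the tables directory is the one used in the spec, and the source name is assumed to reach the script as a command-line argument (for example by adding `${inputs.source_name}` to the spec's `arguments` list).

```python
import shutil
from pathlib import Path

# Assumption: the tables live in the directory used in the spec above.
TABLES_DIR = Path("/my/dir/tables")


def table_path(source_name: str, tables_dir: Path = TABLES_DIR) -> Path:
    """Return the path of the table for the given source name (hypothetical helper)."""
    # Reject empty names and names containing path separators, so that a
    # request cannot point outside the tables directory (e.g. "../x").
    if not source_name or Path(source_name).name != source_name:
        raise ValueError(f"invalid source name: {source_name!r}")
    return tables_dir / f"table_{source_name}.txt"


def stage_table(source_name: str, workdir: Path, tables_dir: Path = TABLES_DIR) -> Path:
    """Copy the table for source_name into workdir as table.txt and return the new path."""
    target = workdir / "table.txt"
    shutil.copyfile(table_path(source_name, tables_dir), target)
    return target
```

With this in place, script.py could call `stage_table(sys.argv[1], Path.cwd())` before reading `table.txt`. The drawback is the one already noted: the script, rather than the spec, ends up knowing where the tables are stored.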

adamkewley commented 6 years ago

Ah, I can see how that would be annoying if you're switching on a job input.

I've committed a patch to the master branch that adds templating to the job spec:

https://github.com/adamkewley/jobson/commit/53766138198b1a0b927cf16d220839f85f288535

The commit only has basic unit tests for the feature. I should probably also add a system test to exercise it. I'll try to do that ASAP, then publish 0.0.20 from the master branch.

Why it wasn't done before: not everything is automagically templated because certain parts of the environment (e.g. request.inputs) are only available at certain points (e.g. after a job request), so templating is currently done on a field-by-field basis. A "cleaner" solution would split the spec.yml file into separate files (e.g. expected-inputs.yml, expected-outputs.yml) so that each file can be templated as a whole instead.
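
To make the field-by-field idea concrete, here is a minimal Python sketch of expanding `${inputs.<id>}` placeholders in the dependency fields once the request inputs are known. It is not Jobson's actual implementation: the function names `render` and `render_dependencies`, the placeholder grammar, and the choice to template both `source` and `target` are all assumptions made for illustration.

```python
import re

# Assumed placeholder syntax: ${inputs.<id>}, where <id> is an identifier.
_PLACEHOLDER = re.compile(r"\$\{inputs\.([A-Za-z_][A-Za-z0-9_]*)\}")


def render(template: str, inputs: dict) -> str:
    """Replace each ${inputs.<id>} in template with the matching request input."""
    def substitute(match):
        key = match.group(1)
        if key not in inputs:
            raise KeyError(f"no job input named {key!r}")
        return str(inputs[key])

    return _PLACEHOLDER.sub(substitute, template)


def render_dependencies(dependencies: list, inputs: dict) -> list:
    """Template the source and target of each dependency, one field at a time."""
    return [
        {
            "source": render(dep["source"], inputs),
            "target": render(dep["target"], inputs),
        }
        for dep in dependencies
    ]


# Example with a made-up source name; this can only run after a request
# exists, because that is when the inputs become available.
deps = [{"source": "/my/dir/tables/table_${inputs.source_name}.txt", "target": "table.txt"}]
print(render_dependencies(deps, {"source_name": "HD163296"}))
# prints [{'source': '/my/dir/tables/table_HD163296.txt', 'target': 'table.txt'}]
```

The sketch also shows why whole-file templating is awkward with a single spec.yml: `render` needs the request inputs, while sections such as expectedInputs must be readable before any request has been made.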