cylc / cylc-flow

Cylc: a workflow engine for cycling systems.
https://cylc.github.io
GNU General Public License v3.0

Job submission problem (slurm, and maybe others) #930

Closed m214089 closed 10 years ago

m214089 commented 10 years ago

Hi Hilary,

slurm has quite a few directives that take no argument. I cannot get them from suite.rc into the final job script. Is there a way?

    --exclusive = ' ' 

is rendered to

    --exclusive=

but must be

    --exclusive

I saw the comment in the manual on sge ... I wonder if the sge example still works. Cheerio, Luis

matthewrmshin commented 10 years ago

@m214089 do other directives in SLURM require an = sign between the key and the value? If not, you can probably change the connector from = to a space at:

https://github.com/cylc/cylc/blob/master/lib/cylc/job_submission/slurm.py#L32

like in PBS:

https://github.com/cylc/cylc/blob/master/lib/cylc/job_submission/pbs.py#L33

m214089 commented 10 years ago

Hi Matt, no, all options with an argument are of the form

--option = value

That doesn't help. Thanks anyway

matthewrmshin commented 10 years ago

@m214089 I wonder if you can do:

hjoliver commented 10 years ago

(I'll take a look at this tomorrow - I was away today...)

By the way @m214089 - if you're using slurm, that is the only one of our job submission methods that does not yet support job polling and killing (from CLI and GUI), which has proved to be very useful. This is simply because the core developers don't have access to slurm. Perhaps you could look at completing lib/cylc/job_submission/slurm.py by comparison with pbs.py. It should be easy: we just need to parse out the job ID, and know how to query and kill the job by its ID, using the slurm equivalents of qstat and qdel. Job query only needs to know whether the job is present in the queuing system or not; we don't need to know about specific job states.
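For illustration only, the three pieces mentioned above (parse the job ID, query, kill) might look roughly like the sketch below. It assumes sbatch reports "Submitted batch job <ID>" on success and uses squeue and scancel as the slurm counterparts of qstat and qdel. The function names are hypothetical and are not taken from slurm.py or pbs.py.

```python
import re
import subprocess

# Assumption: on success sbatch prints a line like "Submitted batch job 12345".
JOB_ID_RE = re.compile(r"Submitted batch job (\d+)")


def parse_job_id(sbatch_output):
    """Return the job ID found in sbatch output, or None if there is none."""
    match = JOB_ID_RE.search(sbatch_output)
    return match.group(1) if match else None


def job_is_present(job_id):
    """Return True if squeue still lists the job, whatever its state."""
    proc = subprocess.run(
        ["squeue", "-h", "-j", job_id],  # -h: no header, -j: select by job ID
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # An unknown or finished job gives an error or an empty listing.
    return proc.returncode == 0 and bool(proc.stdout.strip())


def kill_job(job_id):
    """Ask slurm to cancel the job; return True if scancel exits cleanly."""
    return subprocess.run(["scancel", job_id]).returncode == 0
```

Only parse_job_id can be exercised without a slurm host; the other two need a real queue to test against.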

m214089 commented 10 years ago

Ok, so I'll start doing it by taking pbs as example.

Cheerio, Luis

m214089 commented 10 years ago

Hilary,

I haven't used the test-battery yet. Is there a HOWTO, and are tests for the job submission included?

I've made all the changes, but would like to do proper testing.

Thanks, Luis

m214089 commented 10 years ago

Hi Matt, your suggestion would work, but it breaks orthogonality and would lead to many mistakes in suite.rc files. For the time being I fixed it in jobfile by checking whether the argument is an empty string, in which case the directive is written without connector and value. This is the behaviour one would expect from the examples in the manual, so it does not introduce any compatibility problem with existing implementations. Thanks, Luis
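A minimal sketch of that empty-string check, for readers following along. The function name and signature are hypothetical and do not reflect the actual jobfile.py code; "#SBATCH" is the standard slurm directive prefix.

```python
def format_directive(prefix, key, connector, value):
    """Build one directive line; omit connector and value if value is blank."""
    if value is None or not value.strip():
        # Argument-less directive, e.g. "--exclusive = ' '" in suite.rc.
        return "{0} {1}".format(prefix, key)
    return "{0} {1}{2}{3}".format(prefix, key, connector, value.strip())


print(format_directive("#SBATCH", "--exclusive", "=", " "))
print(format_directive("#SBATCH", "--time", "=", "00:10:00"))
```

Directives that already carry a value are rendered as before, which is why existing suite.rc files should be unaffected.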

hjoliver commented 10 years ago

test battery

The HOWTO, such as it is (the tests are mainly for developer use), is just cylc test-battery --help. It tests hundreds of things at this point, including background, at, and loadleveler job submission, so it would be good to have tests for slurm too. You'll need to specify a slurm host under [test battery] in your site or user config file (run cylc get-site-config to see all items), and the tests will need to be skipped automatically if a slurm host is not specified. You should be able to base your kill test on tests/job-kill/02-loadleveler.t, which runs a suite that submits a job to loadleveler and then submits another job to kill the first one (the test should fail if the job kill fails). Let us know if you have trouble with the test framework - it's not exactly transparent, but it's usually sufficient to copy and modify an existing test.

hjoliver commented 10 years ago

Currently we assume the form PREFIX KEY <CONNECTOR> VALUE for all directives, and from what you say slurm does not conform to this. It should be easy enough to fix though. Each job submission method overrides set_directives() to set the prefix and connector; and then the final string is generated and written out in jobfile.py using the key-value pairs. I guess instead we should just generate the final string for each directive in the method-specific set_directives() - which in the slurm case would not use the = connector for items with a null value. Then just write out each final string in turn in jobfile.py.
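Roughly, the restructuring proposed above could be sketched as follows. All class and function names here are hypothetical stand-ins, not the real cylc classes; only the set_directives() name comes from the description above.

```python
class JobSubmitSketch:
    """Hypothetical base class: PREFIX KEY <CONNECTOR> VALUE for every item."""

    PREFIX = "#"
    CONNECTOR = " "

    def __init__(self, directives):
        # directives: list of (key, value) pairs, in suite.rc order
        self.directives = directives

    def set_directives(self):
        """Return the final directive strings, one per item."""
        return [
            "{0} {1}{2}{3}".format(self.PREFIX, key, self.CONNECTOR, value)
            for key, value in self.directives
        ]


class SlurmSketch(JobSubmitSketch):
    """Hypothetical slurm method: no '=' for items with a null value."""

    PREFIX = "#SBATCH"
    CONNECTOR = "="

    def set_directives(self):
        lines = []
        for key, value in self.directives:
            if value.strip():
                lines.append("{0} {1}{2}{3}".format(
                    self.PREFIX, key, self.CONNECTOR, value.strip()))
            else:
                lines.append("{0} {1}".format(self.PREFIX, key))
        return lines


def write_directives(handle, method):
    """Job file writer: writes each final string, with no formatting logic."""
    for line in method.set_directives():
        handle.write(line + "\n")
```

With this split the job file writer no longer needs to know about prefixes or connectors, so a method with unusual syntax only has to override set_directives().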

m214089 commented 10 years ago

Ok, that sounds like the best way to do it, and it gives a huge amount of flexibility for custom tailoring. For testing I had just been hacking jobfile directly, which is not the right way to do it from a good programming practice point of view.

hjoliver commented 10 years ago

@m214089 - the hacking is good, since it gives you a working temporary solution in case the full solution takes a while to review and get merged to master.

hjoliver commented 10 years ago

closed by #938