OSC / osc-machete

High level interface to submitting and checking the status of batch jobs (currently OSC specific)
MIT License

Implement PBS-Ruby into Machete at least for qstat #26

Closed nickjer closed 8 years ago

nickjer commented 9 years ago

You can easily implement the same features of qstat using pbs-ruby in Machete and remove all references to XML. An example can be seen in the pbs-ruby repo examples directory.

I don't suggest removing qsub yet, as you would then be required to set up a batch job programmatically. One benefit of submitting jobs through pbs-ruby, though, would be that errors are raised when job submission fails.

ericfranz commented 8 years ago

This will reduce duplication and add more verbose error handling.

ericfranz commented 8 years ago

This would be an update to the TorqueHelper gem, which just shells out for everything right now. The interface of course shouldn't change.

brianmcmichael commented 8 years ago

> The interface of course shouldn't change.

We currently have a qstat_xml and a 'parse_qstat_output' in the public methods.

Is the goal to completely remove the reliance on Nokogiri here, in which case these methods will be troublesome, or should we leave them there and deprecate them so that they don't break?
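For reference, the "leave them there and deprecate them" option could look roughly like the sketch below. Everything in it is an assumption for illustration: `Deprecation`, `HelperSketch`, and the placeholder method bodies are hypothetical and are not taken from the TorqueHelper source.

```ruby
# Hypothetical mixin: wraps an existing method so it warns before running.
module Deprecation
  def deprecate(name, replacement:)
    original = instance_method(name)
    define_method(name) do |*args, &block|
      warn "[DEPRECATION] #{name} is deprecated; use #{replacement} instead."
      original.bind(self).call(*args, &block)
    end
  end
end

# Hypothetical stand-in for the helper class; the bodies are placeholders.
class HelperSketch
  extend Deprecation

  # Supported public method (placeholder return value).
  def qstat(pbsid)
    "status-for-#{pbsid}"
  end

  # Old XML-based method, kept so existing callers don't break.
  def qstat_xml(pbsid)
    "<Data><Job><Job_Id>#{pbsid}</Job_Id></Job></Data>"
  end
  deprecate :qstat_xml, replacement: :qstat
end
```

Callers of `qstat_xml` keep working but get a warning on stderr, which leaves time to migrate them before the method is removed.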

brianmcmichael commented 8 years ago

Don't break the interface, I mean.

ericfranz commented 8 years ago

Clarification: don't break the interface that OSC::Machete::Job uses :-). qstat_xml can go

brianmcmichael commented 8 years ago

Note to self, PBS examples here:

https://github.com/AweSim-OSC/pbs-ruby/blob/master/examples/simplejob.rb

nickjer commented 8 years ago

Open up the debate whether the host should also be a property of OSC::Machete::Job and stored in ActiveRecord.

The pbsid may not provide enough information to determine host. Biggest pro is that it would reduce the code used to determine the host.

ericfranz commented 8 years ago

I like the idea of host being a property of OSC::Machete::Job. You could also imagine that Script is an object that knows its "host". This is more object oriented and would definitely reduce complexity from the TorqueHelper.

Also, knowing the "host" alongside the "id" would seem necessary if we wanted to support multiple resource managers (besides Torque). When I submit a job, I need to know what host I'm submitting to. When I check the status of the job, I need to know what host it's on and the id of the job.

ericfranz commented 8 years ago

So, if the host is not available and you want to do a qstat, we could try to fall back on determining the host from the pbsid (so the older apps work). For Ruby, which doesn't append the host to the id, we could append the host ourselves so it's always there. Not sure what object would be responsible for this "mapping". Probably a pbsid object...

OR for the older apps, we could just have the app hardcode the system it uses. Most of the apps are probably running everything on Oakley anyway. And individual jobs might be designed to run on specific systems (e.g. solve on a Ruby node, post-process on an Oakley node).

So we could have host be a required argument for OSC::Machete::Job, default it to Oakley if not provided, and then the database wouldn't need an extra column. We could use a factory method to set the default... some ideas...
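A rough sketch of the "pbsid object" idea, combining the fallback and the Oakley default. The class name `PbsId`, the `from_legacy` factory, and the suffix-to-host table are all hypothetical, and the `oak-batch.osc.edu` suffix is an assumption about what Oakley ids look like:

```ruby
# Hypothetical value object that pairs a job id with the host it runs on.
class PbsId
  DEFAULT_HOST = "oakley"

  # Assumed mapping from the server suffix of a pbsid to a host name.
  SUFFIX_TO_HOST = {
    "oak-batch.osc.edu" => "oakley"
  }.freeze

  attr_reader :id, :host

  # An explicit host wins; otherwise try the id suffix, then the default.
  def initialize(id, host: nil)
    @id = id.to_s
    @host = host || host_from_id || DEFAULT_HOST
  end

  # Factory for older apps that only stored the pbsid string.
  def self.from_legacy(id)
    new(id)
  end

  private

  # Returns nil when the id has no suffix or the suffix is unknown.
  def host_from_id
    _number, suffix = @id.split(".", 2)
    SUFFIX_TO_HOST[suffix]
  end
end
```

With this shape, new code would call `PbsId.new("456", host: "ruby")`, while older apps could keep passing a bare pbsid through `PbsId.from_legacy` and end up on Oakley unless the suffix says otherwise.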

kmanalo commented 8 years ago

This may be a silly question as I'm not clear on the underpinnings, but is 'submit_host' not an attribute available with each job id with 'qstat -x'? Were you implying falling back to 'qstat' to query the associated submit host? I'm only interpreting this as someone who uses 'qstat -x' and nothing else.

brianmcmichael commented 8 years ago

These changes have been merged into machete release/1.0 for qsub, qstat, and qdel.

@kmanalo We got away from using the shell commands with this update. Now we're calling the Torque C libraries directly via the pbs gem.

@nickjer If we want to make TorqueHelper accept a host it should probably be a separate issue. Re-open if I'm wrong.

brianmcmichael commented 8 years ago

~16 hours coding, most of it digging through the machete and PBS source to ensure that the API stayed the same. ~24 hours building tests: developing live and mock tests of the system. Partially blocked in the middle. Most of that time went to learning how the Mocha tool works for mimicking private classes.