Closed nickjer closed 8 years ago
This will reduce duplication and add more verbose error handling.
This would be an update to the TorqueHelper gem, which just shells out for everything right now. The interface of course shouldn't change.
We currently have a qstat_xml and a 'parse_qstat_output' in the public methods.
Is the goal to completely remove the reliance on Nokogiri here, in which case these methods will be troublesome, or should we leave them in place and deprecate them so that they don't break?
Don't break the interface, I mean.
Clarification: don't break the interface that OSC::Machete::Job uses :-). qstat_xml can go.
Note to self, PBS examples here:
https://github.com/AweSim-OSC/pbs-ruby/blob/master/examples/simplejob.rb
Open up the debate whether the host should also be a property of OSC::Machete::Job and stored in ActiveRecord.
The pbsid may not provide enough information to determine host. Biggest pro is that it would reduce the code used to determine the host.
I like the idea of host being a property of OSC::Machete::Job. You could also imagine that Script is an object that knows its "host". This is more object oriented and would definitely reduce complexity from the TorqueHelper.
Also, knowing the "host" alongside the "id" would seem necessary if we wanted to support multiple resource managers (besides Torque). When I submit a job, I need to know what host I'm submitting to. When I check the status of the job, I need to know what host it's on and the id of the job.
So, if the host is not available and you want to do a qstat, we could try to fall back on determining the host from the pbsid (so the older apps work). For Ruby, which doesn't append the host to the id, we could append the host ourselves so it's always there. Not sure what object would be responsible for this "mapping". Probably a pbsid object...
OR for the older apps, we could just have the app hardcode the system it uses. Most of the apps are probably running everything on Oakley anyways. And individual jobs might be designed to run on specific systems (e.g. solve on a Ruby node, post-process on an Oakley node).
So we could have host be a required argument for OSC::Machete::Job, default it to Oakley if not provided, and then the database wouldn't need an extra column. We could use a factory method to set the default... some ideas...
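To make the ideas above concrete, here is a rough sketch of a factory method that takes an explicit host, falls back to deriving it from the pbsid, and finally defaults to Oakley. Everything in it is hypothetical: `JobSketch`, `build`, `host_from_pbsid`, and the `HOST_BY_SERVER` table (including the `oak-batch` entry) are illustration names and placeholder values, not the actual OSC::Machete::Job interface.

```ruby
# Hypothetical sketch only; not the real OSC::Machete::Job.
class JobSketch
  # Assumed default for older apps that never pass a host.
  DEFAULT_HOST = "oakley"

  # Assumed mapping from the server part of a pbsid to a host name.
  # The entry below is a placeholder value for illustration.
  HOST_BY_SERVER = { "oak-batch" => "oakley" }.freeze

  attr_reader :pbsid, :host

  def initialize(pbsid:, host:)
    @pbsid = pbsid
    @host = host
  end

  # Factory: explicit host wins, then the pbsid-derived host, then the default.
  def self.build(pbsid:, host: nil)
    new(pbsid: pbsid, host: host || host_from_pbsid(pbsid) || DEFAULT_HOST)
  end

  # Returns nil when the id carries no server suffix or the suffix is unknown.
  def self.host_from_pbsid(pbsid)
    _id, server = pbsid.to_s.split(".", 2)
    return nil if server.nil?

    HOST_BY_SERVER[server.split(".").first]
  end
end
```

With this shape the database would not need an extra column for older apps, since an id without a recognizable suffix simply resolves to the default host.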
This may be a silly question as I'm not clear on the underpinnings, but is 'submit_host' not an attribute available with each job id with 'qstat -x'? Were you implying falling back to 'qstat' to query the associated submit host? I'm only interpreting this as someone who uses 'qstat -x' and nothing else.
These changes have been merged into machete release/1.0 for qsub, qstat, and qdel.
@kmanalo We got away from using the shell commands with this update. Now we're calling the Torque C libraries directly via the pbs gem.
@nickjer If we want to make TorqueHelper accept a host it should probably be a separate issue. Re-open if I'm wrong.
~16 hours coding. Most of that was spent digging through the machete and PBS source to ensure that the API stayed the same. ~24 hours building tests: developing live and mock tests of the system. Partially blocked in the middle. Most of that time went to learning how the Mocha tool works for mocking private classes.
You can easily implement the same features of qstat using pbs-ruby in Machete and remove all references to XML. An example can be seen in the pbs-ruby repo examples directory.
I don't suggest removing qsub yet, as you will then be required to programmatically set up a batch job. One of the benefits of submitting jobs through pbs-ruby, though, would be the raising of errors when job submission fails.
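From the caller's side, "raising errors when job submission fails" might look roughly like the sketch below. It deliberately does not use the pbs-ruby API; `SubmitError`, `submit`, and the injectable `runner` are hypothetical names, and the default runner just shells out to qsub so the contrast with a silent failure is visible.

```ruby
require "open3"

# Hypothetical error class; the real gem may name and structure this differently.
class SubmitError < StandardError; end

# Default runner shells out to qsub and returns [stdout, stderr, status].
DEFAULT_RUNNER = ->(script_path) { Open3.capture3("qsub", script_path) }

# Submit a script and return the job id, raising instead of returning
# an empty or garbage id when the submission fails.
def submit(script_path, runner: DEFAULT_RUNNER)
  stdout, stderr, status = runner.call(script_path)
  unless status.success?
    raise SubmitError,
          "submission of #{script_path} failed " \
          "(exit #{status.exitstatus}): #{stderr.strip}"
  end
  stdout.strip
end
```

Because the runner is passed in, a test can substitute a lambda that returns canned output and a fake status, which keeps the public interface the same regardless of what performs the submission underneath.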