airavata-courses / TeamZenith

Team Zenith Repository for Spring 2016 I590 Class

Project design questions #3

Open anujbhan opened 8 years ago

anujbhan commented 8 years ago

As discussed in the Hangouts session, here is the summary:

  1. Develop a Java program to schedule the job. The program does the following:
     * 1a. Create an SSH session.
     * 1b. Transfer all the necessary files to the remote host.
     * 1c. Compile the input file on the remote host.
     * 1d. Schedule the job using the qsub command.
     * 1e. I have already pushed the code for the above tasks, and it is working. For more information, refer to the code. Please suggest improvements, if any.
  2. Monitor the running jobs:
     * The design choice for this is still pending. @Arpit, did you come up with any ideas?
     * Some points to remember: the remote system won't allow any daemon to run for more than 20 minutes, and a cron job will not work because we don't have access to the crontab file.
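Steps 1a to 1d can be sketched as a dry run. This is a hypothetical illustration, not the pushed code: it swaps in the OpenSSH `ssh`/`scp` command-line tools for whatever SSH library the real program uses (so each step opens its own connection instead of reusing one session), assumes a C source file compiled with `gcc`, and uses placeholder host, directory, and file names. It only builds and prints the commands; it does not run them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical dry-run sketch of steps 1a-1d. It builds the command lines a
// scheduler could run through ProcessBuilder, but executes nothing itself.
class JobSubmissionPlan {

    // All arguments are placeholders; file names are assumed to be plain names
    // (no directories) so the same name is valid locally and inside remoteDir.
    static List<List<String>> buildSteps(String userAtHost, String remoteDir,
                                         String sourceFile, String pbsScript) {
        List<List<String>> steps = new ArrayList<>();
        // 1a: open an SSH connection and create the remote working directory
        steps.add(List.of("ssh", userAtHost, "mkdir -p " + remoteDir));
        // 1b: copy the source file and the PBS script to the remote directory
        steps.add(List.of("scp", sourceFile, pbsScript, userAtHost + ":" + remoteDir + "/"));
        // 1c: compile the input file on the remote host (assumes a C source and gcc).
        // The remote command is one argument, so "&&" is interpreted by the remote shell.
        steps.add(List.of("ssh", userAtHost,
                "cd " + remoteDir + " && gcc -o a.out " + sourceFile));
        // 1d: submit the PBS script; qsub prints the new job id on stdout
        steps.add(List.of("ssh", userAtHost, "cd " + remoteDir + " && qsub " + pbsScript));
        return steps;
    }

    public static void main(String[] args) {
        for (List<String> step : buildSteps("user@remote-host", "zenith-job",
                "hello.c", "job.pbs")) {
            System.out.println(String.join(" ", step));
        }
    }
}
```

Running `main` prints one command per step, for example `ssh user@remote-host cd zenith-job && qsub job.pbs` for step 1d. Passing each list to `new ProcessBuilder(step).inheritIO().start()` would execute it; capturing the output of the last step gives the job id that monitoring needs.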
smarru commented 8 years ago

Good job, guys, summarizing the discussions.

For monitoring, one suggestion is to consider callback hooks in the PBS scripts. PBS runs each command in sequence, so you can inject a callback, such as publishing to a queue or calling a service, before and after the job. There are other approaches too; I am just trying to stir your thinking.
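A minimal sketch of that idea, assuming the Java scheduler writes the PBS script itself before submitting it. `./notify.sh` is a hypothetical user-level script (it could publish to a queue or call a service), and the job name and resource request are placeholders. `$PBS_JOBID` and `$PBS_O_WORKDIR` are environment variables that TORQUE sets for each job.

```java
// Hypothetical sketch: generate a PBS script that runs a callback command
// immediately before and after the application.
class CallbackPbsScript {

    static String build(String jobName, String callback, String appCommand) {
        StringBuilder sb = new StringBuilder();
        sb.append("#!/bin/bash\n");
        // Directives must come before the first executable line of the script
        sb.append("#PBS -N ").append(jobName).append('\n');
        sb.append("#PBS -l nodes=1:ppn=1,walltime=00:10:00\n");
        // Jobs start in $HOME by default; move to the directory qsub was run from
        sb.append("cd \"$PBS_O_WORKDIR\"\n");
        // Callback before the application starts
        sb.append(callback).append(" started \"$PBS_JOBID\"\n");
        sb.append(appCommand).append('\n');
        // Save the application's exit status before the next command overwrites $?
        sb.append("status=$?\n");
        // Callback after the application ends, reporting how it exited
        sb.append(callback).append(" finished \"$PBS_JOBID\" \"$status\"\n");
        sb.append("exit \"$status\"\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("zenith-test", "./notify.sh", "./a.out"));
    }
}
```

Because the callbacks are ordinary commands in the user's own script, this needs no daemon and no change to system-wide TORQUE configuration.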

anujbhan commented 8 years ago

Thank you so much, Suresh. We will explore callback hooks and PBS scripts.

arpiagariu commented 8 years ago

@smarru: Can we make a qsub wrapper on the Karst system?

qsub wrapper (submit filter): When a "submit filter" exists, TORQUE will send the command file (or contents of STDIN if piped to qsub) to that script/executable and allow it to evaluate the submitted request based on specific site policies. The resulting file is then handed back to qsub and processing continues. Submit filters can check user jobs for correctness based on site policies.

http://docs.adaptivecomputing.com/torque/4-0-2/help.htm#topics/12-appendices/jobSubmissionFilter.htm
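To illustrate the stdin-to-stdout contract described above, here is a hypothetical filter. The walltime rule is an invented example policy, not Karst's, and the exit-status convention for rejecting a job is an assumption; the linked TORQUE page is the authority on both how the filter is registered and how its exit status is treated.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical submit filter: reads the job script from stdin and either
// echoes it unchanged to stdout or rejects it.
class WalltimeSubmitFilter {

    // Example policy (an assumption): require a walltime request in the script.
    static Optional<List<String>> filter(List<String> script) {
        boolean hasWalltime = script.stream()
                .anyMatch(line -> line.startsWith("#PBS") && line.contains("walltime="));
        return hasWalltime ? Optional.of(script) : Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        Optional<List<String>> result = filter(lines);
        if (result.isEmpty()) {
            System.err.println("rejected: add a '#PBS -l walltime=...' directive");
            // Non-zero exit is assumed to make qsub reject the job
            System.exit(-1);
        }
        // The (possibly modified) script is handed back to qsub on stdout
        result.get().forEach(System.out::println);
    }
}
```

A real policy check would also have to account for options passed on the qsub command line rather than in the script. As the following comments point out, registering any such filter requires editing the system-wide torque.cfg.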

smarru commented 8 years ago

I have not tried it on Karst myself. Give it a try and let's see how it goes.

If you have any questions, you can always email hps-admin@iu.edu, but responses over the weekend might be sparse. You should get quick responses during work hours.

Suresh

(Reply to arpiagariu's comment of Jan 23, 2016, 7:39 PM.)

a2l007 commented 8 years ago

@arpiagariu Configuring the wrapper would require modifying the torque.cfg file, and we don't have write access to that config file.

smarru commented 8 years ago

Yes, torque.cfg is a system-wide file. Any change to it affects all users on the system, so they will not give access. I have not looked into the details of the wrapper, but only user-level modifications are allowed on these systems.

Suresh

(Reply to Atul Mohan's comment of Jan 24, 2016, 10:48 AM.)

a2l007 commented 8 years ago

@smarru When you say a callback such as publishing to a queue: there are many MOM (message-oriented middleware) APIs available, such as ActiveMQ, but wouldn't we have to configure that on Karst as well? We are considering an SMTP-based utility to notify the user when the job starts and when it completes.

smarru commented 8 years ago

Atul, no, we are heading in the wrong direction. I hope I did not divert your thinking too much. I did not mean that you should change, modify, or intercept any of MOAB/Torque’s internal functionality or callbacks.

I mean that you can instrument your batch script with callbacks.

For example, your PBS script looks like this:

PBS…..

a.out

So before and after a.out, you can call any shell command. For instance, you can write a Python script (or a script in any other language) there that calls your server. It will look something like this:

PBS…..

call_back   (before application execution begins)
a.out
call_back   (after application execution ends)

Suresh
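The `call_back` step above could be a small program that reports to your server. This is a hypothetical sketch: it assumes Java 11 or later is available on the compute nodes, that they can reach a server the team runs, and it invents a `jobId`/`phase` form body. A PBS script might call it as `java JobCallback http://your-server/callback "$PBS_JOBID" started`.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical callback client, run from the PBS script before and after a.out:
//   java JobCallback <serverUrl> <jobId> <phase>
class JobCallback {

    // Builds a form-encoded body such as "jobId=123.m2&phase=started"
    static String formBody(String jobId, String phase) {
        return "jobId=" + URLEncoder.encode(jobId, StandardCharsets.UTF_8)
                + "&phase=" + URLEncoder.encode(phase, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: java JobCallback <serverUrl> <jobId> <phase>");
            System.exit(2);
        }
        HttpRequest request = HttpRequest.newBuilder(URI.create(args[0]))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(args[1], args[2])))
                .build();
        // Send one notification and ignore the response body
        HttpResponse<Void> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding());
        System.out.println("callback returned HTTP " + response.statusCode());
    }
}
```

Each call is short-lived, so it stays within the 20-minute daemon limit mentioned earlier in the thread. The server side then only has to record the `started` and `finished` events per job id.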

(Reply to Atul Mohan's comment of Jan 24, 2016, 12:13 PM.)