cooperative-computing-lab / cctools

The Cooperative Computing Tools (cctools) enable large scale distributed computations to harness hundreds to thousands of machines from clusters, clouds, and grids.
http://ccl.cse.nd.edu

Test tools for tutorial at XSEDE #78

Closed btovar closed 11 years ago

dthain commented 11 years ago

Also @psempoli

dpandiar commented 11 years ago

Peter and I met this afternoon to hash out a plan for the tutorial and to-do list:

  1. Run through the existing tutorial instructions to determine which parts need to change to work on XSEDE (e.g., options for sge_submit_workers, Makeflow -T sge, etc.).
  2. Test a Blast run on XSEDE (using Blast for Makeflow demo).
  3. Test WQ hierarchy for running replica exchange (using this for WQ demo). Test a setup where a foreman is run on the ND CRC head node working for a master running in XSEDE. Start workers for this foreman in the CRC cluster.
  4. Test asynchronous data transfer in WQ.

dthain commented 11 years ago

Re 3 - Make sure that the tutorial setups match a realistic use case, otherwise the attendees are going to be unclear on why to use them. Does it generally make sense to have workers in CRC and a foreman in XSEDE?

Re 4 - Asynchronous data movement is not yet well tested or documented, so please leave it out. However, multi-slot workers should be worked in appropriately.

dpandiar commented 11 years ago

For 3, we were planning on starting with two pools of workers managed by two foremen for the demo: one foreman on the Lonestar head node managing workers in Lonestar (including those started by the audience using the resource allocation for the tutorial), and the other on the CRC head node managing workers running there. This matches a use case where there are multiple worker pools in different clusters, each managed by its own foreman.
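
To make the fan-out argument concrete, the two-pool layout can be modeled as a small tree. This is a minimal sketch in plain Python, not the Work Queue API: the class names, the pool labels, and the worker counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Foreman:
    """A foreman sits between the master and one pool of workers."""
    name: str
    workers: List[str] = field(default_factory=list)


@dataclass
class Master:
    """In this model the master talks only to foremen, never to pool workers."""
    foremen: List[Foreman] = field(default_factory=list)

    def direct_connections(self) -> int:
        # Connections the master itself has to service.
        return len(self.foremen)

    def total_workers(self) -> int:
        # Workers reachable through all foremen combined.
        return sum(len(f.workers) for f in self.foremen)


def build_demo_topology(lonestar_workers: int, crc_workers: int) -> Master:
    """Build the two-pool layout: one foreman per cluster head node."""
    lonestar = Foreman(
        "lonestar-head",  # hypothetical label
        [f"lonestar-w{i}" for i in range(lonestar_workers)],
    )
    crc = Foreman(
        "crc-head",  # hypothetical label
        [f"crc-w{i}" for i in range(crc_workers)],
    )
    return Master([lonestar, crc])


if __name__ == "__main__":
    # Worker counts are made up for the example.
    master = build_demo_topology(lonestar_workers=48, crc_workers=16)
    print("direct connections at master:", master.direct_connections())
    print("workers reachable:", master.total_workers())
```

In this model the master services two connections no matter how many workers the audience starts in either pool, which is the property the demo is meant to show.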

dpandiar commented 11 years ago

We also came up with a list of goals for the demo to make sure our testing covers them:

  a. The tools for running and monitoring WQ/MF apps - catalog server, work_queue_status, submit_worker scripts.
  b. Elasticity.
  c. Aggregation of resources across clusters.
  d. Hierarchy in WQ - lower transfer overheads at master, isolate failures within clusters.

And the multi-slot workers will be used in the hands-on instruction part to show how the allocated cores in Lonestar can be maximally utilized.
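
The arithmetic behind "maximally utilized" can be sketched as follows. This is a rough illustration in plain Python rather than Work Queue code; it assumes a worker advertises a fixed core count and each task declares how many cores it needs, and the 12-core node size is an assumption made for the example.

```python
def concurrent_tasks(worker_cores: int, cores_per_task: int) -> int:
    """Number of tasks a multi-slot worker can run at once."""
    if worker_cores < 0 or cores_per_task <= 0:
        raise ValueError("worker_cores must be >= 0 and cores_per_task > 0")
    return worker_cores // cores_per_task


def idle_cores(worker_cores: int, cores_per_task: int) -> int:
    """Cores left unused when the worker is packed with identical tasks."""
    used = concurrent_tasks(worker_cores, cores_per_task) * cores_per_task
    return worker_cores - used


if __name__ == "__main__":
    node_cores = 12  # assumed node size, for illustration only
    for need in (1, 4, 5):
        print(
            f"{need} core(s)/task:",
            concurrent_tasks(node_cores, need), "tasks,",
            idle_cores(node_cores, need), "idle cores",
        )
```

A single-slot worker on the same node would run one task at a time, so for one-core tasks most of the allocated cores would sit idle.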

dthain commented 11 years ago

Ok, sounds good.

On Mon, Jun 17, 2013 at 4:42 PM, Dinesh Rajan <notifications@github.com> wrote:

> We also came up with a list of goals for the demo to make sure our testing covers them: a. The tools for running and monitoring WQ/MF apps - catalog server, work_queue_status, submit_worker scripts. b. Elasticity c. Aggregation of resources across clusters d. Hierarchy in WQ - lower transfer overheads at master, isolate failures within clusters
>
> And the multi-slot workers will be used in the hands-on instruction part to show how the allocated cores in Lonestar can be maximally utilized.

dpandiar commented 11 years ago

4.0rc1 was tested for the tutorial. Marking this closed.