IBM / CAST

CAST can enhance the system management of cluster-wide resources. It consists of the open source tools: cluster system management (CSM) and burst buffer.
Eclipse Public License 1.0

CSM lacks collective commands #222

Open fpizzano opened 6 years ago

fpizzano commented 6 years ago

Summary: Another issue with CSM that we have noticed. I assume that this has already been logged, but perhaps worthwhile for LLNL to log as well.

CSM lacks collective commands, such as the ability to specify sierra[100-360] -r y to make a large number of nodes ready. Without such collective commands, we will have to wrap the CSM API binaries in scripts, write SQL scripts, etc. This may have performance implications as well if we have to make many nodes ready.

It's unclear if we are missing something, or if this functionality indeed doesn't exist.

Is your feature request related to a problem? Please describe. Problem Report number: 37974

ToDo:

fpizzano commented 6 years ago

Adding @adambertsch @watson6282

mew2057 commented 6 years ago

A high-level assessment of this feature makes me think it could be achieved through a multi-phase process on the master daemon:

  1. Check if xCAT is running on the node.
  2. If xCAT is present, query the group/range from xCAT.
  3. Continue to use this generated range in the rest of the API.

I'm not sure if this is an acceptable mechanism, as it will likely add a database call to every query with an xCAT range or group. It might be best to limit it to ranges only and look at re-implementing xCAT's range specifiers in C++.
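To make the range-only idea concrete, here is a minimal sketch of expanding a bracketed range such as sierra[100-360] into individual node names without touching the database. It is written in Python for illustration only (the comment above suggests C++), the function name `expand_noderange` is hypothetical, and it covers just a small subset of what xCAT's noderange syntax supports: plain names and a single numeric `[start-end]` range per comma-separated item, with no groups.

```python
import re

# One bracketed numeric range per item, e.g. "sierra[100-360]".
_RANGE = re.compile(
    r"^(?P<prefix>[^\[\]]*)\[(?P<start>\d+)-(?P<end>\d+)\](?P<suffix>[^\[\]]*)$"
)


def expand_noderange(spec):
    """Expand a comma-separated list of node names and bracketed ranges.

    Hypothetical helper: only a subset of xCAT noderange syntax is handled.
    """
    nodes = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        match = _RANGE.match(part)
        if match is None:
            if "[" in part or "]" in part:
                raise ValueError(f"unsupported range syntax: {part!r}")
            nodes.append(part)  # plain node name
            continue
        start_text = match["start"]
        start, end = int(start_text), int(match["end"])
        if start > end:
            raise ValueError(f"descending range: {part!r}")
        # Keep zero padding when the range start is written with leading zeros.
        width = len(start_text) if start_text.startswith("0") else 0
        for index in range(start, end + 1):
            nodes.append(f"{match['prefix']}{str(index).zfill(width)}{match['suffix']}")
    return nodes
```

With this, `expand_noderange("sierra[100-360]")` yields the individual node names, which could then be fed to the existing per-node API calls.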

NickyDaB commented 6 years ago

@gurevichmark sent me this. posting it here for later reference.

https://xcat-docs.readthedocs.io/en/stable/advanced/restapi/restapi_resource/restapi_reference.html#uri-nodes-noderange-nodels-lists-the-nodes-noderange-cannot-start-with

NickyDaB commented 5 years ago

I read the above link in more detail. seems promising.

NickyDaB commented 5 years ago

More info. For python: https://xcat-docs.readthedocs.io/en/stable/advanced/restapi/restapi_usage/restapi_usage.html#an-example-of-how-to-use-xcat-rest-api-from-python

NickyDaB commented 5 years ago

Example of api in action in python.

https://github.com/xcat2/xcat-core/blob/master/xCAT-server/xCAT-wsapi/xcatws-test.py

NickyDaB commented 5 years ago

@pdlun92

@gurevichmark says we have an old version of xcat

[root@c650mnp02 nbuonar]# lsxcatd -a
Version 2.13.7

still looking into it, but we may need to upgrade to get the noderange support.

NickyDaB commented 5 years ago

[root@c650mnp02 nbuonar]# curl -X GET -k 'https://127.0.0.1/xcatws/nodes/all/nodels?userName=root&userPW=ppslab&pretty=1'
{
   "errorcode":"2",
   "error":"Unspported resource."
}

we should try this command again after we do the xcat upgrade.

NickyDaB commented 5 years ago

@gurevichmark did a test with xcat 2.14.6 and it looks like the commands worked.

NickyDaB commented 5 years ago

@pdlun92 @fpizzano I think we should try to update xcat to test before we tell the labs this will work. but this path still looks good so far.

NickyDaB commented 5 years ago

we updated xcat and made some progress.

I will probably need to talk to @gurevichmark again tomorrow

NickyDaB commented 5 years ago

We have a working example of combining the xCAT restful APIs via python and CSM python APIs to use an xCAT node range in CSM.
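The working example itself is not posted in this thread, so the following is only a rough sketch of what such a combination might look like. The URL shape follows the curl command earlier in the thread; everything else is an assumption: the helper names are hypothetical, the success response is assumed to be a JSON array of node names, `fetch` stands in for whatever performs the HTTPS GET, and `update_node` stands in for whichever CSM Python API call is applied to each node.

```python
import json
from urllib.parse import quote, urlencode


def nodels_url(host, noderange, user, password):
    """Build the xCAT REST URL that lists the nodes in a noderange."""
    query = urlencode({"userName": user, "userPW": password})
    return f"https://{host}/xcatws/nodes/{quote(noderange, safe='')}/nodels?{query}"


def parse_nodels(body):
    """Parse the response body. Assumption: success is a JSON array of names."""
    data = json.loads(body)
    if isinstance(data, dict) and "error" in data:
        # Error shape as seen earlier in the thread: {"errorcode": ..., "error": ...}
        raise RuntimeError(f"xCAT error {data.get('errorcode')}: {data['error']}")
    if not isinstance(data, list):
        raise ValueError("unexpected response shape")
    return [str(name) for name in data]


def apply_to_noderange(noderange, fetch, update_node):
    """Resolve a noderange through xCAT, then call update_node once per node.

    fetch(noderange) returns the response body as text (hypothetical hook).
    update_node(name) wraps the CSM Python API call (hypothetical hook).
    """
    nodes = parse_nodels(fetch(noderange))
    for name in nodes:
        update_node(name)
    return nodes
```

Keeping the HTTP call and the CSM call behind the two hooks means the range handling can be tried without a live xCAT or CSM installation.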

NickyDaB commented 5 years ago

Next to do item is to bring this up with the labs on the next phone call and ask them if this is how they plan on implementing the feature.

I think it's good because:

  1. they said they will be using the CSM python APIs
  2. they can import the xCAT python APIs very easily.