Closed chu11 closed 9 months ago
OH I just realized the hostnames are the plugs. Well, then the hostname's index in the hostlist for that chassis?
Hmmmm, I suppose this is possible. Although ... we get into some hairy stuff b/c I think I've seen some systems where they begin to index at 1 instead of 0. So now we need a special config for that.
I'm wondering if a giant config of "these URIs for these nodes" is what is needed.
For a single chassis, I think this is fairly trivial - define plug names that are the index (0 or 1 origin, whatever), then do like you did in the httppower example and put the URI in the on/off script, but substitute the plug name using %s.
Since that fits so naturally, I have to wonder how many redfishpower instances we could run concurrently if it were one per chassis in a really big system. Example: 8 slot chassis scaled out to 8K nodes would be 1024. Maybe I'll do a quick experiment just for fun.
Since that fits so naturally, I have to wonder how many redfishpower instances we could run concurrently if it were one per chassis in a really big system. Example: 8 slot chassis scaled out to 8K nodes would be 1024. Maybe I'll do a quick experiment just for fun.
I was a little confused, until it occurred to me, I think you're recommending 1 chassis per redfishpower co-process? B/c I don't think we can specify a hostname and a plug on one powerman.conf line? i.e. something like
node "node1" "redfish1" "pnode1" "1"
can't be done? where the "1" is the "plug suffix" and "pnode1" is the hostname to power control.
Yes, but that is OK. You can specify a hostlist as before and just map the plugs in order. iow just specify
device "chassis0" "redfish" "redfishpower -h t[0-7] |&"
node "t[0-7]" "chassis0"
device "chassis0" "redfish" "redfishpower -h t[0-7] |&"
node "t[0-7]" "chassis0"
Wait a second, are you specifying the chassis parent here? For the actual URI wouldn't we want something more like
device "chassis0" "redfish" "redfishpower -h t[8-15] |&"
node "t[8-15]" "[0-7]"
where 0-7 are the "plugs"?
No, chassis0 is the device name, and the plugs are unspecified in the node line. They are implicitly "[0-7]". So you can say
node "t[0-7]" "chassis0" "[0-7]"
node "t[8-15]" "chassis1" "[0-7]"
or equivalently
node "t[0-7]" "chassis0"
node "t[8-15]" "chassis1"
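The in-order mapping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not powerman's actual hostlist code: `expand` and `map_plugs` are hypothetical helper names, and `expand` only understands the simple `prefix[lo-hi]` form used in this thread.

```python
import re

def expand(hostlist):
    """Expand a simple 'prefix[lo-hi]' range (hypothetical helper;
    real hostlists support more forms than this)."""
    m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\]", hostlist)
    if m is None:
        return [hostlist]
    prefix, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
    return [f"{prefix}{i}" for i in range(lo, hi + 1)]

def map_plugs(nodes, plugs="[0-7]"):
    """Pair node names with plug names in order, the way the node
    line is described as doing when plugs are left implicit."""
    names, plug_names = expand(nodes), expand(plugs)
    if len(names) != len(plug_names):
        raise ValueError("node and plug lists differ in length")
    return dict(zip(names, plug_names))

chassis1 = map_plugs("t[8-15]")
print(chassis1["t8"], chassis1["t15"])  # 0 7
```

So on chassis1, node t8 lands on plug 0 and t15 on plug 7, whether or not the "[0-7]" is written out.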
I had another idea about the chassis address but wanted to get this point across first.
ohhh got it got it ... I was getting confused, yeah, all the URIs go to the same chassis.
Ugh ... maybe my prototype for #81 is a waste now ... maybe this has to be solved first.
Since the URI for the chassis power control is probably different from the slots, my thought was to have a special plug name c or something that is just mapped to a different URI than the rest of the plugs in redfishpower. If it's the last plug, e.g. "0", "1", "2", ... "7", "c" then
node "t[0-7],chassis0" "chassis0"
node "t[0-7],chassis1" "chassis1"
or equivalently
node "t[0-7],chassis0" "chassis0" "[0-7,c]"
node "t[0-7],chassis1" "chassis1" "[0-7,c]"
Maybe the "setconfig" stuff at the beginning of the device script could set the config for that special plug, including the hierarchical semantics.
If the URI is different for each blade, are we only talking to the chassis (one IP)?
Do we have an El Cap chassis to poke at? Because if we're only talking to the chassis, we don't care what nodes are in there!
If the URI is different for each blade, are we only talking to the chassis (one IP)?
Of the one example I've seen yeah, the host is the same for each of the blades, just the suffix "path" is different (0, 1, 2, .., etc. different in each path).
For that type of a chassis I wouldn't think the hierarchical semantics we discussed would be required... The chassis probably remains responsive to queries about the nodes even when off (if it can even be turned off).
As we go around in circles on some of this stuff, I'm beginning to think "mega-config file" is the right idea, because there are so many oddball cases with redfish.
I can't help but look at the proliferation of device files as evidence for the need.
On the first three items - I think we are zeroing in on how to do this simply without a separate config file. It seems like we have identified two cases that we may care about (but we should verify they really exist):
Set vs plugs isn't an either or thing. You can set a URI template and then still substitute plugs.
On the last two items - this is what powerman does best. You can mix and match different schemes in one config. The device scripts provide the abstraction, and then you map "plugs" in each device to hostnames in the main config and powerman provides one interface to the admins.
It would feel like a design failure if we have to introduce a second config file so I think we should keep trying. Let's start by finding out exactly what we're dealing with in El Cap.
On the last two items - this is what powerman does best.
The point on the last two items was the potential explosion of device specifications. Unlike previous device files in powerman, it seems that copy & modify the device files is going to be a common pattern with redfish and some of these REST interfaces, as there are quirks in every system. And with blades and parents, we might be introducing additional quirks too. So perhaps a mega config just might be easier overall?
Bullet 3 above is the one that made me go "ugh" the most ... where we are crossing the line into different URI configs for different hosts within a single redfishpower process, so there was this ... "ugh ..."
I'm not convinced a new config file is the answer, particularly to this issue. If we could stay focused on this issue, let's look at what the admins had to do on hetchy with the following device script:
redfishpower-cray-olympus-blades.dev
This is apparently for an 8-blade chassis. They cut and pasted the same specification with all its scripts 8 times within the same .dev file and gave each spec's name a suffix like -blade0, -blade1, etc. and they (only) alter the URIs in each one, e.g.
send "setonpath redfish/v1/Chassis/Blade0/Actions/Chassis.Reset {\"ResetType\":\"On\"}\n"
send "setonpath redfish/v1/Chassis/Blade1/Actions/Chassis.Reset {\"ResetType\":\"On\"}\n"
send "setonpath redfish/v1/Chassis/Blade2/Actions/Chassis.Reset {\"ResetType\":\"On\"}\n"
...
Then their config looks like this:
device "redfishpower-blade0" "redfishpower-cray-olympus-blade0" "/usr/sbin/redfishpower -h hetchy-cmm[1-2] |&"
device "redfishpower-blade1" "redfishpower-cray-olympus-blade1" "/usr/sbin/redfishpower -h hetchy-cmm[1-2] |&"
device "redfishpower-blade2" "redfishpower-cray-olympus-blade2" "/usr/sbin/redfishpower -h hetchy-cmm1 |&"
device "redfishpower-blade3" "redfishpower-cray-olympus-blade3" "/usr/sbin/redfishpower -h hetchy-cmm1 |&"
### Login/Compute Blades
node "hetchy-blade1" "redfishpower-blade0" "hetchy-cmm1"
node "hetchy-blade2" "redfishpower-blade1" "hetchy-cmm1"
node "hetchy-blade3" "redfishpower-blade2" "hetchy-cmm1"
node "hetchy-blade4" "redfishpower-blade3" "hetchy-cmm1"
node "hetchy-blade5" "redfishpower-blade0" "hetchy-cmm2"
node "hetchy-blade6" "redfishpower-blade1" "hetchy-cmm2"
So I guess they have two blade chassis, one with 4 blades installed and one with 2. They really had to stand on their heads to get this set up.
IMHO there should have been one device spec for this particular chassis with 8 plugs defined. Then their config would be more intuitive, like this
device "cmm1" "redfishpower-cray-olympus-cmm" "/usr/sbin/redfishpower -h hetchy-cmm1 |&"
device "cmm2" "redfishpower-cray-olympus-cmm" "/usr/sbin/redfishpower -h hetchy-cmm2 |&"
### Login/Compute Blades
node "hetchy-blade[1-4]" "cmm1" "[0-3]"
node "hetchy-blade[5-6]" "cmm2" "[0-1]"
Incidentally they have a separate dev specification in another .dev file for the chassis itself:
redfishpower-cray-olympus-cmm.dev
It's another cut & paste, identical to the blades except for the URIs e.g.
send "setonpath redfish/v1/Chassis/Blade0/Actions/Chassis.Reset {\"ResetType\":\"On\"}\n"
Their config is:
device "redfishpower-cmm" "redfishpower-cray-olympus-cmm" "/usr/sbin/redfishpower -h hetchy-cmm[1-2] |&"
### CMMs
node "hetchy-cmm1" "redfishpower-cmm" "hetchy-cmm1"
node "hetchy-cmm2" "redfishpower-cmm" "hetchy-cmm2"
Ideally we would figure out a way to represent the chassis as another plug like c in the single .dev spec proposed above. Then they would not have any new devices, just a node config for the chassis, e.g.
### CMMs
node "hetchy-cmm1" "cmm1" "c"
node "hetchy-cmm2" "cmm2" "c"
Or even combined with the blades, e.g.
node "hetchy-blade[1-4],hetchy-cmm1" "cmm1" "[0-3,c]"
node "hetchy-blade[5-6],hetchy-cmm2" "cmm2" "[0-1,c]"
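One way to picture the special c plug is as a per-plug path table where most entries come from a template and a few are overridden. The Python below is only a sketch of that lookup, not redfishpower code: `build_plug_table` is a hypothetical name, and the chassis path is a made-up placeholder because the thread does not give the real chassis URI.

```python
# Blade URI pattern taken from the hetchy device script quoted in this thread.
BLADE_TEMPLATE = "redfish/v1/Chassis/Blade%s/Actions/Chassis.Reset"

# HYPOTHETICAL placeholder: the real chassis URI is not given in this thread.
CHASSIS_PATH = "redfish/v1/Chassis/CHASSIS_ID_PLACEHOLDER/Actions/Chassis.Reset"

def build_plug_table(blade_plugs, overrides):
    """Return {plug name: path}, filling blades from the template and
    then applying explicit per-plug overrides such as the 'c' plug."""
    table = {plug: BLADE_TEMPLATE % plug for plug in blade_plugs}
    table.update(overrides)
    return table

plugs = build_plug_table([str(i) for i in range(8)], {"c": CHASSIS_PATH})
print(plugs["0"])  # blade path built from the template
print(plugs["c"])  # chassis path taken from the override
```

With a table like this, the node lines above only need to name plugs ("[0-3,c]") and never a URI.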
And the entire blade config, with all its internal cut & paste, is cut and pasted into another .dev script for the switches:
redfishpower-cray-olympus-switches.dev
In this one the URIs are like
send "setonpath redfish/v1/Chassis/Perif0/Actions/Chassis.Reset {\"ResetType\":\"On\"}\n"
send "setonpath redfish/v1/Chassis/Perif1/Actions/Chassis.Reset {\"ResetType\":\"On\"}\n"
send "setonpath redfish/v1/Chassis/Perif2/Actions/Chassis.Reset {\"ResetType\":\"On\"}\n"
...
There doesn't seem to be chassis control for this one - not sure if that was just an omission or if there really isn't a capability. Anyway, 8 specs could be reduced to 1 with plugs.
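To make the "8 specs reduced to 1" point concrete, here is a small Python sketch that produces the same per-blade and per-switch commands from one template each. It only models the string substitution; `setonpath_line` is a hypothetical helper, and the assumption that redfishpower would accept a `%s` template is exactly the feature being proposed in this issue, not something that exists yet.

```python
import json

def setonpath_line(template, plug):
    """Build one 'setonpath' command by substituting the plug name
    into a URI template (hypothetical helper for illustration)."""
    path = template % plug
    body = json.dumps({"ResetType": "On"}, separators=(",", ":"))
    return f"setonpath {path} {body}"

BLADES = "redfish/v1/Chassis/Blade%s/Actions/Chassis.Reset"
SWITCHES = "redfish/v1/Chassis/Perif%s/Actions/Chassis.Reset"

for plug in range(3):
    print(setonpath_line(BLADES, plug))
    print(setonpath_line(SWITCHES, plug))
```

Each printed line corresponds to one of the hand-copied send strings in the existing .dev files, minus the escaping.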
So in summary I think the path forward is:
%s substitution in the URIs within redfishpower
a setonpath type command, to associate a special plug with a different URI for chassis support
(a set command to establish a parent plug) but first check and see if that actually helps with the El Cap stuff and defer if not.
Edit: look at all the cut & paste this fixes! Does it go a little ways to address your concern
it seems that copy & modify the device files is going to be a common pattern with redfish and some of these REST interfaces, as there are quirks in every system. And with blades and parents, we might be introducing additional quirks too
hmmmm, I guess it's just a difference of opinion. In my mind, writing out something like the following would be easier? Now everything is in one place, vs multiple .dev files?
# not blades
[login]
login.hosts = nodes[0-7]
login.auth = ...
login.statpath = ...
login.onpath = ...
[blade]
blade.hosts = nodes[8-1024]
blade.auth = ...
blade.parent = chassis[0-63]
blade.statpath = ...%s...
blade.onpath = ...%s...
blade.chassisstatpath = ...
[chassis]
chassis.hosts = chassis[0-63]
chassis.auth = ...
chassis.statpath = ...
chassis.onpath = ...
[gateway]
gateway.hosts = other_node_type[0-7]
gateway.auth = ...
gateway.statpath = ...
gateway.onpath = ...
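For what it's worth, a file shaped like the proposal above is easy to read with an off-the-shelf INI parser. The Python sketch below uses the standard configparser module on a trimmed copy of the example; the file format itself is only the proposal from this comment, not anything powerman or redfishpower reads today, and the "..." values are left elided exactly as written.

```python
import configparser

# Trimmed copy of the proposed layout; "..." values are elided as in the thread.
SAMPLE = """
[blade]
blade.hosts = nodes[8-1024]
blade.parent = chassis[0-63]
blade.statpath = ...%s...
blade.onpath = ...%s...

[chassis]
chassis.hosts = chassis[0-63]
chassis.statpath = ...
chassis.onpath = ...
"""

# interpolation=None stops configparser from treating "%s" as its own syntax.
cfg = configparser.ConfigParser(interpolation=None)
cfg.read_string(SAMPLE)

for section in cfg.sections():
    hosts = cfg[section][f"{section}.hosts"]
    print(section, "->", hosts)
```

Note the interpolation setting: with the default parser, reading a value containing a bare %s raises an interpolation error, which is one small wrinkle a %s-based path syntax would bring to an INI-style file.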
look at all the cut & paste this fixes! Does it go a little ways to address your concern
Yeah. I guess here are just a few concerns:
would this approach lead to an unnecessary number of redfishpower co-procs on the system? In my mind, 16-64 is ok, but possibly 1000s?
I am also trying to think of systems that we haven't seen yet. Maybe this is me thinking too far ahead for imaginary scenarios we haven't witnessed, but I'm thinking more flexibility would be wise to engineer in now vs later. BUT ... I guess in the worst case, if there are strange systems that arrive in the future, admins could do what they are doing right now (i.e. -h node[0,8,16,32,...] is one kooky config, -h node[1,9,17,33] is another kooky config).
I'm not sure there is a problem with 1-2K coprocs or why we need to invest effort or add complexity to avoid it. See #127 - 2048 coprocs (for a fictitious 16K node system) even works in the tiny ci environment.
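As a rough illustration of where the 2048 figure comes from, the Python sketch below generates one device line and one node line per 8-slot chassis, in the same shape as the examples earlier in this thread. The `chassis_config` helper, the "t" node prefix and the "chassisN" device names are illustrative assumptions, not the setup actually used in #127.

```python
def chassis_config(total_nodes, slots=8, prefix="t"):
    """Emit one device line and one node line per chassis
    (hypothetical generator; names follow the thread's examples)."""
    lines = []
    for c in range(total_nodes // slots):
        lo, hi = c * slots, c * slots + slots - 1
        hosts = f"{prefix}[{lo}-{hi}]"
        lines.append(f'device "chassis{c}" "redfish" "redfishpower -h {hosts} |&"')
        lines.append(f'node "{hosts}" "chassis{c}"')
    return lines

lines = chassis_config(16 * 1024)
print(len(lines) // 2)  # 2048 devices, i.e. one co-process per chassis
print(lines[2])
print(lines[3])
```

The same arithmetic gives 1024 co-processes for the 8K node case mentioned earlier.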
I am also trying to think of systems that we haven't seen yet. Maybe this is me thinking too far ahead for imaginary scenarios we haven't witnessed, but I'm thinking more flexibility would be wise to engineer in now vs later. BUT ... I guess in the worst case, if there are strange systems that arrive in the future, admins could do what they are doing right now (i.e. -h node[0,8,16,32,...] is one kooky config, -h node[1,9,17,33] is another kooky config).
I'm not sure what you're referring to here. I'd say let's stay focused on use cases we have in front of us (or that we at least can find extant somewhere).
redfishpower is essentially a powerman plugin, so it really should behave like one, not go too far off in the weeds doing its own thing (only as necessary to meet specific objectives).
redfishpower is essentially a powerman plugin, so it really should behave like one, not go too far off in the weeds doing its own thing (only as necessary to meet specific objectives).
Good point. In my mind I might have been thinking of it like a separate utility.
Specific issues now open (#128 and #129) so let's close this one.
per conversation in #81,