Open chu11 opened 1 month ago
We do have aliases so blade1477 and blade1478 could just map to the same blade name.
I thought the admins told us that they had other ways to do this sort of mapping and that we didn't need to provide support for such stuff in powerman?
> We do have aliases so blade1477 and blade1478 could just map to the same blade name.
That was my first thought. But behind the scenes I didn't know the fallout. For example, if someone did `pm -0 blade1477,blade1478`, I'm not sure if that would send 1 or 2 power control requests to redfishpower (i.e. does the ranged script call `hostlist_uniq()` and pass `off Blade3` or `off Blade[3,3]` to redfishpower?).
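To make the question concrete, here is a minimal sketch of the "uniq before sending" step, assuming the plugs have already been expanded into a plain array of strings. `uniq_plugs()` is a hypothetical helper written for this illustration; it only stands in for what a uniq pass over a hostlist would do and is not powerman code.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: sort plugs[0..n) and drop duplicates in place.
 * Returns the number of unique plugs left at the front of the array. */
static size_t uniq_plugs(const char **plugs, size_t n)
{
    size_t i, out = 0;

    if (n == 0)
        return 0;
    qsort(plugs, n, sizeof(plugs[0]), cmp_str);
    for (i = 1; i < n; i++) {
        /* Keep plugs[i] only if it differs from the last plug kept. */
        if (strcmp(plugs[out], plugs[i]) != 0)
            plugs[++out] = plugs[i];
    }
    return out + 1;
}
```

With this in place, two names that resolve to `Blade3` would collapse to a single entry, so the device would see `off Blade3` rather than `off Blade[3,3]`.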
> I thought the admins told us that they had other ways to do this sort of mapping and that we didn't need to provide support for such stuff in powerman?
I think they do, but this is more of a convenience, versus `pm -0 $(get-me-the-blade.sh node1478)`.
comments @watson6282
FWIW
$ pm -T -1 picl1,picl1
send(picl): 'on 1\n'
recv(picl): 'OK\n'
recv(picl): 'power> '
Command completed successfully
Two hosts named, one command issued.
As a quick test:
listen "localhost:11099"
include "/g/g0/achu/chaos/git/powerman/etc/devices/redfishpower-cray-ex-rabbit.dev"
device "d0" "cray-ex-rabbit" "/g/g0/achu/chaos/git/powerman/src/redfishpower/redfishpower -h cmm0,t[0-15],rabbit --test-mode|&"
node "cmm0,perif[0-4,7],blade[0-7],t[0-15],rabbit" "d0" "Enclosure,Perif[0-4,7],Blade[0-7],Node[0-16]"
alias "tblade0" "blade0"
alias "tblade1" "blade0"
alias "tblade2" "blade1"
alias "tblade3" "blade1"
...
With the exception that `blade0` is listed twice in the hostrange output (easily solved with a `hostlist_uniq()` call), this actually works.
>src/powerman/powerman -h localhost:11099 -q tblade0,tblade1 -T
send(d0): 'stat Blade0\n'
recv(d0): 'Blade0: off\n'
recv(d0): 'redfishpower> '
on:
off: blade[0,0]
unknown:
>src/powerman/powerman -h localhost:11099 -1 "tblade[0-3]" -T
send(d0): 'on Blade[0-1]\n'
recv(d0): 'Blade0: ok\n'
recv(d0): 'Blade1: ok\n'
recv(d0): 'redfishpower> '
Command completed successfully
I was initially a little surprised it works. But `arglist` hashes plugs. So, for example, when iterating the inputted hosts:
Arg *arglist_next(ArgListIterator itr)
{
    Arg *arg = NULL;
    char *node;

    node = hostlist_next(itr->itr);
    if (node != NULL) {
        arg = hash_find(itr->arglist->args, node);
        free(node); /* hostlist_next strdups returned string */
    }
    return arg;
}
each `blade0` will return the same result because the `hash_find()` only has one response entry.
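A minimal sketch of why the duplicate name is harmless here, assuming a table keyed by node name. `struct arg` and `find_arg()` are simplified, hypothetical stand-ins for powerman's `Arg` and `hash_find()`, not the real definitions; a linear scan replaces the hash to keep the example self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for an arglist entry: one per node name. */
struct arg {
    const char *node;   /* key */
    const char *state;  /* last state reported for this node */
};

/* Stand-in for hash_find(): return the single entry for node, or NULL. */
static struct arg *find_arg(struct arg *args, size_t nargs, const char *node)
{
    size_t i;

    for (i = 0; i < nargs; i++) {
        if (strcmp(args[i].node, node) == 0)
            return &args[i];
    }
    return NULL;
}
```

Looking up `blade0` twice returns the same pointer both times, so one response from the device is enough to answer for every repeat of that name in the iterated host list.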
The devil is in the details, of course. I'm not sure what fallout there could be from calling `hostlist_uniq()` before returning the response, and other shenanigans.
But at a very high level ... I think this just works.
@watson6282 shall I invest more effort in this? Last we chatted it was an idea. Not sure if you really want to pursue.
I did a test implementation of this last night to mess with, and it works and seems to solve the problem. Is it feasible to implement host ranges in aliases? E.g.:
alias "pfoo10[01,03,05,07,09,11,13,15,17,19]" "foo-blade[1,2,3,4,5,6,7,8,9,10]"
alias "pfoo10[02,04,06,08,10,12,14,16,18,20]" "foo-blade[1,2,3,4,5,6,7,8,9,10]"
Haven't looked at the code but it presumably should be doable, and I suppose it's a necessity as listing a bajillion alias lines in the powerman.conf is probably a non-starter.
Before I look into that, @garlick, do you have an opinion on calling `hostlist_uniq()` on the output response? I believe that is important for this support as well. It is a behavior change. I can't imagine anyone having a situation in which they do not want the output to be unique, but there could be some oddball cases I can't think of, like some oddball power device.
Except the main use case for aliases is to support mapping one name to multiple plugs. So with a list on the LHS it is ambiguous whether it should expand to each name on the LHS mapping to all the names on the RHS or just one.
Do you think a new config command would be preferred? Off the top of my head, `map`?
map "pfoo10[01,03,05,07,09,11,13,15,17,19]" "foo-blade[1,2,3,4,5,6,7,8,9,10]"
map "pfoo10[02,04,06,08,10,12,14,16,18,20]" "foo-blade[1,2,3,4,5,6,7,8,9,10]"
My thinking was basically "alias with one host as the first argument = one to many, alias with multiple hosts as first argument = one to one." Of course there are weird cases, like what do you do if I put in `alias "foo[1-4]" "bar[1-3]"`?
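One possible answer to the mismatched-length case, sketched under the assumption that both sides have already been expanded into arrays: pair element by element and refuse the line when the counts differ. `struct pair` and `map_one_to_one()` are hypothetical names for this illustration, and rejecting the line is just one policy; nothing in this thread settles on it.

```c
#include <stddef.h>

struct pair {
    const char *alias;   /* name the user types */
    const char *target;  /* existing node name it resolves to */
};

/* Pair lhs[i] with rhs[i], writing nlhs entries to out.
 * Returns 0 on success, or -1 if the expanded lists differ in length,
 * as with "foo[1-4]" against "bar[1-3]". */
static int map_one_to_one(const char **lhs, size_t nlhs,
                          const char **rhs, size_t nrhs,
                          struct pair *out)
{
    size_t i;

    if (nlhs != nrhs)
        return -1;
    for (i = 0; i < nlhs; i++) {
        out[i].alias = lhs[i];
        out[i].target = rhs[i];
    }
    return 0;
}
```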
It's not the end of the world if we just end up with num_compute_node lines in the powerman.conf assuming that won't make it barf.
A new command would also totally make sense, as Al just posted. It certainly avoids ambiguity and directly implies the one-to-one mapping. In my perfect world I'd love support for things like "all the odd numbers in this hostlist get aliased to this list of nodes" to avoid the big comma-separated lists above, but that's kind of an extreme niche.
These configs are often script generated on big clusters anyway, so my inclination is that the shorthand command isn't really necessary.
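As a rough sketch of what such a generator could emit, the function below formats one `alias` line per node, using the `pfoo10NN` and `foo-blade` names from the example above and assuming two consecutive nodes per blade. `format_alias()` is a hypothetical helper; a real site script would take its naming from its own inventory.

```c
#include <stdio.h>

/* Format the alias line for the n-th node (1-based) into buf.
 * Assumes two nodes per blade: nodes 1,2 -> blade 1; 3,4 -> blade 2; ...
 * Returns the value of snprintf(). */
static int format_alias(char *buf, size_t len, int n)
{
    int blade = (n + 1) / 2;

    return snprintf(buf, len, "alias \"pfoo10%02d\" \"foo-blade%d\"",
                    n, blade);
}
```

Calling it for n = 1 through 20 yields the twenty one-name-per-line aliases that the two ranged lines above are shorthand for.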
Partly it's because we need Al working on high priority Flux work. So perhaps we could open a separate issue and think about whether we want it longer term?
A user asked for an interesting feature. In the El Cap environment, two compute nodes are attached to one blade. It can be inconvenient to map a compute node to its blade (e.g. node1478 maps to blade???).
The idea was to map two fictitious node names to the same blade, i.e. blade1477 and blade1478 both power control the same blade.
The primary issue is how to deal with a user submitting both nodes at the same time for power control (i.e. `pm -0 blade1477,blade1478`).
Pondering a bit, I think this may be doable if `arglist_find()` could return multiple args (i.e. a single power result could update multiple args). The ranged scripts could handle this by calling `hostlist_uniq()` before calling the power control target. But I bet there's stuff I'm not thinking about right now.
Edit: minimally, what errors could occur on non-ranged scripts. But I suppose we aren't checking that multiple "targets" don't already point to the same plug. So perhaps it doesn't matter?
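The "single power result could update multiple args" idea might look roughly like the sketch below, assuming each arg records the plug that controls it. `struct node_arg` and `update_by_plug()` are hypothetical and simplified; they are not powerman's `Arg` or `arglist_find()`, and the `Blade3`/`Blade4` plug names in the usage are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

struct node_arg {
    const char *node;   /* name the user asked about */
    const char *plug;   /* plug on the device that controls it */
    const char *state;  /* last reported state */
};

/* Apply one plug result to every node that shares that plug.
 * Returns the number of entries updated. */
static size_t update_by_plug(struct node_arg *args, size_t n,
                             const char *plug, const char *state)
{
    size_t i, updated = 0;

    for (i = 0; i < n; i++) {
        if (strcmp(args[i].plug, plug) == 0) {
            args[i].state = state;
            updated++;
        }
    }
    return updated;
}
```

If blade1477 and blade1478 both sit behind the same plug, one `off` result for that plug would mark both names as off, and names behind other plugs would be left alone.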