chaos / powerman

cluster power control
GNU General Public License v2.0

pm -q fails on large input queries #193

Closed chu11 closed 4 months ago

chu11 commented 4 months ago

this works:

> pm -q $(flux hostlist -L 1635 "$(cluset -f 'myclu[1001-5000/2]')")

but these don't:

> pm -q $(flux hostlist -L 1636 "$(cluset -f 'myclu[1001-5000/2]')")
Command too long
> pm -q $(flux hostlist -L 1638 "$(cluset -f 'myclu[1001-5000/2]')")
pm: hostlist error

(FYI, the bash command substitution above expands to

myclu[1001,1003,1005,1007,1009,1011,...,4275]

for the -L 1638 case)

The "Command too long" comes from exceeding CP_LINEMAX on the client side.

The hostlist error comes from:

    char argument[CP_LINEMAX];
    if (hostlist_ranged_string(targets, sizeof(argument), argument) == -1)
        err_exit(false, "hostlist error");

which is also using a CP_LINEMAX buffer.
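
One alternative to just bumping the constant would be to grow the buffer until the conversion fits. A minimal sketch, not a proposed patch, assuming the classic LLNL hostlist API that powerman uses, where hostlist_ranged_string() returns -1 when the buffer is too small:

    /* Sketch only: retry hostlist_ranged_string() with a doubling
     * heap buffer instead of a fixed char argument[CP_LINEMAX]. */
    size_t len = CP_LINEMAX;
    char *argument = malloc(len);
    if (!argument)
        err_exit(false, "out of memory");
    while (hostlist_ranged_string(targets, len, argument) == -1) {
        len *= 2;
        char *tmp = realloc(argument, len);
        if (!tmp)
            err_exit(false, "out of memory");
        argument = tmp;
    }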

I would like to think just doubling or tripling CP_LINEMAX will be fine, although there is this comment in the source:

/* NOTE: sscanf is safe because 'str' is guaranteed to be < CP_LINEMAX */

That comment is from 2002, though, and I would like to think longer strings are OK by now (a cursory googling did not give me an immediate answer).

garlick commented 4 months ago

I don't know if it will fix the hostlist error, but we might want to consider pulling in the hostlist implementation from flux, since it has actually had some attention paid to it in the last decade.

I don't think the NOTE: comment is about sscanf itself; it's about the source and destination buffers. It's saying the source is at most CP_LINEMAX, so it's OK to sscanf %s into another CP_LINEMAX buffer. So I think increasing CP_LINEMAX should be fine (and would have been fine in 2002 also).
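
For illustration, a minimal sketch of the pattern that comment describes (hypothetical code, not the actual powerman source):

    #include <stdio.h>

    #define CP_LINEMAX 8192   /* assumed original value for this sketch */

    /* 'str' is NUL-terminated and guaranteed shorter than CP_LINEMAX */
    void parse(const char *str)
    {
        char word[CP_LINEMAX];

        /* %s writes at most strlen(str) + 1 bytes, and strlen(str) is
         * < CP_LINEMAX, so 'word' cannot overflow.  The safety comes
         * from the bound on the source buffer, not from sscanf itself. */
        if (sscanf(str, "%s", word) == 1)
            printf("first token: %s\n", word);
    }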

chu11 commented 4 months ago

Doubling CP_LINEMAX to 16K seemed to solve the above problems.

So for the worst-case input (every other node in a cluster hostrange), the limit with an 8K CP_LINEMAX buffer is about a 3200-node cluster. (The bug was hit around node 4275; note the example above started at node 1001 to keep the host number length consistent throughout, so it's really a bit more than 3200 nodes.) 3200 nodes is certainly smaller than the biggest HPC clusters out there nowadays, according to Google.

Soooo

16K buffer - worst case of ~6400 node cluster
32K buffer - ~12000 node cluster
64K buffer - ~24000 node cluster
128K buffer - ~48000 node cluster
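
As a back-of-the-envelope check of that table (assuming 4-digit host numbers, so each host costs about 5 bytes as "NNNN," in the expanded string, and the every-other-node worst case doubles the cluster size):

    #include <stdio.h>

    int main(void)
    {
        /* Assumption: each host contributes ~5 bytes ("NNNN,") to the
         * comma-separated expansion; every-other-node worst case means
         * the cluster is twice the host count. */
        const int bytes_per_host = 5;
        const int kib[] = { 8, 16, 32, 64, 128 };

        for (size_t i = 0; i < sizeof(kib) / sizeof(kib[0]); i++) {
            int hosts = kib[i] * 1024 / bytes_per_host;
            printf("%4dK buffer -> ~%5d hosts -> ~%5d node cluster\n",
                   kib[i], hosts, 2 * hosts);
        }
        return 0;
    }

The 8K row works out to 1638 hosts, which lines up with the -L 1636/1638 boundary observed above, and the larger rows roughly match the table.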

I'd say let's up it to 128K to future-proof this for quite some time.

Edit: and obviously divide by 2 if someone is doing dumb things with hostlist, e.g. node[0,1,2,3,4,5 .... 7998, 7999, 8000]