It4innovations / hyperqueue

Scheduler for sub-node tasks for HPC systems with batch scheduling
https://it4innovations.github.io/hyperqueue
MIT License

Error: specifying `--resource "cpus=sum(5)"` with `--no-hyper-threading` #616

Closed: mbercx closed this 12 months ago

mbercx commented 12 months ago

I'm running into an error when combining `--no-hyper-threading` with a `sum` pool for the `cpus` resource. The following works fine:

hq worker start --cpus='sum(5)'

But when I try:

hq worker start --no-hyper-threading --resource "cpus=sum(5)"

I get:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: EmptyGroups', crates/hyperqueue/src/worker/hwdetect.rs:92:49
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Aborted

This is running HyperQueue v0.16.0. Is there a reason why I can't specify a sum pool resource for cpus when disabling hyperthreading?

Kobzol commented 12 months ago

This is a bug; it definitely shouldn't crash! That said, it's interesting to consider whether combining `sum` with `--no-hyper-threading` makes sense at all. A "sum pool" explicitly states that you don't care about the identity of the resource values, only about their count. `--no-hyper-threading`, however, definitely cares about identity, because it needs to disable specific cores! This shows that the hyper-threading flag in HQ is a leaky abstraction: it is really specific to CPU cores, while the sum pool is generic across all resources.
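To make the identity-versus-count distinction concrete, here is a minimal Rust sketch. `CpuResource`, `PruneError`, `prune_hyper_threads`, and the `cpu / 2` sibling topology are all hypothetical illustrations, not HyperQueue's actual types or its `hwdetect.rs` code. It shows that pruning hyper-threads needs concrete CPU IDs, and that a sum pool, which only carries a count, can be rejected with an error value instead of panicking through `unwrap()`.

```rust
use std::collections::BTreeSet;

/// Hypothetical model of a worker's CPU resource (not HyperQueue's real types).
#[derive(Debug, Clone, PartialEq)]
enum CpuResource {
    /// Concrete logical CPU IDs: the identity of each CPU is known.
    List(Vec<u32>),
    /// Sum pool: only a count is known, with no individual identities.
    Sum(u64),
}

#[derive(Debug, PartialEq)]
enum PruneError {
    /// Hyper-threading pruning must know which CPUs share a physical core.
    SumPoolHasNoIdentities,
}

/// Keep only the first logical CPU seen on each physical core.
/// `core_of` maps a logical CPU ID to its physical core ID (assumed to come
/// from hardware detection).
fn prune_hyper_threads(
    cpus: &CpuResource,
    core_of: impl Fn(u32) -> u32,
) -> Result<CpuResource, PruneError> {
    match cpus {
        // A count alone gives no way to decide which CPUs to drop, so report
        // an error to the caller instead of panicking.
        CpuResource::Sum(_) => Err(PruneError::SumPoolHasNoIdentities),
        CpuResource::List(ids) => {
            let mut seen_cores = BTreeSet::new();
            let kept = ids
                .iter()
                .copied()
                // `insert` returns false if this physical core was already kept.
                .filter(|&id| seen_cores.insert(core_of(id)))
                .collect();
            Ok(CpuResource::List(kept))
        }
    }
}

fn main() {
    // Hypothetical topology: logical CPUs 2k and 2k+1 are siblings on core k.
    let core_of = |cpu: u32| cpu / 2;

    let list = CpuResource::List(vec![0, 1, 2, 3]);
    println!("{:?}", prune_hyper_threads(&list, core_of)); // Ok(List([0, 2]))

    let sum = CpuResource::Sum(5);
    println!("{:?}", prune_hyper_threads(&sum, core_of)); // Err(SumPoolHasNoIdentities)
}
```

Returning a `Result` lets the CLI turn this case into a readable error message rather than the `EmptyGroups` panic from the report.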

I have never really used `sum(5)` for CPUs, and I don't think it makes much sense. I think HQ should simply error out for this usage. CC @spirali

A workaround for your situation could be to use `--cpus=5`. This will basically use the cores with IDs 0, 1, 2, 3, and 4, and then prune hyper-threads from them. But this is probably not what you want, as I guess you want something like "give me 5 cores, but don't include HT cores". I don't think this is currently possible to express in HQ, because you can either use autodetection of CPUs OR select their count, but not both.
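The gap between "take IDs 0 to 4, then prune" and "give me 5 cores without HT siblings" comes down to the order of the two steps. The Rust sketch below assumes a hypothetical machine with 8 physical cores where logical CPUs 2k and 2k+1 are siblings; `core_of` and `prune` are illustrative helpers, not HQ code. Real sibling numbering varies: on many Linux x86 machines CPU i and CPU i+N are siblings instead, in which case IDs 0 to 4 would all sit on distinct cores and pruning would drop nothing.

```rust
use std::collections::BTreeSet;

/// Hypothetical topology: logical CPUs 2k and 2k+1 share physical core k.
fn core_of(cpu: u32) -> u32 {
    cpu / 2
}

/// Keep only the first logical CPU seen on each physical core.
fn prune(ids: &[u32]) -> Vec<u32> {
    let mut seen = BTreeSet::new();
    ids.iter().copied().filter(|&id| seen.insert(core_of(id))).collect()
}

fn main() {
    // Assumed machine: 8 physical cores, 16 logical CPUs.
    let all: Vec<u32> = (0..16).collect();

    // Select the first 5 IDs, then prune hyper-threads
    // (the `--cpus=5` behavior described in the comment).
    let select_then_prune = prune(&all[..5]);

    // Prune hyper-threads first, then take 5 physical cores
    // (the "5 cores, no HT" request, which HQ cannot express today).
    let prune_then_select: Vec<u32> = prune(&all).into_iter().take(5).collect();

    println!("{:?}", select_then_prune); // [0, 2, 4]
    println!("{:?}", prune_then_select); // [0, 2, 4, 6, 8]
}
```

Under this assumed topology, selecting first leaves only 3 usable cores, while pruning first yields the requested 5.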

mbercx commented 12 months ago

Great, thanks for the quick response @Kobzol! To be honest, I was mainly playing around with the various resource specifications while reading that part of the documentation. I disable hyper-threading by default, since it's detrimental to the performance of the code I typically run (Quantum ESPRESSO).

I think using `--cpus=5` is perfectly fine for my purposes, and I can see why using `sum(5)` doesn't really make sense. ^^

spirali commented 12 months ago

Yes, it should be an error. Maybe we should disallow `sum` for the `cpus` resource entirely, as it usually does not make any sense.
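Rejecting `sum` for `cpus` outright would mean validating the resource definition before any hardware detection runs. A minimal sketch, assuming a hypothetical `validate_resource_def` helper that inspects a `NAME=VALUE` string of the form passed to `--resource`; HQ's real parser, types, and error messages are not reproduced here and will differ.

```rust
#[derive(Debug, PartialEq)]
enum ResourceDefError {
    /// The definition is not of the form NAME=VALUE.
    MissingEquals,
    /// A sum pool was requested for the `cpus` resource.
    SumPoolForCpus,
}

/// Hypothetical validation of a `--resource NAME=VALUE` string.
fn validate_resource_def(def: &str) -> Result<(), ResourceDefError> {
    let (name, value) = def.split_once('=').ok_or(ResourceDefError::MissingEquals)?;
    // Sum pools carry only a count, so they cannot back CPU-specific features
    // such as hyper-threading pruning; reject them for `cpus` up front.
    if name.trim() == "cpus" && value.trim().starts_with("sum(") {
        return Err(ResourceDefError::SumPoolForCpus);
    }
    Ok(())
}

fn main() {
    for def in ["cpus=sum(5)", "mem=sum(1024)", "cpus=[0,1,2]"] {
        println!("{}: {:?}", def, validate_resource_def(def));
    }
}
```

Other resources keep their `sum` pools; only the `cpus` case is refused, and it fails early with a descriptive error instead of a panic.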