It4innovations / hyperqueue

Scheduler for sub-node tasks for HPC systems with batch scheduling
https://it4innovations.github.io/hyperqueue
MIT License
266 stars 20 forks

Compilation on mac x86_64 #688

Closed ValentinHirschi closed 3 months ago

ValentinHirschi commented 3 months ago

Trying to compile hyperqueue main branch (rev 7d52f2af202ab59adfd47a6fc352d94033a9fc25) on darwin x86_64 fails:

RUSTFLAGS="-C target-cpu=native" cargo build --release ->

error[E0425]: cannot find function `prctl` in crate `libc`
   --> crates/tako/src/launcher.rs:181:29
    |
181 |             let ret = libc::prctl(libc::PR_SET_PDEATHSIG, libc::SIGTERM);
    |                             ^^^^^ not found in `libc`

error[E0425]: cannot find value `PR_SET_PDEATHSIG` in crate `libc`
   --> crates/tako/src/launcher.rs:181:41
    |
181 |             let ret = libc::prctl(libc::PR_SET_PDEATHSIG, libc::SIGTERM);
    |                                         ^^^^^^^^^^^^^^^^ not found in `libc`

which may be due to the libc re-export of nix?

I can confirm that compilation succeeds after just commenting out the following in crates/tako/src/launcher.rs:

            // Send SIGTERM to this task when the parent (worker) dies.
            /*
            let ret = libc::prctl(libc::PR_SET_PDEATHSIG, libc::SIGTERM);
            match ret {
                0 => {}
                error => log::error!("Cannot set PR_SET_PDEATHSIG for task process: {error:?}"),
            }
            */
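A less invasive alternative to commenting the block out might be to gate the call on the target OS, so that Linux keeps the parent-death signal and other platforms compile to a no-op. Below is a minimal sketch of that idea, not necessarily how the project would solve it. The helper name `set_parent_death_signal` is hypothetical, and `prctl` is declared by hand (with `PR_SET_PDEATHSIG = 1` and `SIGTERM = 15`, the values from the Linux headers) only so that the sketch builds without the `libc` crate.

```rust
use std::io;

/// Ask the kernel to send SIGTERM to this process when its parent dies.
/// Linux-only: `prctl(PR_SET_PDEATHSIG, ...)` does not exist on macOS.
#[cfg(target_os = "linux")]
fn set_parent_death_signal() -> io::Result<()> {
    use std::os::raw::{c_int, c_ulong};

    // Hand-written declaration (assumption: stands in for `libc::prctl`).
    unsafe extern "C" {
        fn prctl(option: c_int, ...) -> c_int;
    }

    const PR_SET_PDEATHSIG: c_int = 1; // value from <linux/prctl.h>
    const SIGTERM: c_ulong = 15; // SIGTERM on Linux

    // SAFETY: `prctl` with PR_SET_PDEATHSIG only reads its integer arguments.
    let ret = unsafe { prctl(PR_SET_PDEATHSIG, SIGTERM) };
    if ret == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}

/// Fallback for every other OS: there is nothing to set, so report success.
/// Child processes may outlive the worker on these platforms.
#[cfg(not(target_os = "linux"))]
fn set_parent_death_signal() -> io::Result<()> {
    Ok(())
}
```

With this shape the call site stays the same on every platform. The cost is that on non-Linux systems tasks are simply not terminated when the worker dies.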

Side question: is this also the reason why the PyPI wheel (https://pypi.org/project/hyperqueue/0.18.0/#files) is not compatible with my Mac architecture? Running python3.11 -m pip install hyperqueue gives No matching distribution found for hyperqueue. Manually installing that wheel with python3.11 -m pip install ~/Downloads/hyperqueue-0.18.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl confirms it: ERROR: hyperqueue-0.18.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl is not a supported wheel on this platform.

Sorry if I missed an obvious limitation of hyperqueue support mentioned somewhere else.

Kobzol commented 3 months ago

Hi, HyperQueue is not really supported on OSes other than Linux at the moment. While we could fix the compilation of HQ itself relatively easily, there are a lot of assumptions and features that rely on Linux, and if we ignored these and used HQ on e.g. Mac, it could lead to runtime errors (in the best case) or weird or missing behaviour (in the worst case).

Indeed, that is also the reason why we don't build the HQ wheel for Mac.

That being said, for simple use-cases, HQ would probably work on Mac just fine though. We could perhaps enable some compatibility mode for other OSes, for which we don't guarantee any features and don't promise to fix any bugs. @spirali what do you think?

ValentinHirschi commented 3 months ago

@Kobzol Thank you for the quick answer. Just to be clear, the use-case for HQ on Mac is not for production deployment. Instead, it is quite often convenient to debug an HQ implementation of the parallelisation in a given project by trying it locally in a multi-core setup (since the HQ abstraction essentially brings both HPC and multicore parallelisation on a similar footing).

In my case this would mean testing our deployment locally on my MacBook Pro. I of course understand that this would only be meant to test my own upstream handling of the data collected from the jobs, and that any aspect specific to HQ's handling of jobs cannot be expected to behave the same on a multicore run on Mac as it does on an HPC cluster. For this reason, the "minimal compatibility" mode you suggest would be enough and welcome for the purpose I described.

Kobzol commented 3 months ago

> Instead, it is quite often convenient to debug an HQ implementation of the parallelisation in a given project by trying it locally in a multi-core setup (since the HQ abstraction essentially brings both HPC and multicore parallelisation on a similar footing).

I agree, and I think that this is one of the very important and useful properties of HyperQueue. I hadn't realized that this is actually a pretty good motivation for the compatibility mode.

I created an issue for the compatibility mode.

spirali commented 3 months ago

My biggest concern is reliable task cancellation, which is tricky and very OS-specific. I expect that users will expect operations like "hq job cancel all" to work even in a dev environment.

Kobzol commented 3 months ago

There will be edge cases, but again, we would explicitly state that it is a best effort. I think that it is better to provide a mostly working version for local experiments, than to force users to install virtual/containerized Linux just to run HQ locally.

And even on Linux, there are now definitely situations where HQ doesn't kill all subprocesses.

Kobzol commented 3 months ago

https://github.com/It4innovations/hyperqueue/pull/693 should enable compilation on macOS.

ValentinHirschi commented 3 months ago

@Kobzol Excellent, I can confirm I can compile this branch fine.

I wanted to try and build the Python API as well with maturin, but I could not find details in the HyperQueue documentation about how I am supposed to invoke maturin for this project.

Kobzol commented 3 months ago

You can check the CI workflow for an example.

ValentinHirschi commented 3 months ago

I can confirm that the maturin build was successful and I could successfully run the example code from the Python API section of the documentation. This is great, thanks. I will open separate issues in the future if I come across functionality that seems broken on Mac and that one can reasonably expect to work for the purpose of local testing.