aurae-runtime / aurae

Distributed systems runtime daemon written in Rust.
https://aurae.io
Apache License 2.0

example scripts and aer failing at HEAD #360

Closed dmah42 closed 1 year ago

dmah42 commented 1 year ago

ERROR auraed::cells::cell_service::error: cell 'ae-sleeper-cell' allocation was aborted: cgroup 'ae-sleeper-cell' creation failed: failed to apply cpu resource restrictions

dmah42 commented 1 year ago

the error happens before auraed even sees a call (to allocate, for example), so it's failing early.

dmah42 commented 1 year ago

reproducible with ./target/debug/aer cell allocate test-cell --cell-cpu-weight=2

dmah42 commented 1 year ago

message: "cell 'test-cell' allocation was aborted: cgroup 'test-cell' creation failed: failed to open \"/sys/fs/cgroup/test-cell/cpu.weight\""

looks like libcgroups is trying to write cpu.weight before creating the test-cell directory.
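a minimal sketch of that failure mode, using only the Rust standard library. this is not libcgroups code: `write_cpu_weight` and its `root` parameter are hypothetical names for illustration, and `root` stands in for /sys/fs/cgroup so the sketch can be tried against a scratch directory.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: write `cpu.weight` for `cell` under `root`
/// WITHOUT creating the cgroup directory first.
/// `root` stands in for /sys/fs/cgroup (an assumption for this sketch).
fn write_cpu_weight(root: &Path, cell: &str, weight: u64) -> io::Result<()> {
    let control = root.join(cell).join("cpu.weight");
    // If `<root>/<cell>` does not exist yet, opening the control file
    // fails with `io::ErrorKind::NotFound`.
    fs::write(control, weight.to_string())
}
```

calling `write_cpu_weight(root, "test-cell", 2)` before the `test-cell` directory exists returns a `NotFound` error, which matches the "failed to open" message above.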

dmah42 commented 1 year ago

ok, so libcgroups only creates the cgroup directory when a task is added, and only then can the controls be applied. unfortunately, that is not how we expect to do things: we create the cgroup directory and its controls first, then add processes later.
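for reference, the ordering we expect can be sketched with plain file operations. this is an illustration only, not the auraed implementation: `create_cell`, `add_process`, and the `root` parameter are hypothetical names. on a real cgroup v2 mount it also assumes the cpu controller is enabled in the parent's cgroup.subtree_control, so that the kernel provides cpu.weight in the new directory.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Step 1 (hypothetical helper): create the cgroup directory, then apply
/// the controls. No process is attached yet.
fn create_cell(root: &Path, cell: &str, cpu_weight: u64) -> io::Result<PathBuf> {
    let dir = root.join(cell);
    // The directory must exist before any control file can be written.
    fs::create_dir(&dir)?;
    fs::write(dir.join("cpu.weight"), cpu_weight.to_string())?;
    Ok(dir)
}

/// Step 2 (hypothetical helper), possibly much later: move a process into
/// the existing cgroup by writing its PID to `cgroup.procs`.
fn add_process(cell_dir: &Path, pid: u32) -> io::Result<()> {
    fs::write(cell_dir.join("cgroup.procs"), pid.to_string())
}
```

the point of the split is that `create_cell` can succeed with an empty cgroup, and `add_process` is a separate, later call.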

i think for now we will revert the move to libcgroups.

dmah42 commented 1 year ago

reverting libcgroups port in https://github.com/aurae-runtime/aurae/pull/362

dmah42 commented 1 year ago

now that it's reverted, i'm going to close this. however, if we want to move to libcgroups, it will take some more work.