Closed wdbaruni closed 4 weeks ago
This change is related to https://github.com/bacalhau-project/bacalhau/pull/4049. Instead of queueing jobs locally on each compute node, we try to queue them in the requester, so that jobs can be scheduled to new nodes that join or to the first node that frees up its resources.
Currently, we do not filter out nodes that lack immediately available capacity or whose queue is growing large. We rank nodes with more capacity higher, but nodes with no capacity remain candidates. This change lets operators define NodeOverSubscriptionFactor in the requester node, so that it filters out any compute node whose total active and queued capacity exceeds its total capacity multiplied by the factor. The default is 1.5, which means a compute node can queue locally up to half of its total capacity in addition to its running capacity.

Testing
This change has been tested together with https://github.com/bacalhau-project/bacalhau/pull/4049 in dev stack, as documented in that pull request.
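
For illustration, the over-subscription filter described above can be sketched as follows. This is a minimal sketch, not the code in this PR: the `NodeUsage` type, the `withinOverSubscription` and `filterNodes` helpers, and the single scalar capacity value are all hypothetical simplifications. It also assumes that a node sitting exactly at the limit is kept.

```go
package main

import "fmt"

// NodeUsage is a hypothetical, simplified view of a compute node.
// Capacity is modelled as one scalar for illustration only.
type NodeUsage struct {
	Name    string
	Total   float64 // total capacity of the node
	Running float64 // capacity used by running executions
	Queued  float64 // capacity requested by locally queued executions
}

// withinOverSubscription reports whether running plus queued capacity
// stays within total capacity multiplied by the factor.
// Assumption: a node exactly at the limit is still accepted.
func withinOverSubscription(n NodeUsage, factor float64) bool {
	return n.Running+n.Queued <= n.Total*factor
}

// filterNodes keeps only the nodes that pass the over-subscription check.
func filterNodes(nodes []NodeUsage, factor float64) []NodeUsage {
	var kept []NodeUsage
	for _, n := range nodes {
		if withinOverSubscription(n, factor) {
			kept = append(kept, n)
		}
	}
	return kept
}

func main() {
	nodes := []NodeUsage{
		{Name: "node-a", Total: 4, Running: 4, Queued: 1}, // 5 <= 6, kept
		{Name: "node-b", Total: 4, Running: 4, Queued: 3}, // 7 > 6, filtered out
	}
	for _, n := range filterNodes(nodes, 1.5) {
		fmt.Println(n.Name)
	}
}
```

With a factor of 1.5 and a total capacity of 4, a node may hold up to 6 units of running plus queued work. Any job beyond that stays queued in the requester until a node frees up or a new node joins.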