Closed grondo closed 1 month ago
This fixes it for me!
Setting MWP.
Attention: Patch coverage is 71.42857% with 2 lines in your changes missing coverage. Please review.
Project coverage is 75.3%. Comparing base (4961e7b) to head (e6dad53). Report is 4 commits behind head on master.
Files with missing lines | Patch % | Lines |
---|---|---|
resource/readers/resource_reader_rv1exec.cpp | 71.4% | 2 Missing :warning: |
This PR fixes resource cancel with rv1exec when there is more than one entry in the Rv1 `R_lite` array.

Currently, for every resource free request (whether a "partial" cancel or not) where not all ranks have the same set of cores or gpus allocated, users will see nuisance error messages in their logs.
Since my home cluster has a differing number of cores between nodes, I get these errors for almost every job.
The issue is that while `resource_reader_rv1exec_t::partial_cancel_internal()` loops over the entries in `R_lite`, it doesn't accumulate the visited ranks idsets, instead overwriting the `ranks` string each time and only processing the final one.

Also, there clearly wasn't a test for this case, so I added a new sharness test that contains a reproducer, along with some sanity testing with housekeeping that could later be expanded.
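
For readers unfamiliar with the failure mode, here is a minimal, self-contained sketch of the overwrite-versus-accumulate pattern described above. It is not the actual reader code: `rlite_entry`, `collect_ranks_overwrite()`, `collect_ranks_accumulate()`, and `example_r_lite()` are hypothetical names invented for illustration, and a `std::set<int>` stands in for a decoded rank idset.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one R_lite entry. The real entries carry
// idset strings; here the ranks are assumed to be already decoded.
struct rlite_entry {
    std::set<int> ranks;  // ranks covered by this entry
    std::string cores;    // cores allocated on each of those ranks
};

// Buggy pattern: every iteration overwrites the result, so only the
// ranks of the final entry are left when the loop ends.
std::set<int> collect_ranks_overwrite (const std::vector<rlite_entry> &r_lite)
{
    std::set<int> ranks;
    for (const auto &entry : r_lite)
        ranks = entry.ranks;
    return ranks;
}

// Fixed pattern: take the union of the ranks from every entry.
std::set<int> collect_ranks_accumulate (const std::vector<rlite_entry> &r_lite)
{
    std::set<int> ranks;
    for (const auto &entry : r_lite)
        ranks.insert (entry.ranks.begin (), entry.ranks.end ());
    return ranks;
}

// Example allocation where the ranks do not all have the same cores,
// which is what produces more than one R_lite entry.
std::vector<rlite_entry> example_r_lite ()
{
    return {rlite_entry{{0, 1}, "0-3"}, rlite_entry{{2}, "0-1"}};
}
```

With the two-entry example, the overwrite variant ends up with rank 2 only, while the accumulate variant yields ranks 0, 1, and 2, which is the full set a cancel has to process.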