**Closed** — nlslatt closed this issue 1 year ago
When I try to load balance a dataset with 40 ranks (with files named `data.$i.json`, where `i` runs from 0 through 39 with no additional zero padding), it only outputs 20 json files. The logger output shows that only the first 20 input files were loaded. When I try to load balance a dataset with 128 ranks, it only loads the first 64 files and then outputs 64 files.
With an older version of LBAF that accepts `n_ranks` in the yaml file, all files are correctly loaded and output, so the naming convention should not be the issue. These datasets used 2 ranks per compute node. I do not know whether LBAF could be reading the json header and using the number of compute nodes instead of the number of files present.
I am not able to reproduce this incorrect behavior on the `user-defined-memory-toy-problem`.
This might be related to `n_ranks` auto-detection, as explained here: https://github.com/DARMA-tasking/LB-analysis-framework/issues/353#issuecomment-1570176987
Is there a `json_data[metadata][shared_node][num_nodes]` value in the data files? Is the number of ranks per compute node stored under another key in the data file? Should we multiply `num_nodes` by that other value? If so, could you please provide one of the data files?
We need to NOT read `json_data[metadata][shared_node][num_nodes]`. Just read the file names to get `n_ranks`.