nlslatt closed this 1 year ago
Yes, that sounds exactly correct to me.
@nlslatt Now that the writer does what it is supposed to do per phase, I am moving on to the execution logic you described.
However, regarding this requirement:
For cases where there are more phases of data in the input file, LBAF balances more than one phase, or LBAF starts load balancing not on phase 0:
All phases before and including the first one LBAF attempts to load balance should be copied verbatim from the input file into the output file.
I have the following question: LBAF is written to load only the phases whose IDs are listed (possibly as a singleton) in phase_ids in the configuration file. This is for efficiency's sake, especially with large JSON files where only a single phase is to be load-balanced (or a few phases, not necessarily consecutive ones, are to be stepped through). However, to meet the specification above, all phases before phase J would have to be read and loaded into LBAF's internal data structures (so that they can be written out later), which would be very inefficient in general.
I therefore submit that this modus operandi should be optional, and probably turned off by default. Do you agree?
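To make the trade-off concrete, here is a minimal sketch of how phase selection could behave with such an opt-in switch. The option name `copy_preceding_phases` and the helper `phases_to_load` are hypothetical, not existing LBAF parameters or functions; only `phase_ids` comes from the configuration described above.

```python
def phases_to_load(phase_ids, available_ids, copy_preceding_phases=False):
    """Return the sorted phase IDs that must be read from the input file.

    phase_ids: phases requested for load balancing (the `phase_ids` setting).
    available_ids: phase IDs present in the input file.
    copy_preceding_phases: hypothetical opt-in switch; when True, every phase
    before the first requested one is also read so it can be copied verbatim.
    """
    requested = set(phase_ids) & set(available_ids)
    if not copy_preceding_phases or not requested:
        # Default: read only the requested phases (cheap for large files).
        return sorted(requested)
    first = min(requested)
    # Opt-in: also read every earlier phase so it can be written out unchanged.
    preceding = {p for p in available_ids if p < first}
    return sorted(requested | preceding)
```

For a file with phases 0 to 9 and `phase_ids = [5]`, the default reads a single phase, while the opt-in mode reads six, which is the cost that motivates keeping it off by default.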
@ppebay That sounds reasonable.
More on this @lifflander, @nlslatt
Currently, the JSON schema validator considers this rank entry as valid:
{"type":"LBDatafile","phases":[]}
(this is the case of rank 3 in the synthetic_lb_data example, which indeed has no objects assigned to it).
This makes it hard to support the stated goals of #367, because the empty phases list is inconsistent: it carries no phase ID. I would argue that it is also inconsistent with the other ranks, e.g., rank 2:
{"type":"LBDatafile","phases":[{"id":0,"tasks":[{"entity":{"id":8,"home":2,"type":"object","migratable":true},"node":2,"resource":"cpu","time":1.5}],"communications":[{"type":"SendRecv","to":{"type":"object","id":6},"messages":1,"from":{"type":"object","id":8},"bytes":1.5}]}]}
because it gives the impression that rank 3 is in a phase-less state.
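The inconsistency can be shown with a small check that flags rank files whose `phases` list is empty or contains a phase without an `id`. This is a hypothetical helper using only the standard `json` module, not the actual schema validator.

```python
import json


def phase_problems(rank_json: str) -> list[str]:
    """Return the phase-related problems found in one rank's LBDatafile JSON."""
    data = json.loads(rank_json)
    phases = data.get("phases", [])
    problems = []
    if not phases:
        # The rank 3 case: accepted by the current schema, but has no phase ID.
        problems.append("no phases (rank appears phase-less)")
    for i, phase in enumerate(phases):
        if "id" not in phase:
            problems.append(f"phase entry {i} has no 'id'")
    return problems
```

Run on the rank 3 entry, it reports the phase-less state; run on the rank 2 entry, it reports nothing.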
My proposal would be to modify our schema so that, in such a case (no tasks assigned to a rank), the JSON file for rank 3 would instead contain the following:
{"type":"LBDatafile","phases":[{"id":0,"tasks":[]}]}
If my proposal is accepted, that would probably mean modifying the vt JSON writer, and assuredly the schema validator.
What do you think?
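On the writing side, the proposal amounts to always emitting one entry per phase, even when the rank has no tasks. Below is a minimal Python sketch of the intended output shape only. It assumes a hypothetical in-memory mapping from phase ID to the rank's task list and is not vt's JSON writer.

```python
import json


def rank_datafile(tasks_by_phase: dict[int, list[dict]], phase_ids: list[int]) -> str:
    """Serialize one rank's data, emitting every phase even if it has no tasks."""
    phases = []
    for pid in sorted(phase_ids):
        # A rank with no objects in this phase still gets {"id": pid, "tasks": []}.
        phases.append({"id": pid, "tasks": tasks_by_phase.get(pid, [])})
    # Compact separators match the one-line style of the examples in this thread.
    return json.dumps({"type": "LBDatafile", "phases": phases}, separators=(",", ":"))
```

For rank 3 (no tasks) and phase 0, `rank_datafile({}, [0])` produces exactly the proposed `{"type":"LBDatafile","phases":[{"id":0,"tasks":[]}]}`.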
@ppebay @lifflander I see the need for a change here. My question is why
{"type":"LBDatafile","phases":[{"id":0,"tasks":[]}]}
as opposed to
{"type":"LBDatafile","phases":[{"id":0}]}
or
{"type":"LBDatafile","phases":[{"id":0,"tasks":[],"communications":[]}]}
?
@lifflander @nlslatt
Good question, I don't have any strong opinion regarding the answer to it :)
As long as the phase id is in there, I am fine with it.
Maybe the second one is best thanks to its compactness?
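If the compact form is chosen, a reader can treat missing `tasks` and `communications` keys as empty lists, which makes all three candidates carry the same information. The following is a sketch of such normalization using a hypothetical helper, not code from LBDataRestartReader or LBAF.

```python
import json


def normalize_phase(phase: dict) -> dict:
    """Fill in optional per-phase keys so all three encodings compare equal."""
    return {
        "id": phase["id"],  # the phase ID is the one field every form must carry
        "tasks": phase.get("tasks", []),
        "communications": phase.get("communications", []),
    }


candidates = [
    '{"type":"LBDatafile","phases":[{"id":0,"tasks":[]}]}',
    '{"type":"LBDatafile","phases":[{"id":0}]}',
    '{"type":"LBDatafile","phases":[{"id":0,"tasks":[],"communications":[]}]}',
]
normalized = [[normalize_phase(p) for p in json.loads(c)["phases"]] for c in candidates]
# All three forms decode to the same normalized phase list.
assert all(n == normalized[0] for n in normalized)
# The shortest encoding is the {"id":0} form.
shortest = min(candidates, key=len)
```

Under this reading, the choice between the forms is purely about file size and schema strictness, not meaning.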
The JSON files output by LBAF will be consumed by vt's LBDataRestartReader and replayed (in terms of object locations) by vt's OfflineLB. LBAF's output files must meet certain conditions in order to be acted upon as expected. I'll start with two common use cases before explaining more generally.

- An input file containing only phase 0: LBAF should write the input phase 0 data verbatim into the output file, still labeled as phase 0. The phase 1 data written to the output file should contain the loads from phase 0 but use the post-LB object locations.
- An input file containing phases 0 and 1, where LBAF only load balances phase 0: LBAF should write the input phase 0 data verbatim into the output file, still labeled as phase 0. The phase 1 data written to the output file should contain the loads from the phase 1 input but use the post-LB object locations.
For cases where there are more phases of data in the input file, LBAF balances more than one phase, or LBAF starts load balancing not on phase 0:

- All phases before and including the first one LBAF attempts to load balance should be copied verbatim from the input file into the output file.
- When LBAF load balances phase J, its starting point should be the object locations and loads that were already written to the output file for phase J (if the load balancer has already run on any phase, this will likely not match the object locations from the input file). The post-LB object locations for phase J will be written to the output file as phase J+1. If phase J+1 existed in the input file, use the loads (but not the object locations) specified there for writing phase J+1; otherwise, re-use the loads from the phase J input.
- For any later phase K that LBAF does not load balance, the output for phase K+1 will have the same object locations as phase K but use the loads from the phase K+1 input.

Does that all sound correct to you @lifflander?
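The rules above can be sketched as a single pass over phase IDs. This is a hypothetical illustration, not LBAF code: a phase is modeled as a pair of dicts (object ID to rank, object ID to load), the object set is assumed constant across phases, `balance` stands in for whatever load balancer LBAF runs, and an unbalanced phase whose successor is missing from the input is assumed to end the output (the rules above do not cover that case).

```python
from typing import Callable

# Hypothetical representation: a phase is (object ID -> rank, object ID -> load).
Phase = tuple[dict[int, int], dict[int, float]]
Balancer = Callable[[dict[int, int], dict[int, float]], dict[int, int]]


def build_output_phases(inputs: dict[int, Phase], lb_phase_ids: set[int],
                        balance: Balancer) -> dict[int, Phase]:
    """Apply the phase-writing rules to produce the phases of the output file."""
    first = min(lb_phase_ids)
    # Phases before and including the first balanced phase are copied verbatim.
    out = {p: inputs[p] for p in inputs if p <= first}
    last = max(max(inputs), max(lb_phase_ids))
    for p in range(first, last + 1):
        if p not in out:
            continue  # nothing written for phase p, so no phase p+1 follows from it
        locations, loads = out[p]
        if p in lb_phase_ids:
            # Balance from what was already written for phase p, which may differ
            # from the input locations once an earlier phase has been balanced.
            new_locations = balance(locations, loads)
            # Use the phase p+1 input loads if present; otherwise re-use the
            # phase p loads (identical to the phase p input loads when it exists).
            next_loads = inputs[p + 1][1] if p + 1 in inputs else loads
            out[p + 1] = (new_locations, next_loads)
        elif p + 1 in inputs:
            # Unbalanced phase: same locations as phase p, loads from phase p+1 input.
            out[p + 1] = (locations, inputs[p + 1][1])
    return out
```

With a single input phase 0 and `lb_phase_ids={0}`, this yields phase 0 verbatim plus a phase 1 that pairs the post-LB locations with the phase 0 loads, matching the first common case; adding an input phase 1 switches the phase 1 loads to that input, matching the second.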