Make output JSON files compatible with vt's OfflineLB

nlslatt commented 1 year ago

The JSON files output by LBAF will be consumed by vt's LBDataRestartReader and replayed (in terms of object locations) by vt's OfflineLB. There are certain conditions that need to be met by LBAF's output files in order for the files to be acted upon as expected. I'll start with two common use cases before explaining more generally.

An input file containing only phase 0: LBAF should write the input phase 0 data verbatim into the output file, still labeled as phase 0. The phase 1 data written to the output file should contain loads from phase 0 but use the post-LB object locations.

An input file containing phases 0 and 1, where LBAF only load balances on phase 0: LBAF should write the input phase 0 data verbatim into the output file, still labeled as phase 0. The phase 1 data written to the output file should contain loads from the phase 1 inputs but use the post-LB object locations.

For cases where there are more phases of data in the input file, LBAF balances more than one phase, or LBAF starts load balancing not on phase 0:

- All phases before and including the first one LBAF attempts to load balance should be copied verbatim from the input file into the output file.
- The load balancer input for any given phase J should be the object locations and loads that were already written to the output file for phase J (if the load balancer has already run on any phase, this will likely not match the object locations from the input file).
- The output of the load balancer that was run on phase J will be written to the output file as phase J+1. If phase J+1 existed in the input file, use the loads specified (but not object locations) in the input file for writing phase J+1; otherwise, re-use the loads from the phase J input.
- If the load balancer was not run on some phase K, the output for phase K+1 will have the same object locations as phase K but use the loads from the phase K+1 input.
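
To make the rules above concrete, here is a minimal Python sketch that applies them to in-memory data. It is not LBAF's writer: the function name `build_output_phases`, the dictionary layout, and the `balance` callback are all hypothetical. It also assumes that every phase contains the same objects, that the first balanced phase exists in the input, and that a phase with no phase K+1 input simply keeps the loads already written for phase K.

```python
import copy


def build_output_phases(input_phases, lb_phases, balance):
    """Apply the output rules listed above to in-memory phase data.

    input_phases: {phase_id: {object_id: {"rank": int, "load": float}}}
    lb_phases:    non-empty set of phase ids on which the load balancer runs
    balance:      callable mapping one phase dict to {object_id: new_rank}
    (All three names and layouts are illustrative assumptions.)
    """
    first_lb = min(lb_phases)
    # Phases up to and including the first balanced one are copied verbatim.
    output = {p: copy.deepcopy(objs)
              for p, objs in input_phases.items() if p <= first_lb}

    last = max(max(input_phases), max(lb_phases) + 1)
    for j in range(first_lb, last):
        current = output[j]  # the balancer sees what was written for phase j
        if j in lb_phases:
            ranks = balance(current)
        else:
            # No balancing on phase j: object locations carry over unchanged.
            ranks = {obj: entry["rank"] for obj, entry in current.items()}
        # Loads come from the phase j+1 input if present, else from phase j.
        loads_src = input_phases.get(j + 1, current)
        output[j + 1] = {
            obj: {"rank": ranks[obj], "load": loads_src[obj]["load"]}
            for obj in current
        }
    return output
```

With an input holding only phase 0 and `lb_phases={0}`, this yields phase 0 unchanged plus a phase 1 that reuses the phase 0 loads at the post-LB locations, which matches the first use case described above.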

Does that all sound correct to you @lifflander ?

lifflander commented 1 year ago

Yes, that sounds exactly correct to me

ppebay commented 1 year ago

@nlslatt now that the writer is doing what it's supposed to do per phase, I am getting to the execution logic you described.

However, regarding this requirement:

> For cases where there are more phases of data in the input file, LBAF balances more than one phase, or LBAF starts load balancing not on phase 0:
>
> All phases before and including the first one LBAF attempts to load balance should be copied verbatim from the input file into the output file.

I have the following question: LBAF is written so that it loads only the phases whose IDs appear in the list (possibly reduced to a singleton) specified by phase_ids in the configuration file. This is for efficiency's sake, especially with large JSON files where only a single phase is to be load-balanced, or where a few phases (not necessarily consecutive ones) are to be stepped through. However, to meet the specification above, all phases before phase J would have to be read and loaded into LBAF's internal data structures (so that they can be written out later), which would be very inefficient in general.

I therefore submit that this modus operandi should be optional, and probably turned off by default. Do you agree?
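
To illustrate the trade-off, here is a small Python sketch of how an opt-in switch might change which phases get read. The function `phases_to_load` and the flag `offline_lb_compatible` are hypothetical names invented for this example; they are not existing LBAF configuration keys.

```python
def phases_to_load(phase_ids, available_ids, offline_lb_compatible=False):
    """Return the sorted phase ids that would have to be read from the input.

    phase_ids:             ids requested in the configuration file
    available_ids:         ids actually present in the input JSON files
    offline_lb_compatible: hypothetical opt-in flag, off by default
    """
    available = sorted(set(available_ids))
    requested = [p for p in available if p in set(phase_ids)]
    if not offline_lb_compatible or not requested:
        # Default behavior: read only what was asked for.
        return requested
    # Compatible mode: also read every phase preceding the first requested one,
    # so that those phases can be copied verbatim into the output.
    preceding = [p for p in available if p < requested[0]]
    return preceding + requested
```

For example, requesting only phase 3 out of phases 0 to 4 reads one phase by default and four phases with the flag turned on, which is the extra cost described above.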

nlslatt commented 1 year ago

> I therefore submit that this modus operandi should be optional, and probably turned off by default. Do you agree?

@ppebay That sounds reasonable.

ppebay commented 1 year ago

More on this @lifflander, @nlslatt

Currently, the JSON schema validator considers this rank entry as valid:

{"type":"LBDatafile","phases":[]}

(this is the case of rank 3 in the synthetic_lb_data example, which indeed has no objects assigned to it).

This causes problems in supporting the stated goals of #367, because the empty phase is inconsistent in that it does not have an ID. I would argue that this is in any case inconsistent with the other ranks, e.g. rank 2:

{"type":"LBDatafile","phases":[{"id":0,"tasks":[{"entity":{"id":8,"home":2,"type":"object","migratable":true},"node":2,"resource":"cpu","time":1.5}],"communications":[{"type":"SendRecv","to":{"type":"object","id":6},"messages":1,"from":{"type":"object","id":8},"bytes":1.5}]}]}

because this gives the impression that rank 3 is in a phase-less state.

My proposal would be to modify our schema so that in such a case (no tasks assigned to a rank), the JSON file for rank 3 instead contains the following:

{"type":"LBDatafile","phases":[{"id":0,"tasks":[]}]}

If my proposal is accepted, that would probably mean modifying the vt JSON writer, and assuredly the schema validator.
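
As a rough illustration of the proposed convention, the following Python sketch fills in an explicit, empty entry for every phase that a rank file is missing. The helper name `ensure_phase_entries` is hypothetical, and the sketch assumes the caller already knows the expected phase IDs (for example, from the other ranks); it is not the vt JSON writer.

```python
import json


def ensure_phase_entries(rank_data, phase_ids):
    """Return a copy of a parsed rank file with an entry for every phase id."""
    present = {phase["id"] for phase in rank_data["phases"]}
    phases = list(rank_data["phases"])
    for pid in phase_ids:
        if pid not in present:
            # A rank with no objects still records the phase, with no tasks.
            phases.append({"id": pid, "tasks": []})
    phases.sort(key=lambda phase: phase["id"])
    return {**rank_data, "phases": phases}


rank3 = json.loads('{"type":"LBDatafile","phases":[]}')
print(json.dumps(ensure_phase_entries(rank3, [0]), separators=(",", ":")))
# prints {"type":"LBDatafile","phases":[{"id":0,"tasks":[]}]}
```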

What do you think?

nlslatt commented 1 year ago

@ppebay @lifflander I see the need for a change here. My question is why {"type":"LBDatafile","phases":[{"id":0,"tasks":[]}]} as opposed to {"type":"LBDatafile","phases":[{"id":0}]} or {"type":"LBDatafile","phases":[{"id":0,"tasks":[],"communications":[]}]}?
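
One way to see what is at stake between the three candidates: a reader that defaults missing keys treats them identically, so the choice mostly matters for the schema and the writer. The Python sketch below assumes such a tolerant reader; `read_phases` is a hypothetical helper, not vt's LBDataRestartReader or LBAF's reader.

```python
import json


def read_phases(text):
    """Parse one rank file into {phase_id: (tasks, communications)}."""
    data = json.loads(text)
    return {
        # Missing "tasks" or "communications" keys default to empty lists.
        phase["id"]: (phase.get("tasks", []), phase.get("communications", []))
        for phase in data.get("phases", [])
    }


candidates = [
    '{"type":"LBDatafile","phases":[{"id":0,"tasks":[]}]}',
    '{"type":"LBDatafile","phases":[{"id":0}]}',
    '{"type":"LBDatafile","phases":[{"id":0,"tasks":[],"communications":[]}]}',
]
parsed = [read_phases(text) for text in candidates]
```

Under this assumption all three entries of `parsed` compare equal, whereas the original `{"type":"LBDatafile","phases":[]}` parses to an empty mapping with no phase ID at all.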

ppebay commented 1 year ago

@lifflander @nlslatt

Good question, I don't have any strong opinion regarding the answer to it :)

As long as the phase id is in there, I am fine with it.

Maybe the second one is best thanks to its compactness?
