davidsd / sdpb

A semidefinite program solver for the conformal bootstrap.
MIT License

Combine pmp2sdp and sdpb into a single executable? #78

Open vasdommes opened 1 year ago

vasdommes commented 1 year ago

Problem

Common use case:

- pmp2sdp (or sdp2input) reads a polynomial matrix program (PMP) and writes the converted SDP to sdp.zip on disk.
- sdpb then reads sdp.zip back from disk and solves it.

These intermediate IO operations can be quite expensive (tens of minutes). In Skydive, sdp2input+sdpb are called for each iteration, and IO takes up to ~50% of total sdpb time (as noted by @suning-git).

Solution

Create a single executable that accepts input in different formats and performs in-memory conversion to the format accepted by the solver.
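As a rough illustration of the proposed flow, here is a minimal sketch. The types and helpers (PMP, SDP, read_pmp, pmp_to_sdp, read_sdp, solve) are hypothetical placeholders, not the actual sdpb API; the point is only that the conversion happens in memory instead of via sdp.zip.

```cpp
// Illustrative sketch only: PMP, SDP, read_pmp, pmp_to_sdp, read_sdp and solve
// are hypothetical stand-ins, not the actual sdpb types or API.
#include <filesystem>

struct PMP {}; // in-memory polynomial matrix program (placeholder)
struct SDP {}; // in-memory SDP blocks (placeholder)

PMP read_pmp(const std::filesystem::path &) { return {}; } // parse pmp.json / .m / .xml
SDP pmp_to_sdp(const PMP &) { return {}; }                 // in-memory conversion, no sdp.zip
SDP read_sdp(const std::filesystem::path &) { return {}; } // read an existing sdp.zip
void solve(const SDP &) {}                                 // run the solver

// Single entry point: accept either a PMP or an already-converted SDP and
// convert in memory instead of round-tripping through sdp.zip on disk.
void run(const std::filesystem::path &input)
{
  SDP sdp = (input.extension() == ".zip") ? read_sdp(input)
                                          : pmp_to_sdp(read_pmp(input));
  solve(sdp);
}

int main(int argc, char **argv)
{
  if(argc > 1)
    run(argv[1]);
}
```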

Potential issues

See comments below for details.

vasdommes commented 1 year ago

SDPB restart

sdp.zip is reused for SDPB restart, so it makes sense to write it anyway.

We can run SDPB with the following options:

```
sdpb --pmpPath=pmp.json --sdpPath=sdp.zip
```

Behavior:

- If sdp.zip does not exist, sdpb converts pmp.json, writes sdp.zip to disk, and then solves it.
- If sdp.zip already exists, sdpb skips the conversion and reads sdp.zip directly (so a restart can reuse it).

NB: This can be problematic in the following scenario:

```
sdpb --pmpPath=old_pmp.json --sdpPath=sdp.zip
sdpb --pmpPath=new_pmp.json --sdpPath=sdp.zip  # User assumes that sdp.zip will be overwritten, but it isn't!
```
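A minimal sketch of this reuse logic, with hypothetical convert_pmp / read_sdp / write_sdp helpers (not the actual sdpb code), showing where the stale-sdp.zip pitfall sits:

```cpp
// Sketch of the proposed --pmpPath/--sdpPath behavior; SDP, convert_pmp,
// read_sdp and write_sdp are hypothetical placeholders, not real sdpb code.
#include <filesystem>
namespace fs = std::filesystem;

struct SDP {};
SDP convert_pmp(const fs::path &) { return {}; } // pmp2sdp-style conversion
SDP read_sdp(const fs::path &) { return {}; }
void write_sdp(const SDP &, const fs::path &) {}

SDP load(const fs::path &pmp_path, const fs::path &sdp_path)
{
  // Pitfall from the scenario above: if sdp.zip already exists it is reused,
  // even when pmp_path now points to a different PMP, so a stale SDP wins
  // silently unless some freshness check or warning is added here.
  if(fs::exists(sdp_path))
    return read_sdp(sdp_path);

  SDP sdp = convert_pmp(pmp_path);
  write_sdp(sdp, sdp_path); // keep sdp.zip on disk so restarts can reuse it
  return sdp;
}
```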

Possible solutions:

Thus, if we talk only about usability (and not about IO performance), maybe it's still better to keep two separate executables.

vasdommes commented 1 year ago

Distributing PMP matrices and SDP blocks

Speaking of performance, there is a problem with distributing the blocks among the cores.

Current behavior:

In sdp2input, each core stores and processes only some of the polynomial matrices, according to the simple rule `matrix_index % num_procs == rank`:
https://github.com/davidsd/sdpb/blob/0ba5ecbcddcc8cdb1e2a57cac518163fe6362fa6/src/sdp_read/read_input/read_mathematica/parse_SDP/parse_matrices.cxx#L29
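For reference, the rule in isolation (a standalone sketch, not the actual parse_matrices.cxx code):

```cpp
// Round-robin distribution: core `rank` keeps only the matrices whose index
// satisfies matrix_index % num_procs == rank.
#include <cstddef>
#include <vector>

std::vector<std::size_t>
my_matrix_indices(std::size_t num_matrices, std::size_t num_procs, std::size_t rank)
{
  std::vector<std::size_t> indices;
  for(std::size_t matrix_index = 0; matrix_index < num_matrices; ++matrix_index)
    if(matrix_index % num_procs == rank)
      indices.push_back(matrix_index);
  return indices;
}
```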

In SDPB, we distribute blocks among cores according to block costs (which are read from timing data or estimated from block sizes):
https://github.com/davidsd/sdpb/blob/3019fcd7122794ddb9618de1adcd1d8439716031/src/sdp_solve/Block_Info/read_block_costs.cxx

Moreover, a single block can be stored as a `DistMatrix` shared by a group of cores if `procGranularity > 1`.
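As a rough illustration of the cost-based idea (not the actual Block_Info allocation code), a greedy assignment of blocks to MPI groups could look like this:

```cpp
// Simplified illustration of cost-based distribution: process blocks in
// descending cost order and give each one to the MPI group with the smallest
// accumulated cost so far.
#include <algorithm>
#include <cstddef>
#include <vector>

std::vector<std::size_t> // result[block_index] = group index
assign_blocks(const std::vector<double> &block_costs, std::size_t num_groups)
{
  std::vector<std::size_t> order(block_costs.size());
  for(std::size_t i = 0; i < order.size(); ++i)
    order[i] = i;
  std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return block_costs[a] > block_costs[b];
  });

  std::vector<double> group_cost(num_groups, 0.0);
  std::vector<std::size_t> assignment(block_costs.size(), 0);
  for(std::size_t block : order)
    {
      const std::size_t group
        = std::min_element(group_cost.begin(), group_cost.end()) - group_cost.begin();
      assignment[block] = group;
      group_cost[group] += block_costs[block];
    }
  return assignment;
}
```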

The problem

If we want to keep everything in memory (without writing and reading sdp.zip), how do we switch from the initial (PMP) block distribution to the final one?

If timing data is available, we can use it from the very beginning. Potential problem: procGranularity. We would probably have to read the same matrix on each core in the group, convert it (again on each core), and then store it in a DistMatrix; or read it only on the first core and then send it to the other cores. Another problem: if the order of PMP files changes, then block indexing changes, and the existing timing data no longer matches the blocks.

If there is no timing data, we can read the PMP matrices as we do now and then perform some non-trivial MPI messaging to redistribute them. That might not be significantly faster than just writing to disk and reading again. Alternatively, we could look at the PMP matrix sizes and calculate the corresponding block costs at the start.
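A minimal sketch of such a size-based estimate, assuming a hypothetical dim^3 cost model (an illustrative heuristic only, not the estimate SDPB actually uses):

```cpp
// Estimate block costs from block dimensions alone, before any conversion:
// treat each block's cost as ~dim^3, roughly the scaling of the dense
// linear algebra done per block by the solver (heuristic assumption).
#include <cstddef>
#include <vector>

std::vector<double>
estimate_block_costs(const std::vector<std::size_t> &block_dimensions)
{
  std::vector<double> costs;
  costs.reserve(block_dimensions.size());
  for(const std::size_t dim : block_dimensions)
    costs.push_back(static_cast<double>(dim) * dim * dim);
  return costs;
}
```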

Anyway, all of this requires non-trivial code changes, and we should do it only if IO for sdp.zip is a real bottleneck (e.g., we probably don't want to fix this if generating and writing the PMP in Mathematica is much slower).

vasdommes commented 9 months ago

After the recent pmp2sdp input (#150) and output (#177) optimizations, IO performance should no longer be much of a problem.