davidsd / sdpb

A semidefinite program solver for the conformal bootstrap.
MIT License

Combine pmp2sdp and sdpb into a single executable? #78

Open vasdommes opened 1 year ago

vasdommes commented 1 year ago

Problem

Common use case:

- pmp2sdp (or sdp2input) reads a polynomial matrix program (PMP) and writes the converted SDP to sdp.zip on disk.
- sdpb then reads sdp.zip back from disk and solves it.

These intermediate IO operations can be quite expensive (tens of minutes). In Skydive, sdp2input+sdpb are called for each iteration, and IO takes up to ~50% of total sdpb time (as noted by @suning-git).

Solution

Create a single executable that accepts input in different formats and performs in-memory conversion to the format accepted by the solver.
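As a rough illustration of the proposed flow, here is a minimal sketch. The types and helpers (PMP, SDP, read_pmp, pmp_to_sdp, read_sdp, solve) are hypothetical placeholders, not the actual sdpb API; the point is only that the conversion happens in memory instead of via sdp.zip.

```cpp
// Illustrative sketch only: PMP, SDP, read_pmp, pmp_to_sdp, read_sdp and solve
// are hypothetical stand-ins, not the actual sdpb types or API.
#include <filesystem>

struct PMP {}; // in-memory polynomial matrix program (placeholder)
struct SDP {}; // in-memory SDP blocks (placeholder)

PMP read_pmp(const std::filesystem::path &) { return {}; } // parse pmp.json / .m / .xml
SDP pmp_to_sdp(const PMP &) { return {}; }                 // in-memory conversion, no sdp.zip
SDP read_sdp(const std::filesystem::path &) { return {}; } // read an existing sdp.zip
void solve(const SDP &) {}                                 // run the solver

// Single entry point: accept either a PMP or an already-converted SDP and
// convert in memory instead of round-tripping through sdp.zip on disk.
void run(const std::filesystem::path &input)
{
  SDP sdp = (input.extension() == ".zip") ? read_sdp(input)
                                          : pmp_to_sdp(read_pmp(input));
  solve(sdp);
}

int main(int argc, char **argv)
{
  if(argc > 1)
    run(argv[1]);
}
```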

Potential issues

See comments below for details.

vasdommes commented 1 year ago

SDPB restart

sdp.zip is reused for SDPB restart, so it makes sense to write it anyway.

We can run SDPB with the following options:

```
sdpb --pmpPath=pmp.json --sdpPath=sdp.zip
```

Behavior:

- If sdp.zip does not exist, sdpb converts pmp.json, writes sdp.zip to disk, and then solves it.
- If sdp.zip already exists, sdpb skips the conversion and reads sdp.zip directly (so a restart can reuse it).

NB: This can be problematic in the following scenario:

```
sdpb --pmpPath=old_pmp.json --sdpPath=sdp.zip
sdpb --pmpPath=new_pmp.json --sdpPath=sdp.zip  # User assumes that sdp.zip will be overwritten, but it isn't!
```
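A minimal sketch of this reuse logic, with hypothetical convert_pmp / read_sdp / write_sdp helpers (not the actual sdpb code), showing where the stale-sdp.zip pitfall sits:

```cpp
// Sketch of the proposed --pmpPath/--sdpPath behavior; SDP, convert_pmp,
// read_sdp and write_sdp are hypothetical placeholders, not real sdpb code.
#include <filesystem>
namespace fs = std::filesystem;

struct SDP {};
SDP convert_pmp(const fs::path &) { return {}; } // pmp2sdp-style conversion
SDP read_sdp(const fs::path &) { return {}; }
void write_sdp(const SDP &, const fs::path &) {}

SDP load(const fs::path &pmp_path, const fs::path &sdp_path)
{
  // Pitfall from the scenario above: if sdp.zip already exists it is reused,
  // even when pmp_path now points to a different PMP, so a stale SDP wins
  // silently unless some freshness check or warning is added here.
  if(fs::exists(sdp_path))
    return read_sdp(sdp_path);

  SDP sdp = convert_pmp(pmp_path);
  write_sdp(sdp, sdp_path); // keep sdp.zip on disk so restarts can reuse it
  return sdp;
}
```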

Possible solutions:

Thus, if we talk only about usability (and not about IO performance), maybe it's still better to keep two separate executables.

vasdommes commented 1 year ago

Distributing PMP matrices and SDP blocks

Speaking of performance, there is a problem with distributing the blocks among the cores.

Current behavior:

In sdp2input, each core stores and processes only some of the polynomial matrices, according to the simple rule `matrix_index % num_procs == rank`:
https://github.com/davidsd/sdpb/blob/0ba5ecbcddcc8cdb1e2a57cac518163fe6362fa6/src/sdp_read/read_input/read_mathematica/parse_SDP/parse_matrices.cxx#L29
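For reference, the rule in isolation (a standalone sketch, not the actual parse_matrices.cxx code):

```cpp
// Round-robin distribution: core `rank` keeps only the matrices whose index
// satisfies matrix_index % num_procs == rank.
#include <cstddef>
#include <vector>

std::vector<std::size_t>
my_matrix_indices(std::size_t num_matrices, std::size_t num_procs, std::size_t rank)
{
  std::vector<std::size_t> indices;
  for(std::size_t matrix_index = 0; matrix_index < num_matrices; ++matrix_index)
    if(matrix_index % num_procs == rank)
      indices.push_back(matrix_index);
  return indices;
}
```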

In SDPB, we distribute blocks among cores according to block costs (which are read from timing data or estimated from block sizes):
https://github.com/davidsd/sdpb/blob/3019fcd7122794ddb9618de1adcd1d8439716031/src/sdp_solve/Block_Info/read_block_costs.cxx

Moreover, a single block can be stored as a `DistMatrix` shared by a group of cores if `procGranularity > 1`.
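As a rough illustration of the cost-based idea (not the actual Block_Info allocation code), a greedy assignment of blocks to MPI groups could look like this:

```cpp
// Simplified illustration of cost-based distribution: process blocks in
// descending cost order and give each one to the MPI group with the smallest
// accumulated cost so far.
#include <algorithm>
#include <cstddef>
#include <vector>

std::vector<std::size_t> // result[block_index] = group index
assign_blocks(const std::vector<double> &block_costs, std::size_t num_groups)
{
  std::vector<std::size_t> order(block_costs.size());
  for(std::size_t i = 0; i < order.size(); ++i)
    order[i] = i;
  std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return block_costs[a] > block_costs[b];
  });

  std::vector<double> group_cost(num_groups, 0.0);
  std::vector<std::size_t> assignment(block_costs.size(), 0);
  for(std::size_t block : order)
    {
      const std::size_t group
        = std::min_element(group_cost.begin(), group_cost.end()) - group_cost.begin();
      assignment[block] = group;
      group_cost[group] += block_costs[block];
    }
  return assignment;
}
```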

The problem

If we want to keep everything in memory (without writing and reading sdp.zip), how do we switch from the initial (PMP) block distribution to the final one?

If timing data is available, we can use it from the very beginning. Potential problem: procGranularity. We would probably have to read the same matrix on each core in the group, convert it (again on each core), and then store it in a DistMatrix; or read it only on the first core and then send it to the other cores. Another problem: if the order of PMP files changes, then block indexing changes, and the existing timing data no longer matches the blocks.

If there is no timing data, we can read the PMP matrices as we do now and then perform some non-trivial MPI messaging to redistribute them. That might not be significantly faster than just writing to disk and reading again. Alternatively, we could look at the PMP matrix sizes and calculate the corresponding block costs at the start.
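A minimal sketch of such a size-based estimate, assuming a hypothetical dim^3 cost model (an illustrative heuristic only, not the estimate SDPB actually uses):

```cpp
// Estimate block costs from block dimensions alone, before any conversion:
// treat each block's cost as ~dim^3, roughly the scaling of the dense
// linear algebra done per block by the solver (heuristic assumption).
#include <cstddef>
#include <vector>

std::vector<double>
estimate_block_costs(const std::vector<std::size_t> &block_dimensions)
{
  std::vector<double> costs;
  costs.reserve(block_dimensions.size());
  for(const std::size_t dim : block_dimensions)
    costs.push_back(static_cast<double>(dim) * dim * dim);
  return costs;
}
```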

Anyway, all of this requires non-trivial code changes, and we should do it only if IO for sdp.zip is a real bottleneck (e.g., we probably don't want to fix this if generating and writing the PMP in Mathematica is much slower).

vasdommes commented 9 months ago

After the recent pmp2sdp input (#150) and output (#177) optimizations, IO performance should no longer be much of a problem.