flexcompute / tidy3d

Fast electromagnetic solver (FDTD) at scale.
https://www.flexcompute.com/tidy3d/solver/
GNU Lesser General Public License v2.1

Asynchronous web tasks #1572

Open daquinteroflex opened 3 months ago

daquinteroflex commented 3 months ago

When doing parameter scans, especially with large simulation tasks such as those that use CustomMediums, it takes a long time to create/upload each task and then download the results sequentially. Could we parallelize this process?

Note: this is related to https://github.com/flexcompute/tidy3d/issues/1242, but focuses specifically on asynchronous tasks.
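As a rough sketch of what parallelizing the I/O side could look like (hedged: `upload_task` and `download_result` below are hypothetical placeholders for the existing blocking web API calls, not actual tidy3d functions), a thread pool is enough to overlap the network waits:

```python
from concurrent.futures import ThreadPoolExecutor

def upload_task(sim, task_name):
    """Hypothetical stand-in: create the task on the server and return a task_id."""
    ...

def download_result(task_id, path):
    """Hypothetical stand-in: wait for the task to finish, then download it."""
    ...

def run_batch_parallel(simulations, max_workers=8):
    """Upload and download a dict of {task_name: simulation} concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Uploads run concurrently; they are network-bound, so threads
        # are not limited by the GIL while waiting on the server.
        upload_futures = {
            name: pool.submit(upload_task, sim, name)
            for name, sim in simulations.items()
        }
        task_ids = {name: fut.result() for name, fut in upload_futures.items()}

        # Downloads are likewise overlapped instead of done sequentially.
        download_futures = {
            name: pool.submit(download_result, task_id, f"{name}.hdf5")
            for name, task_id in task_ids.items()
        }
        return {name: fut.result() for name, fut in download_futures.items()}
```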

This is, however, a bit broader:

For example, in https://docs.flexcompute.com/projects/tidy3d/en/latest/notebooks/MetalHeaterPhaseShifter.html

```python
perturb_sims = []
for _, hs_data in batch_data.items():
    # Interpolate the heat solver's temperature field onto the optical grid.
    temp_interpolated = hs_data["temperature"].temperature.interp(x=target_grid.x, y=0, z=target_grid.z, fill_value=300)
    psim = optic_sim.perturbed_mediums_copy(temperature=temp_interpolated)
    perturb_sims.append(psim)
```

It would be nice if hs_data could simply be a collection of task result references that can be downloaded on demand, rather than having to be uploaded to the cloud again.
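A minimal sketch of what such a reference collection could look like (hedged: `TaskRef` and `load_result` are made-up names for illustration, not part of the current API):

```python
from dataclasses import dataclass, field
from typing import Any, Optional

def load_result(task_id: str, path: str):
    """Hypothetical stand-in for the web API call that downloads a finished task."""
    ...

@dataclass
class TaskRef:
    """Lightweight handle to a completed task that lives on the server."""
    task_id: str
    path: str
    _cache: Optional[Any] = field(default=None, repr=False)

    def load(self):
        # Download (and cache) the result only when it is actually accessed,
        # so iterating over a batch neither re-uploads nor eagerly fetches data.
        if self._cache is None:
            self._cache = load_result(self.task_id, self.path)
        return self._cache

# Usage sketch: batch_data would map task names to references instead of full data,
# e.g. for name, ref in batch_data.items(): hs_data = ref.load()
```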

Useful reading:

daquinteroflex commented 3 months ago

Theory behind the implementation options

We want users to be able to use asynchronous commands without much added complexity. In my opinion, we want:

From the iron.io blog: [image]

Candidate implementation packages suggested:

Important concepts to understand:

CPU-bound jobs will spend most of their execution time on actual computation ("number crunching"[1]) as opposed to e.g. communicating with and waiting for peripherals such as network or storage devices (which would make them I/O bound instead).

In our case, the blocking operations are clear: the user has to wait for our server to receive the uploaded simulation, run the pipeline, and then serve the results for download. Fundamentally, our web API is waiting on these operations to complete, so our workload is mainly I/O-bound.

Now, let's evaluate each package according to this requirement:

"asyncio is often a perfect fit for IO-bound and high-level structured network code"

I have also looked into multiprocess. In my opinion, its documentation and version management are not great: https://multiprocess.readthedocs.io/en/latest/multiprocess.html

Requirements

One of the main things to define is what we want to parallelise and what we don't.

My personal requirements for the 3.0 architecture, based on my understanding, are:

Implementation caveats