DiamondLightSource / blueapi


Consider moving the worker (run engine) into a separate microservice #504

Open stan-dot opened 2 weeks ago

stan-dot commented 2 weeks ago

We should strive to avoid the kind of bloat seen with GDA, and these microservices could possibly be deployed separately.

```mermaid
graph TD
    CLI[CLI Tool] -->|Submits jobs| A[FastAPI Frontend]
    CLI -->|Previews IoT devices| IoT[IoT Device Representation in ophyd-async]
    CLI -->|Listens to| D[Message Bus]
    A -->|Reads jobs| B[Job Queue]
    B --> C[Worker Service]
    C -->|Emits results| D
    D --> E[Result Processing Service]
    C -->|Protobuf| F[Other Microservices]
    style CLI fill:#fff,stroke:#000,stroke-width:2px
    style A fill:#fff,stroke:#000,stroke-width:2px
    style B fill:#fff,stroke:#000,stroke-width:2px
    style C fill:#fff,stroke:#000,stroke-width:2px
    style D fill:#fff,stroke:#000,stroke-width:2px
    style E fill:#fff,stroke:#000,stroke-width:2px
    style F fill:#fff,stroke:#000,stroke-width:2px
    style IoT fill:#fff,stroke:#000,stroke-width:2px
```

callumforrester commented 2 weeks ago

The subprocess already does some of this. We could separate the management and worker processes out completely; I'm not opposed to that. See also #371

What is the IoT device service for?

stan-dot commented 2 weeks ago

Yeah, that's inaccurate, but it basically means ophyd-async.

callumforrester commented 1 week ago

Are the ophyd-async devices instantiated twice? Once in the worker service and once in the IoT service? There's no arrow between them.

stan-dot commented 1 week ago

I was not precise; I've now changed it from 'service' to 'representation' to be more abstract. In essence, this would entail splitting the context:


```python
from dataclasses import dataclass, field

from bluesky import RunEngine

# Plan, Device and PlanGenerator are blueapi's own core types
from blueapi.core import Device, Plan, PlanGenerator


@dataclass
class BlueskyContext:
    """
    Context for building a Bluesky application.

    The context holds the RunEngine and any plans/devices that you may want to use.
    """

    run_engine: RunEngine = field(
        default_factory=lambda: RunEngine(context_managers=[])
    )
    plans: dict[str, Plan] = field(default_factory=dict)
    devices: dict[str, Device] = field(default_factory=dict)
    plan_functions: dict[str, PlanGenerator] = field(default_factory=dict)

    _reference_cache: dict[type, type] = field(default_factory=dict)
```

We would keep the devices and plans (everything available on the beamline) in the gateway and only send one plan, with the devices it needs, to the worker service. The worker service would keep the RunEngine.
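
As a rough illustration, the split might look something like this. `PlanRegistry` and `WorkerContext` are hypothetical names, not existing blueapi classes, and the real refactor would also need to decide how device references are serialized between the two processes:

```python
from dataclasses import dataclass, field

from bluesky import RunEngine

# Plan, Device and PlanGenerator as in blueapi's core types
from blueapi.core import Device, Plan, PlanGenerator


@dataclass
class PlanRegistry:
    """Gateway side: everything available on the beamline, but no RunEngine."""

    plans: dict[str, Plan] = field(default_factory=dict)
    devices: dict[str, Device] = field(default_factory=dict)
    plan_functions: dict[str, PlanGenerator] = field(default_factory=dict)


@dataclass
class WorkerContext:
    """Worker side: the RunEngine plus only the devices the submitted plan needs."""

    run_engine: RunEngine = field(
        default_factory=lambda: RunEngine(context_managers=[])
    )
    devices: dict[str, Device] = field(default_factory=dict)
```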

callumforrester commented 1 week ago

I think I'm going to need even more precision: which of the following objects lives in which process?

stan-dot commented 1 week ago

worker service

REST gateway

The metadata part is less familiar to me; I'm not sure.

callumforrester commented 1 week ago

So devices are instantiated lazily (when a plan that needs them is run)? And then destroyed when the plan finishes?

stan-dot commented 1 week ago

Not sure; that's a decision independent of whether the RunEngine is separate or not. I haven't got the picture at that level of detail.

callumforrester commented 1 week ago

@stan-dot How about this?

```mermaid
graph TD
    CLI[Client] -->|Submits jobs| A[FastAPI Frontend]
    D[Message Bus] -->|Provides results| CLI
    A -->|Protobuf| C[Worker Service]
    C -->|Emits results| D
    D --> E[Result Processing Service]
    style CLI fill:#fff,stroke:#000,stroke-width:2px
    style A fill:#fff,stroke:#000,stroke-width:2px
    style C fill:#fff,stroke:#000,stroke-width:2px
    style D fill:#fff,stroke:#000,stroke-width:2px
    style E fill:#fff,stroke:#000,stroke-width:2px
```

I've chopped out the IoT service because it is out of scope, but it can be added later.
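
For concreteness, a minimal sketch of what the FastAPI frontend's submit endpoint could look like under this design. The endpoint path, payload shape, and `forward_to_worker` helper are all illustrative assumptions, not existing blueapi code:

```python
import uuid

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class TaskRequest(BaseModel):
    """Hypothetical payload: a plan name plus its parameters."""

    name: str
    params: dict = {}


async def forward_to_worker(task: TaskRequest) -> str:
    # Stand-in for the real call to the worker service,
    # e.g. over gRPC/protobuf as in the diagram above.
    return str(uuid.uuid4())


@app.post("/tasks")
async def submit_task(task: TaskRequest) -> dict:
    # The frontend only validates and forwards; the worker service
    # owns the RunEngine and actually executes the plan.
    task_id = await forward_to_worker(task)
    return {"task_id": task_id}
```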

stan-dot commented 1 week ago

Yeah, that's good. One place that could get more detail is 'Message Bus -> provides results'.

Are there raw results there, plus an additional subscription from the CLI client to get the processed results?

Unless the result processing service sends processed results back to the message bus. If so, I'd split the message bus into two channels: one for raw data and one for processed data.

callumforrester commented 1 week ago

That's left vague on purpose because it is a case-by-case thing. In reality there will be many downstream services that do things to the data, and that is out of scope for this project; our job is just to provide data in a nice, well-known format (a.k.a. the event-model).
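
To make "a nice, well-known format" concrete: bluesky's RunEngine already emits event-model documents to any subscriber, so the worker service only has to forward them. A minimal sketch using a simulated detector, where the print call stands in for publishing to the message bus:

```python
from bluesky import RunEngine
from bluesky.plans import count
from ophyd.sim import det  # simulated detector, for illustration only

RE = RunEngine()


def publish(name: str, doc: dict) -> None:
    # Stand-in for publishing to the message bus: every document in the
    # run (start, descriptor, event, stop) passes through here.
    print(name, doc["uid"])


RE.subscribe(publish)
RE(count([det], num=3))
```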

I think we're beginning to get a design out of this, though, which is good. Another important detail is hot reloading. We've found this to be a very useful feature of the current blueapi: when we change a plan, we just need to poke an endpoint to reload the code into blueapi (~5s), whereas before we had to restart the pod (~30s). We get this via the subprocess arrangement at the moment and are keen not to lose it.
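
A rough sketch of how the subprocess arrangement enables that (illustrative only; blueapi's actual implementation differs): restarting just the worker subprocess re-imports the plan code in seconds, rather than restarting the whole pod.

```python
import multiprocessing as mp
import time


def run_worker() -> None:
    # Stand-in for the real worker loop, which imports plan modules
    # and hosts the RunEngine.
    while True:
        time.sleep(1)


def start_worker() -> mp.Process:
    proc = mp.Process(target=run_worker)
    proc.start()
    return proc


def reload_worker(worker: mp.Process) -> mp.Process:
    # Restarting only the subprocess picks up new plan code quickly,
    # without tearing down the surrounding pod.
    worker.terminate()
    worker.join()
    return start_worker()


if __name__ == "__main__":
    worker = start_worker()
    worker = reload_worker(worker)
    worker.terminate()
```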

stan-dot commented 1 week ago

I mean, we wouldn't lose that; the logic would just be carried over like everything else.