bluesky / scanspec

Specify step and flyscan paths in a serializable, efficient and Pythonic way
Apache License 2.0
7 stars 2 forks source link

GraphQL API #134

Open callumforrester opened 1 month ago

callumforrester commented 1 month ago

With the arrival of the supergraph at Diamond, we should once again consider having a scanspec GraphQL API.

We had one a long time ago which was converted to REST. The last version of the GraphQL API can be found here: https://github.com/bluesky/scanspec/blob/0.5.5/src/scanspec/service.py. Both APIs provided scanspec validation and point generation.

Requirements

Functional (User Stories)

As a user I would like a GUI where I can define my scan trajectory to achieve parity with GDA. As a user I would like a GUI where I can preview my scan trajectory to achieve parity with GDA.

Non-Functional

A single API to serve these GUIs.

Considerations

  1. Should we have a GraphQL API as well as or instead of a REST one? I would vote yes for simplicity's sake but given that this repository is in the bluesky organisation I think it's a discussion worth having.
  2. Should we use the same GraphQL library we used before? We used graphql-server before, but it was not especially well-maintained and I don't believe it was very performant. We could see if any other solutions have gained prominence in the meantime. Coniql uses strawberry-graphql so we should see how well it was worked for them. Tagging @coretl and @MichaelStubbings.
  3. How are we going to judge the performance, reliability, security etc. of the eventual solution to determine when it is ready to be added to the graph.
  4. How should we handle rate/size-limiting?

Notes on the Original API

The original API was stateless and was not backed by any persistence mechanism. It took scanspecs in requests and either validated them or generated points and shipped them back to the caller in the format of their choosing (options were a truncated string representation of array for debugging, a JSON float list representation of the array or base64 encoded string representation of array).

Original schema (return values only): image

By default, the API would only return up to 100,000 points, it was possible to ask it to return more in the request or turn off the limit entirely. In cases where it generated more points than it could return, it would sub-sample, for example: image

This is the area we will have to think about most carefully, the service could be commanded to return an enormous number of points that might impair network performance. Just to show how easy it is, the following spec would do that Line(axis="x", start=0, stop=1.0, num=1e50). Even if the max points parameter holds, the service would still generate 10^50 points, which would take an unacceptably long time. When the API was designed these risks were deemed to be acceptable as the API would be internal-only, only used by specific clients that we could control and not critical to operations. If it becomes part of the supergraph then stricter constraints will be placed on it and we will have to solve these problems.

callumforrester commented 1 month ago

Also tagging @garryod and @DiamondJoseph for awareness

coretl commented 3 weeks ago

1. Should we have a GraphQL API as well as or instead of a REST one? I would vote yes for simplicity's sake but given that this repository is in the bluesky organisation I think it's a discussion worth having.

Depends how much effort it is to keep both. If it's easy then I vote keep both, and it makes this a nice example project for using pydantic models with both REST and GraphQL

2. Should we use the same GraphQL library we used before?

I think @garryod did some tests with strawberry too? I would probably vote for that as it seems to support pydantic (although not sure if our magic discriminators will work out of the box as strawberry unions are simpler)

3. How are we going to judge the performance, reliability, security etc. of the eventual solution to determine when it is ready to be added to the graph.

Good question. How are we going to judge these on any services to be added to the graph?

When the API was designed these risks were deemed to be acceptable as the API would be internal-only, only used by specific clients that we could control and not critical to operations. If it becomes part of the supergraph then stricter constraints will be placed on it and we will have to solve these problems.

Do we have to make it externally accessible? Surely doing bluesky scans is something we could require to be done via a VPN, and I can't see why we would need this GUI apart from when queuing up scans?

4. How should we handle rate/size-limiting?

Pagination solves all our problems right? It would at least solve the network bandwidth issue. The 1e50 internal calculation time issue could be largely avoided if we set a limit to the size of a single Spec, which is mostly funnelled through this convenient function: https://github.com/bluesky/scanspec/blob/c15b8c3abbd474466b35661980cb558141928e27/src/scanspec/specs.py#L421-L426

Checking num against an environment variable at the top of this function would allow us to error before using lots of memory and CPU.

Alternatively make lots of replicas and give them not much CPU or memory, and if they die they die.

callumforrester commented 3 weeks ago

All good suggestions.

Do we have to make it externally accessible?

Internal was the wrong word, my bad. In general I use the words external and internal in a relative sense (i.e. larger or smaller than the previously assumed scope) while most people use it in an absolute sense (i.e. on the internet or not on the internet), and confusion arises; this isn't the first time. I simply mean that putting it on the graph means that in theory anyone within Diamond may write clients for it, including clients that form other parts of the supergraph, where previously we imagined there would only be one or two clients and we would write them. I was making no reference at all to whether it becomes available on the public internet. Can you think of a better word for that?

However, with all that said, I think the plan is to make the graph available on the public internet eventually, @garryod can comment.

coretl commented 3 weeks ago

Internal was the wrong word, my bad. In general I use the words external and internal in a relative sense (i.e. larger or smaller than the previously assumed scope) while most people use it in an absolute sense (i.e. on the internet or not on the internet), and confusion arises; this isn't the first time. I simply mean that putting it on the graph means that in theory anyone within Diamond may write clients for it, including clients that form other parts of the supergraph, where previously we imagined there would only be one or two clients and we would write them. I was making no reference at all to whether it becomes available on the public internet. Can you think of a better word for that?

I can only think of longer words:

However, with all that said, I think the plan is to make the graph available on the public internet eventually, @garryod can comment.

This concerns me...

callumforrester commented 3 weeks ago

I think as a first step I will spin out a new issue for just restoring the GraphQL API alongside the REST API and using Strawberry instead of gql-core, that's more than enough for an actionable issue. I'll let the long-term discussion continue here though.