As it is, the Operator does its own independent discovery of targets, right? But the implementation mirrors the way Cryostat does it, so the FlightRecorder objects and the Cryostat API targets list happen to agree. If I understand the syncing and ID issue correctly, having Cryostat assign a unique ID that it can map back to a Deployment/Pod/Container still doesn't help the Operator, because the Operator wouldn't know about those IDs - it just knows ex. the Deployment's name, labels, the namespace it's in, etc.
Is there some way we can reconcile this difference? Perhaps if we can come up with a way to generate the IDs from the resource type and some specific fields, then both the Operator and Cryostat can agree on the IDs as well, without the Operator having to query Cryostat for target discovery and then maintain that result as internal state for mapping back any new/changed CRs. Does that make sense? ex. an ID for a Deployment could look like `deployment-${hash(deployment.namespace, deployment.name)}`.
This still means that both the Operator and Cryostat need to know how to map those IDs back to actual Deployments, and similarly for other target/resource types, but it eliminates a round trip and some internal state maintenance that the Operator would otherwise have to do to get the ID from Cryostat. These IDs could still be passed to generic-looking endpoints like `POST /api/v2/targets/:targetId/recordings`, without the endpoint path being specific to the target type, which is more flexible when the runtime platform isn't necessarily Kubernetes/OpenShift.
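As a rough illustration (not Cryostat's actual code - the helper names here are invented), a deterministic ID scheme both sides could compute independently might look like:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class TargetIds {
    // Both the Operator and Cryostat could derive this ID independently
    // from fields they already know, with no round trip between them.
    static String deploymentId(String namespace, String name) {
        return "deployment-" + sha256Hex(namespace + "/" + name);
    }

    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The separator between the hashed fields just needs to be something that can't appear in the fields themselves, so that distinct (namespace, name) pairs can't collide.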
Perhaps we could use literals instead of a hash. Then it's trivial to map the IDs back to their objects. We can use underscores as separators as they're not valid in Kubernetes names [1].
Something like:
`[kind]_[namespace]_[name](_containerName)?`
The container name portion could be optional, included when we need to refer to a specific container. Within a Pod, container names are mandatory and must be unique. [2]
[1] https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-subdomain-names
[2] https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#container-v1-core
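A quick sketch of parsing such a literal ID back into its components (class and method names invented for illustration):

```java
import java.util.Optional;

public class LiteralTargetId {
    final String kind;
    final String namespace;
    final String name;
    final Optional<String> containerName;

    LiteralTargetId(String kind, String namespace, String name,
            Optional<String> containerName) {
        this.kind = kind;
        this.namespace = namespace;
        this.name = name;
        this.containerName = containerName;
    }

    // Underscores are safe separators because they can't appear in
    // Kubernetes resource names (RFC 1123 subdomain rules).
    static LiteralTargetId parse(String id) {
        String[] parts = id.split("_");
        if (parts.length < 3 || parts.length > 4) {
            throw new IllegalArgumentException("Malformed target ID: " + id);
        }
        Optional<String> container =
                parts.length == 4 ? Optional.of(parts[3]) : Optional.empty();
        return new LiteralTargetId(parts[0], parts[1], parts[2], container);
    }
}
```

For example, `LiteralTargetId.parse("pod_somenamespace_mypod_specificContainer")` would yield kind `pod`, namespace `somenamespace`, name `mypod`, and container `specificContainer`.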
Hmm. I'm not sure how much I like the optional `containerName` portion - this would only seem to make sense to append when the `kind` is `Pod` (or equivalent), and not for anything larger like a `Deployment` or even a whole `Namespace`. But on the other hand, `container_$namespace_$containerName` doesn't necessarily make sense as a target ID either, because this doesn't indicate a specific container but rather any container with the same name in the namespace. Maybe the `containerName` in particular is better suited as some query parameter optionally included when the ID's `kind` is `Pod`? But maybe not - it would look like `POST /api/v2/targets/pod_somenamespace_mypod/recordings?containerName=specificContainer`, which seems a bit strange.
Using literal properties of the resource as the ID could work, but I think we would need to be careful about how we define the hierarchical structure of which properties end up in the ID. It seems like this could easily end up where the API handler needs to know details about each `kind` and what its IDs/hierarchical paths look like, and then this gets multiplied by the different names that different runtime platforms could use for similar concepts.
What if we remove the `targetId` from the endpoint path altogether, and use selectors instead? For `GET`s this could be query parameters and for `POST` it would be form attributes. I'm not sure what to do about `PATCH`/`PUT` endpoints - maybe they need to be redefined as `POST`s.
```
GET /api/v2/targets?namespace=myproject&deployment=loginservice&container=login-svc-vertx

POST /api/v2/targets/recordings
Form Attributes:
  namespace: myproject
  deployment: loginservice
  container: login-svc-vertx
  recordingName: login-profiling
  template: Profiling
  duration: 0
```
These two batch requests apply to any `login-svc-vertx` container in any Pod, within the `loginservice` Deployment in `myproject`. The `GET` obviously supplies more information about these, so the response is a list of details about these containers. The `POST` starts recordings with identical settings on each of the matched containers.
I need to think about how an API handler for such requests would deal with knowing about the different query parameter or form attribute names that might be present for different runtime platforms, since it's not necessarily limited to just `namespace`/`deployment`/`pod`/`container`.
> What if we remove the `targetId` from the endpoint path altogether, and use selectors instead? For `GET`s this could be query parameters and for `POST` it would be form attributes. I'm not sure what to do about `PATCH`/`PUT` endpoints - maybe they need to be redefined as `POST`s.
That approach should work. We would have to figure out the semantics for all the different parameter combinations. What do we do if `namespace` isn't provided? What if only `container` is given - do we check every Pod?
> I need to think about how an API handler for such requests would deal with knowing about the different query parameter or form attribute names that might be present for different runtime platforms, since it's not necessarily limited to just `namespace`/`deployment`/`pod`/`container`.

Maybe some `PlatformClient` method that takes in the query parameters and returns a set of JMX service URLs?
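Something like this, perhaps - a hypothetical addition to a `PlatformClient`-style interface (the method name and signature here are assumptions for the sketch, not an existing API):

```java
import java.util.Map;
import java.util.Set;
import javax.management.remote.JMXServiceURL;

public interface PlatformClient {
    /**
     * Resolve platform-specific selector attributes (e.g. namespace,
     * deployment, pod, container on Kubernetes) to the JMX service URLs
     * of every matching target.
     */
    Set<JMXServiceURL> resolveSelectors(Map<String, String> selectors);
}
```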
> > What if we remove the `targetId` from the endpoint path altogether, and use selectors instead? For `GET`s this could be query parameters and for `POST` it would be form attributes. I'm not sure what to do about `PATCH`/`PUT` endpoints - maybe they need to be redefined as `POST`s.
>
> That approach should work. We would have to figure out the semantics for all the different parameter combinations. What do we do if `namespace` isn't provided? What if only `container` is given - do we check every Pod?
I'm picturing it as simply matching every container/target where all of the given selectors apply, so `GET /api/v2/targets?container=login-svc-vertx` returns a set of all such containers in any Pod, in any Deployment, in any namespace. Likewise a `POST` containing only `recordingName`/`template`/`duration` and a `container`, without a `namespace`/`deployment`/`pod`, would start recordings on any matching container anywhere.
My hope is that #492 will allow Cryostat to build an initial picture of the whole deployment scene by querying the platform API, and then use watches whenever applicable to update that scene incrementally as changes occur. This way the cost of doing batch operations is much lower because we can just traverse/filter the in-memory model, rather than having to call out to the platform API every time (likely multiple times per operation).
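To illustrate why that makes batch requests cheap, here's a sketch of selector matching against such a cached model (the node shape is invented for this example):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Invented shape: each discovered target carries a flat map of platform
// attributes, e.g. {namespace=myproject, deployment=loginservice,
// pod=login-abc123, container=login-svc-vertx}.
record DiscoveredTarget(Map<String, String> attributes) {

    // A target matches when every provided selector agrees with its
    // attributes; selectors that are omitted simply don't constrain it.
    boolean matches(Map<String, String> selectors) {
        return selectors.entrySet().stream()
                .allMatch(e -> e.getValue().equals(attributes.get(e.getKey())));
    }

    static List<DiscoveredTarget> filter(
            List<DiscoveredTarget> cached, Map<String, String> selectors) {
        return cached.stream()
                .filter(t -> t.matches(selectors))
                .collect(Collectors.toList());
    }
}
```

With only `container=login-svc-vertx` supplied, every target with that container name matches, regardless of namespace/deployment/pod - exactly the semantics described above.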
> > I need to think about how an API handler for such requests would deal with knowing about the different query parameter or form attribute names that might be present for different runtime platforms, since it's not necessarily limited to just `namespace`/`deployment`/`pod`/`container`.
>
> Maybe some `PlatformClient` method that takes in the query parameters and returns a set of JMX service URLs?
Yea, that's an option. Certain form attributes would be known to pertain to the recording settings by name, or maybe they would share some common prefix, and all of the other attributes would be assumed to be platform-specific selectors. So that submap could be taken out and passed to the PlatformClient to then do the traversal/mapping/filtering and determine what set of actual ServiceRefs this request pertains to.
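A sketch of that splitting step, assuming (purely for illustration) a fixed set of known recording attribute names rather than a shared prefix:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class BatchRecordingHandler {
    // Attributes the handler itself understands; names assumed for the sketch.
    private static final Set<String> RECORDING_ATTRS =
            Set.of("recordingName", "template", "duration");

    // Split the submitted form into recording settings and the remaining
    // platform-specific selectors to hand off to the PlatformClient.
    static Map<String, Map<String, String>> partition(Map<String, String> form) {
        Map<String, String> recordingOptions = new HashMap<>();
        Map<String, String> selectors = new HashMap<>();
        form.forEach((k, v) -> {
            if (RECORDING_ATTRS.contains(k)) {
                recordingOptions.put(k, v);
            } else {
                selectors.put(k, v);
            }
        });
        return Map.of("recording", recordingOptions, "selectors", selectors);
    }
}
```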
I’d like to help with the v2 batch operations once I’m finished adding integration tests. When you’re planning features, could you let me know what I could work on?
Sure, will do.
Just brainstorming and doing some light research/reading here, trying to think of alternatives to implementing a bunch of query parameters and special handling for each one.
Could we use something more standardized and powerful, like GraphQL? From what I've read that might be overkill and difficult to adapt to the existing tree discovery unless we take it apart into individual pieces that know how to query the different kinds of Kubernetes resource (an API endpoint to query k8s Endpoints, another for Pods, another for Deployments, etc. - multiplied by API endpoints for each deployment platform other than Kubernetes).
Or perhaps we can find a way to use JSONPath for the query expressions rather than rolling our own limited query parameter syntax? This sort of implies that we build the proper tree up in memory as usual and then serialize it to JSON before performing the path query. If we're just going to return that result back to the client then fine, but if we're supposed to take that result and perform some action (ex. start a recording on each matching target in the queried tree) then we would need to deserialize the JSON again or otherwise cross-reference the memory model against the JSON tree, so that's a terrible mess.

Otherwise, we could try to use the JSONPath expression syntax and apply it directly to our in-memory model - either after the fact, by creating the whole tree and then using the path expression to trim it, or up front, by parsing the path expression to create some kind of matcher functions and using those to exclude branches from the tree as we build it.
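For the serialize-then-query variant, a minimal sketch using the Jayway json-path library (assuming the discovery tree has already been rendered to JSON; the tree shape here is invented):

```java
import com.jayway.jsonpath.JsonPath;
import java.util.List;
import java.util.Map;

public class JsonPathQueryExample {
    public static void main(String[] args) {
        // Toy serialized discovery tree standing in for the real model.
        String tree = """
                {"deployments": [
                  {"name": "loginservice", "namespace": "myproject",
                   "pods": [{"name": "login-abc123",
                             "containers": [{"name": "login-svc-vertx"}]}]}
                ]}""";

        // Select every container named login-svc-vertx, anywhere in the tree.
        List<Map<String, Object>> matches =
                JsonPath.read(tree, "$..containers[?(@.name == 'login-svc-vertx')]");
        System.out.println(matches);
    }
}
```

This is fine for read-only queries, but as noted above, acting on the matches would still require mapping the JSON results back to the in-memory model.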
There is also OData: https://www.odata.org/ + https://olingo.apache.org/
And, what looks to be very promising and easily applicable to our use-case, QueryDSL: https://github.com/querydsl/querydsl/tree/master/querydsl-collections
Another pretty simple idea: reuse Nashorn and do something very similar to the Automated Rules `matchExpression`. I'm not entirely sure yet how we would want to define an expression for filtering in a way that traverses the discovery tree, however, and such an expression might end up being a little ridiculous to encode into a single query parameter, but it's another idea to experiment with and it avoids pulling in any new dependencies.
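A sketch of what that could look like, modeled loosely on how Automated Rules evaluate a `matchExpression` (the shape of the bound `target` object is invented here; note Nashorn ships with the JDK only through Java 14):

```java
import java.util.Map;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import javax.script.SimpleBindings;

public class MatchExpressionFilter {
    private final ScriptEngine engine =
            new ScriptEngineManager().getEngineByName("nashorn");

    // Evaluate a user-supplied expression against one candidate target,
    // e.g. "target.namespace == 'myproject' && target.container == 'login-svc-vertx'".
    public boolean matches(String expression, Map<String, Object> target)
            throws ScriptException {
        SimpleBindings bindings = new SimpleBindings();
        bindings.put("target", target);
        return Boolean.TRUE.equals(engine.eval(expression, bindings));
    }
}
```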
I'm leaning toward GraphQL right now, or else just rolling our own thing reusing `matchExpression` - but I don't like that as much.
Here's a Spring Boot-oriented GraphQL example that gives some good insight into how it could generally work for us:
https://www.graphql-java.com/tutorials/getting-started-with-spring-boot/
And there is a Vert.x plugin for it as well:
https://vertx.io/docs/3.9.0/vertx-web-graphql/java/
Here's another walkthrough, this time using Vert.x:
https://tsegismont.github.io/graphql-api-gateway-workshop/
This might be needed in order to support `Map`-type fields in response objects (for things like annotations and labels):
https://github.com/graphql-java/graphql-java-extended-scalars
And the general graphql-java Getting Started:
https://www.graphql-java.com/documentation/getting-started
We would need to write a few of these `DataFetcher`s, but they're each pretty small and I think shouldn't be too bad to write. For performance reasons we might want to keep an in-memory model of the whole Kubernetes deployment cached for some time and periodically refresh it, and the `DataFetcher` implementations could do their searching and filtering against that, but that's something to evaluate when we actually have a prototype working. The `Book`/`Author` example given in the Spring tutorial looks like it should map pretty well to Endpoints/Pod/Deployment for us, I think.
The first prototype can just have a `DataFetcher` that only knows how to query custom targets, since those are easy to control for testing and it's a flat traversal. Once that's working we can expand the prototype to also query the JDP discovery side within Podman. Once it's working for both of those together, it shouldn't be too big a gap to bridge to also implement querying on the k8s API side.
One question I still have about GraphQL is how we would want to turn this into performing some kind of action (starting a recording, taking a snapshot on all the targets and archiving it, etc.). The standard GraphQL definition is definitely geared toward, well, Querying, to get information about API resources. Perhaps we could create the one GraphQL querying endpoint, and another endpoint that expects a list of `targetId`s and some kind of description of the action to be performed, and then performs that action against all the targets. Then all batch operations would require exactly two API calls to complete - one to query for existing targets at the present moment that fit some search criteria, and one to execute some action against all such targets.
Or, can we use the GraphQL `Mutations` concept for this? It seems like it may fit.
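If Mutations do fit, the schema side might look something like the SDL below (all field names invented for the sketch); the mutation's `DataFetcher` would resolve the selector arguments to targets and start an identically-configured recording on each:

```java
public class BatchSchemaSketch {
    static final String SDL = """
            type Mutation {
              startRecording(
                namespace: String,
                deployment: String,
                container: String,
                recordingName: String!,
                template: String!,
                duration: Int
              ): [RecordingDescriptor!]!
            }

            type RecordingDescriptor {
              targetId: String!
              recordingName: String!
              state: String!
            }
            """;
}
```

This would collapse the query-then-act flow into a single API call, at the cost of duplicating the selector arguments between the query and mutation definitions.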
Just opened #783 for prototyping with GraphQL. It was pretty easy to set up and get started with so far. I'll replace the static book/author data with querying for custom targets next and see how that goes.
@ebaron before I get too deep into this, could you take a look at that draft PR and maybe some of the accompanying links I posted a couple of comments back and let me know your thoughts on using GraphQL for this feature?
Closed by #783
Let's discuss how to modify the HTTP API to support some new use-cases:

- the Operator syncing its discovered targets with Cryostat's, which requires mapping a target back to the Deployment/Pod/Container it belongs to
- performing operations against all targets belonging to a Deployment/ReplicaSet
In both cases, we've determined that the target ID being a JMX service URL is insufficient. For the operator syncing case, it's not possible to efficiently map a JMX service URL to a Pod. For the operations on Deployments/ReplicaSets, there would be multiple JMX service URLs per object. We need some other way to refer to a Deployment/ReplicaSet in the HTTP API.