apollographql / router

A configurable, high-performance routing runtime for Apollo Federation 🚀
https://www.apollographql.com/docs/router/
Other
813 stars 271 forks source link

[JSONSelection] Allow `NamedSelection ::= Path SubSelection | ...` without an `Alias` for flattening multiple deeply nested properties #6076

Closed benjamn closed 1 month ago

benjamn commented 1 month ago

It was pointed out to me recently (by @dylan-apollo and @lennyburdette) that JSONSelection strings sometimes become unnecessarily verbose when you want to include one or more fields from a higher level of an input object, along with multiple fields extracted from some deeply nested path within that object.

For example, the response shape returned by the OpenAI /chat/completions endpoint buries the actual response text from the LLM within an array of choices, containing objects with a message property that provides the role and content of the response, where the content is the desired response from the LLM (irrelevant properties omitted for clarity):

{
  "id": "chatcmpl-123",
  "created": 1677652288,
  "model": "gpt-4o-mini",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?",
    },
  }]
}

With the current JSONSelection syntax, if you wanted to select the id, created, and model fields while also flattening the role and content fields, you'd have to use a selection like

id
created
model
role: choices->first.message.role
content: choices->first.message.content

In this example, the repetition of the choices->first.message. syntax feels somehow unnecessary, and it gets even worse if those properties are more deeply nested, and/or there are more than two of them.

To build an intuition for the solution I'm about to propose, remember that you don't need this repetition if you're only selecting the deeply nested properties (not id or created or model), because you can use a single top-level path selection that selects the role and content properties (as an object) from the choices->first.message path:

choices->first.message { role content }

Wouldn't it be nice if you could keep using that syntax, and just add other fields to it as necessary?

id
created
model
choices->first.message { role content }

In terms of the JSONSelection grammar, this new syntax is available and unambiguous because the grammar currently requires adding an alias to the path selection if you want to use it together with other named selections:

id
created
model
roleAndContent: choices->first.message { role content }

This is not quite what we want in this case, because roleAndContent becomes the only sibling key to id, created and model, and there's no syntax for merging role and content as siblings at that level. However, the absence of a syntax for that merging behavior gives us the freedom to choose the syntax we want to use.

Of course, the desired syntax only works when choices->first.message evaluates to an object, but we can enforce that expectation by requiring the trailing { role content } subselection, which produces a runtime ApplyToError if choices->first.message is not an object or does not have those properties (which is the best we can do, since the actual runtime value of choices->first.message depends on dynamic input data).

In other words, the following syntax should continue to be forbidden, even if we happen to know choices->first.message is an object with only the role and content fields:

id
created
model
 # STILL ILLEGAL if there are any other named properties!
choices->first.message

This new syntax respects guiding principle #3 (safe subsetting), because a selection consisting of only

choices->first.message { role content }

is functionally equivalent to the selection

role: choices->first.message.role
content: choices->first.message.content

and continues to contribute the role and content properties in the same way even when you add/remove other named sibling properties to/from the selection (such as id, created, and model), which can happen any time your GraphQL query happens not to select those properties.

We considered requiring some sort of syntax to indicate the merging behavior, such as ... choices->first.message { role content }, but having to add or remove the ... syntax according to the presence or absence of other named selections seemed burdensome and inconsistent. Instead, I think this PR gives a natural meaning to path selections that do not have an alias but do have a trailing subselection, which were previously allowed only at the top-level, by themselves, and forbidden within any nested SubSelection.

The first commit in the PR refactors the singular NamedSelection::name method to a plural NamedSelection::names method that can now return multiple names. I would appreciate @nicholascioli and @pubmodmatt's eyes on that commit especially, since it affects code outside the json_selection directory that they're familiar with. Those changes look more drastic than they are because I needed to reindent some existing code.

router-perf[bot] commented 1 month ago

CI performance tests