It was pointed out to me recently (by @dylan-apollo and @lennyburdette) that JSONSelection strings sometimes become unnecessarily verbose when you want to include one or more fields from a higher level of an input object, along with multiple fields extracted from some deeply nested path within that object.
For example, the response shape returned by the OpenAI /chat/completions endpoint buries the actual response text from the LLM within an array of choices, containing objects with a message property that provides the role and content of the response, where the content is the desired response from the LLM (irrelevant properties omitted for clarity):
{
"id": "chatcmpl-123",
"created": 1677652288,
"model": "gpt-4o-mini",
"choices": [{
"message": {
"role": "assistant",
"content": "\n\nHello there, how may I assist you today?",
},
}]
}
With the current JSONSelection syntax, if you wanted to select the id, created, and model fields while also flattening the role and content fields, you'd have to use a selection like
id
created
model
role: choices->first.message.role
content: choices->first.message.content
In this example, the repetition of the choices->first.message. syntax feels somehow unnecessary, and it gets even worse if those properties are more deeply nested, and/or there are more than two of them.
To build an intuition for the solution I'm about to propose, remember that you don't need this repetition if you're only selecting the deeply nested properties (not id or created or model), because you can use a single top-level path selection that selects the role and content properties (as an object) from the choices->first.message path:
choices->first.message { role content }
Wouldn't it be nice if you could keep using that syntax, and just add other fields to it as necessary?
id
created
model
choices->first.message { role content }
In terms of the JSONSelection grammar, this new syntax is available and unambiguous because the grammar currently requires adding an alias to the path selection if you want to use it together with other named selections:
id
created
model
roleAndContent: choices->first.message { role content }
This is not quite what we want in this case, because roleAndContent becomes the only sibling key to id, created and model, and there's no syntax for merging role and content as siblings at that level. However, the absence of a syntax for that merging behavior gives us the freedom to choose the syntax we want to use.
Of course, the desired syntax only works when choices->first.message evaluates to an object, but we can enforce that expectation by requiring the trailing { role content } subselection, which produces a runtime ApplyToError if choices->first.message is not an object or does not have those properties (which is the best we can do, since the actual runtime value of choices->first.message depends on dynamic input data).
In other words, the following syntax should continue to be forbidden, even if we happen to know choices->first.message is an object with only the role and content fields:
id
created
model
# STILL ILLEGAL if there are any other named properties!
choices->first.message
and continues to contribute the role and content properties in the same way even when you add/remove other named sibling properties to/from the selection (such as id, created, and model), which can happen any time your GraphQL query happens not to select those properties.
We considered requiring some sort of syntax to indicate the merging behavior, such as ... choices->first.message { role content }, but having to add or remove the ... syntax according to the presence or absence of other named selections seemed burdensome and inconsistent. Instead, I think this PR gives a natural meaning to path selections that do not have an alias but do have a trailing subselection, which were previously allowed only at the top-level, by themselves, and forbidden within any nested SubSelection.
The first commit in the PR refactors the singular NamedSelection::name method to a plural NamedSelection::names method that can now return multiple names. I would appreciate @nicholascioli and @pubmodmatt's eyes on that commit especially, since it affects code outside the json_selection directory that they're familiar with. Those changes look more drastic than they are because I needed to reindent some existing code.
[ ] events - Stress test for events with a lot of users and deduplication ENABLED
[ ] events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
[ ] events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
[ ] events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
[ ] events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
[ ] events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
It was pointed out to me recently (by @dylan-apollo and @lennyburdette) that JSONSelection strings sometimes become unnecessarily verbose when you want to include one or more fields from a higher level of an input object, along with multiple fields extracted from some deeply nested path within that object.
For example, the response shape returned by the OpenAI
/chat/completions
endpoint buries the actual response text from the LLM within an array ofchoices
, containing objects with amessage
property that provides therole
andcontent
of the response, where thecontent
is the desired response from the LLM (irrelevant properties omitted for clarity):With the current JSONSelection syntax, if you wanted to select the
id
,created
, andmodel
fields while also flattening therole
andcontent
fields, you'd have to use a selection likeIn this example, the repetition of the
choices->first.message.
syntax feels somehow unnecessary, and it gets even worse if those properties are more deeply nested, and/or there are more than two of them.To build an intuition for the solution I'm about to propose, remember that you don't need this repetition if you're only selecting the deeply nested properties (not
id
orcreated
ormodel
), because you can use a single top-level path selection that selects therole
andcontent
properties (as an object) from thechoices->first.message
path:Wouldn't it be nice if you could keep using that syntax, and just add other fields to it as necessary?
In terms of the JSONSelection grammar, this new syntax is available and unambiguous because the grammar currently requires adding an alias to the path selection if you want to use it together with other named selections:
This is not quite what we want in this case, because
roleAndContent
becomes the only sibling key toid
,created
andmodel
, and there's no syntax for mergingrole
andcontent
as siblings at that level. However, the absence of a syntax for that merging behavior gives us the freedom to choose the syntax we want to use.Of course, the desired syntax only works when
choices->first.message
evaluates to an object, but we can enforce that expectation by requiring the trailing{ role content }
subselection, which produces a runtimeApplyToError
ifchoices->first.message
is not an object or does not have those properties (which is the best we can do, since the actual runtime value ofchoices->first.message
depends on dynamic input data).In other words, the following syntax should continue to be forbidden, even if we happen to know
choices->first.message
is an object with only therole
andcontent
fields:This new syntax respects guiding principle #3 (safe subsetting), because a selection consisting of only
is functionally equivalent to the selection
and continues to contribute the
role
andcontent
properties in the same way even when you add/remove other named sibling properties to/from the selection (such asid
,created
, andmodel
), which can happen any time your GraphQL query happens not to select those properties.We considered requiring some sort of syntax to indicate the merging behavior, such as
... choices->first.message { role content }
, but having to add or remove the...
syntax according to the presence or absence of other named selections seemed burdensome and inconsistent. Instead, I think this PR gives a natural meaning to path selections that do not have an alias but do have a trailing subselection, which were previously allowed only at the top-level, by themselves, and forbidden within any nestedSubSelection
.The first commit in the PR refactors the singular
NamedSelection::name
method to a pluralNamedSelection::names
method that can now return multiple names. I would appreciate @nicholascioli and @pubmodmatt's eyes on that commit especially, since it affects code outside thejson_selection
directory that they're familiar with. Those changes look more drastic than they are because I needed to reindent some existing code.