Closed deinman closed 1 month ago
We have exactly the same issue, but did manage to find a temporary workaround. I hope this helps you too:
var firstHitForBucket = topHitsAggregate.Hits.Hits.First();
// There is an issue where Hit<T> is not typed correctly and returned as a Hit<object?> of type Hit<JsonElement>.
// For now, we parse the json ourselves.
// See: https://github.com/elastic/elasticsearch-net/issues/8359
if (firstHitForBucket.Source is not JsonElement json) continue;
// Make sure to deserialize the JsonElement with the same options as the regular elastic document deserializer.
var document = json.Deserialize<TDocument>(serializer.SerializerOptions);
We get the serializer from ElasticsearchClient.ElasticsearchClientSettings.SourceSerializer
. By default the options are not exposed.
If you don't use a custom serializer you can get them from DefaultSourceSerializer.CreateDefaultJsonSerializerOptions()
.
In our case we use a custom serializer in which we expose the settings ourselves:
public class CustomJsonSerializer : SystemTextJsonSerializer
{
private readonly JsonSerializerOptions _options;
public CustomJsonSerializer(IElasticsearchClientSettings settings) : base(settings)
{
// This is necessary for backwards compat
_options = DefaultSourceSerializer.CreateDefaultJsonSerializerOptions();
var jsonStringEnumConverter = _options.Converters.OfType<JsonStringEnumConverter>().First();
_options.Converters.Remove(jsonStringEnumConverter);
}
protected override JsonSerializerOptions CreateJsonSerializerOptions() => _options;
public JsonSerializerOptions SerializerOptions => _options;
}
// We use our own custom serializer, which we can abuse to expose and reuse internal serializer settings
var serializer = _elasticClient.ElasticsearchClientSettings.SourceSerializer as CustomJsonSerializer;
Hi @deinman, this behavior is not specific to sub-aggregations, but always the case for TopHitsAggregation
.
In our specification, this type is currently modelled like this:
/** @variant name=top_hits */
export class TopHitsAggregate extends AggregateBase {
hits: HitsMetadata<UserDefinedValue>
}
UserDefinedValue
is equivalent to untyped/object
.
In order to deserialize to the correct type, we would have to push down the generic type argument TDocument
to the TopHitsAggregation
type. I will check the implications with the rest of the team and come back to you when we came to a conclusion.
Thanks for providing a workaround @christiaanderidder !
It was just mentioned to me that top hits can as well be used to get top nested documents, in which case TDocument
is not correct:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html#_top_hits_support_in_a_nested_or_reverse_nested_aggregator
As this behavior is dependant on a runtime-value (mappings), we have to return object
s.
Ah, that's a bummer - but understandable given the different uses of TopHitsAggregation. Thanks for looking into it @flobernd and thanks for the workaround @christiaanderidder, I'd come up with basically the same thing - but it was great to know that someone else thought it up and implemented it as well!
I'll see if I can provide an easier way of accessing the SourceSerializer 🙂
@deinman @christiaanderidder I just created a PR in the transport library that adds a few extension methods which can be used to further (de-)serialize arbitrary data (e.g. JsonElement
) in an efficient way.
Ho @deinman @christiaanderidder, I have now created this follow up PR to include these changes in the actual client.
This will potentially be a breaking change for you as I had to change the inheritance hierarchy for the DefaultSourceSerializer
as well as the methods of the base class itself.
The custom serializer implementation is no longer required. Instead, you can just use:
if (firstHitForBucket.Source is not JsonElement json) continue;
// Make sure to deserialize the JsonElement with the same options as the regular elastic document deserializer.
var document = _elasticClient.ElasticsearchClientSettings.SourceSerializer.Deserialize<TDocument>(json);
Make sure to add:
using Elastic.Transport.Extensions;
Elastic.Clients.Elasticsearch version: 8.15.6
Elasticsearch version: 8.13.0
.NET runtime version: 8.0.8
Operating system version: Win 11
Description of the problem including expected versus actual behavior: Passing a sub aggregate causes the resultant aggregate hits to not deserialize.
Steps to reproduce:
Expected behavior I'd expect the resultant Hits inside the TopHitsAggregate to be typed as Model. If there's somewhere else I need to define the TDocument other than the SearchRequest - I can't find it.
I've confirmed the data in the nested Hits has the expected information, it's just coming back as an object.
As a work-around, I'm able to just deserialize it in some code after it comes back, which is workable, but it seems like there should be a way to get that TDocument to apply so users don't have to manually deserialize the result.
Provide
ConnectionSettings
(if relevant): N/AProvide
DebugInformation
(if relevant): N/A