elastic / elasticsearch-net

This strongly-typed, client library enables working with Elasticsearch. It is the official client maintained and supported by Elastic.
https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/index.html
Apache License 2.0
14 stars 1.15k forks source link

Sub-Aggregate Not Deserializing #8359

Closed deinman closed 1 month ago

deinman commented 2 months ago

Elastic.Clients.Elasticsearch version: 8.15.6

Elasticsearch version: 8.13.0

.NET runtime version: 8.0.8

Operating system version: Win 11

Description of the problem including expected versus actual behavior: Passing a sub aggregate causes the resultant aggregate hits to not deserialize.

Steps to reproduce:

var terms = Aggregation.Terms(new TermsAggregation
{
    Field = Infer.Field<Model>(x => x.GroupId),
});

terms.Aggregations = new Dictionary<string, Aggregation>
{
    {
        "earliest_event", Aggregation.TopHits(new TopHitsAggregation
        {
            Sort = new List<SortOptions>
            {
                SortOptions.Field(Infer.Field<Model>(x => x.StartTime),
                    new FieldSort { Order = SortOrder.Asc })
            },
            Size = 1
        })
    }
};

var aggregations = new Dictionary<string, Aggregation> { { "events", terms } };

var request = new SearchRequest<Model>
{
    Aggregations = aggregations,
    Query = Query.MatchAll(new MatchAllQuery())
};

var result = await client.SearchAsync<Model>(request);
public class Model
{
    public Guid Id { get; set; }
    public DateTimeOffset StartTime { get; set; }
    public Guid? RecurrenceId { get; set; }
    public Guid GroupId => RecurrenceId ?? Id;
}

Image

Expected behavior I'd expect the resultant Hits inside the TopHitsAggregate to be typed as Model. If there's somewhere else I need to define the TDocument other than the SearchRequest - I can't find it.

I've confirmed the data in the nested Hits has the expected information, it's just coming back as an object.

As a work-around, I'm able to just deserialize it in some code after it comes back, which is workable, but it seems like there should be a way to get that TDocument to apply so users don't have to manually deserialize the result.

Provide ConnectionSettings (if relevant): N/A

Provide DebugInformation (if relevant): N/A

christiaanderidder commented 2 months ago

We have exactly the same issue, but did manage to find a temporary workaround. I hope this helps you too:

var firstHitForBucket = topHitsAggregate.Hits.Hits.First();

// There is an issue where Hit<T> is not typed correctly and returned as a Hit<object?> of type Hit<JsonElement>.
// For now, we parse the json ourselves.
// See: https://github.com/elastic/elasticsearch-net/issues/8359
if (firstHitForBucket.Source is not JsonElement json) continue;

// Make sure to deserialize the JsonElement with the same options as the regular elastic document deserializer.
var document = json.Deserialize<TDocument>(serializer.SerializerOptions);

We get the serializer from ElasticsearchClient.ElasticsearchClientSettings.SourceSerializer. By default the options are not exposed. If you don't use a custom serializer you can get them from DefaultSourceSerializer.CreateDefaultJsonSerializerOptions(). In our case we use a custom serializer in which we expose the settings ourselves:

public class CustomJsonSerializer : SystemTextJsonSerializer 
{
    private readonly JsonSerializerOptions _options;

    public CustomJsonSerializer(IElasticsearchClientSettings settings) : base(settings)
    {
        // This is necessary for backwards compat
        _options = DefaultSourceSerializer.CreateDefaultJsonSerializerOptions();
        var jsonStringEnumConverter = _options.Converters.OfType<JsonStringEnumConverter>().First();
        _options.Converters.Remove(jsonStringEnumConverter);
    }

    protected override JsonSerializerOptions CreateJsonSerializerOptions() => _options;

    public JsonSerializerOptions SerializerOptions => _options;
}
// We use our own custom serializer, which we can abuse to expose and reuse internal serializer settings
var serializer = _elasticClient.ElasticsearchClientSettings.SourceSerializer as CustomJsonSerializer;
flobernd commented 1 month ago

Hi @deinman, this behavior is not specific to sub-aggregations, but always the case for TopHitsAggregation.

In our specification, this type is currently modelled like this:

/** @variant name=top_hits */
export class TopHitsAggregate extends AggregateBase {
  hits: HitsMetadata<UserDefinedValue>
}

UserDefinedValue is equivalent to untyped/object.

In order to deserialize to the correct type, we would have to push down the generic type argument TDocument to the TopHitsAggregation type. I will check the implications with the rest of the team and come back to you when we came to a conclusion.

Thanks for providing a workaround @christiaanderidder !

flobernd commented 1 month ago

It was just mentioned to me that top hits can as well be used to get top nested documents, in which case TDocument is not correct: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html#_top_hits_support_in_a_nested_or_reverse_nested_aggregator

As this behavior is dependant on a runtime-value (mappings), we have to return objects.

deinman commented 1 month ago

Ah, that's a bummer - but understandable given the different uses of TopHitsAggregation. Thanks for looking into it @flobernd and thanks for the workaround @christiaanderidder, I'd come up with basically the same thing - but it was great to know that someone else thought it up and implemented it as well!

flobernd commented 1 month ago

I'll see if I can provide an easier way of accessing the SourceSerializer 🙂

flobernd commented 1 month ago

@deinman @christiaanderidder I just created a PR in the transport library that adds a few extension methods which can be used to further (de-)serialize arbitrary data (e.g. JsonElement) in an efficient way.

flobernd commented 1 month ago

Ho @deinman @christiaanderidder, I have now created this follow up PR to include these changes in the actual client.

This will potentially be a breaking change for you as I had to change the inheritance hierarchy for the DefaultSourceSerializer as well as the methods of the base class itself.

The custom serializer implementation is no longer required. Instead, you can just use:

if (firstHitForBucket.Source is not JsonElement json) continue;

// Make sure to deserialize the JsonElement with the same options as the regular elastic document deserializer.
var document = _elasticClient.ElasticsearchClientSettings.SourceSerializer.Deserialize<TDocument>(json);

Make sure to add:

using Elastic.Transport.Extensions;