elastic / elasticsearch-net

This strongly-typed client library enables working with Elasticsearch. It is the official client maintained and supported by Elastic.
https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/index.html
Apache License 2.0

Elasticsearch incorrectly serializes percolated queries #8003

Open KondzioSSJ4 opened 10 months ago

KondzioSSJ4 commented 10 months ago

Elastic.Clients.Elasticsearch version: 8.11 (latest)

Elasticsearch version: 8.11.3 (latest)

.NET runtime version: 6

Operating system version: Windows 11

Description of the problem including expected versus actual behavior: Percolated queries in Elasticsearch are not serialized to the correct value.

For example, a simple query like:

Query.Terms(new TermsQuery()
{
    Field = Item.GetExpression(x => x.Name1),
    Terms = new TermsQueryField(new[] { FieldValue.String("value") })
})

when converted to percolated query in index it will provide:

"query":{"terms":{"name1":{}}}

This is incorrect, because it should be converted to:

"query":{"terms":{"name1":["value"]}}
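For context, a percolator index stores queries as documents in a field mapped with the percolator field type; Elasticsearch parses the stored JSON as a real query at index time, which is why the empty object produced by the client fails with a parsing error. A minimal sketch of such a mapping (the index and field names here are assumptions mirroring the reproduction):

```json
PUT /percolated-index
{
  "mappings": {
    "properties": {
      "query":    { "type": "percolator" },
      "dingerId": { "type": "keyword" }
    }
  }
}
```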

Steps to reproduce: Go to the repository: https://bitbucket.org/KondzioSSJ4/es_percolatedserialization/src/meh/

Then just run any of the projects; each of them returns an error when trying to index data.

Expected behavior: Percolated queries would be correctly indexed in Elasticsearch.

Provide ConnectionSettings (if relevant): localhost... any Docker instance would be helpful

Provide DebugInformation (if relevant):

Invalid Elasticsearch response built from a unsuccessful (400) low level call on PUT: /percolated-index/_doc/3df093da-1f93-487f-b2bc-5616a33f2a56
 Exception: Request failed to execute. Call: Status code 400 from: PUT /percolated-index/_doc/3df093da-1f93-487f-b2bc-5616a33f2a56. ServerError: Type: document_parsing_exception Reason: "[1:79] failed to parse: Required [index, id, path]" CausedBy: "Type: illegal_argument_exception Reason: "Required [index, id, path]""

# Audit trail of this API call:
 - [1] BadResponse: Node: http://localhost:9200/ Took: 00:00:00.0266196
# OriginalException: Elastic.Transport.TransportException: Request failed to execute. Call: Status code 400 from: PUT /percolated-index/_doc/3df093da-1f93-487f-b2bc-5616a33f2a56. ServerError: Type: document_parsing_exception Reason: "[1:79] failed to parse: Required [index, id, path]" CausedBy: "Type: illegal_argument_exception Reason: "Required [index, id, path]""
# Request:
{"dingerId":"3df093da-1f93-487f-b2bc-5616a33f2a56","query":{"terms":{"name1":{}}}}
# Response:
{"error":{"root_cause":[{"type":"document_parsing_exception","reason":"[1:79] failed to parse: Required [index, id, path]"}],"type":"document_parsing_exception","reason":"[1:79] failed to parse: Required [index, id, path]","caused_by":{"type":"illegal_argument_exception","reason":"Required [index, id, path]"}},"status":400}
KondzioSSJ4 commented 10 months ago

My workaround was to add a custom serializer with a converter that handles the Union conversion correctly.

alientourist commented 8 months ago

Hi @KondzioSSJ4 ! I am having exactly the same problem but have trouble understanding how to add a custom serializer that supports union conversion correctly. Any chance you could share code, point to some documentation or just explain your solution in a bit more detail? Would be very grateful! Thanks in advance!

KondzioSSJ4 commented 8 months ago

@alientourist

Yes, sure

Adding the converter to the client settings:

var pool = new SingleNodePool(new Uri(_esConfig.Uri));
var settings = new ElasticsearchClientSettings(pool, CreateSerializer)
    .Authentication(....)
    ...;

var client = new ElasticsearchClient(settings);

where the most important part is this method:

private static Serializer CreateSerializer(Serializer originalSerializer, IElasticsearchClientSettings settings)
{
    return new DefaultSourceSerializer(settings, x =>
    {
        x.Converters.Add(new TermsQueryFieldConverter());
    });
}

And the converter looks like this:

internal sealed class TermsQueryFieldConverter : JsonConverter<TermsQueryField>
{
    public override TermsQueryField? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        if (reader.TokenType == JsonTokenType.StartArray)
        {
            var fields = JsonSerializer.Deserialize<FieldValue[]>(ref reader, options);
            return new TermsQueryField(fields ?? Array.Empty<FieldValue>());
        }
        else
        {
            var lookup = JsonSerializer.Deserialize<TermsLookup>(ref reader, options);
            return new TermsQueryField(lookup ?? new TermsLookup());
        }
    }

    public override void Write(Utf8JsonWriter writer, TermsQueryField value, JsonSerializerOptions options)
    {
        if (value is null)
        {
            writer.WriteNullValue();
            return;
        }

        value.Match(
            c =>
            {
                JsonSerializer.Serialize(writer, c, options);
            },
            tl =>
            {
                JsonSerializer.Serialize(writer, tl, options);
            });
    }
}

A good fix would be to add such a converter to the repository itself, so the issue is solved for other users of Elasticsearch. The problem is deciding how to add it properly: since this affects every Union type, a similar converter and attributes would probably be needed for every single class that implements Union, and I don't have time to find a way to make that work :D

alientourist commented 8 months ago

Thanks @KondzioSSJ4 ! Now I understand a bit more! Have started adding more converters for other types that we need...

IhnatKlimchuk commented 8 months ago

So I decided to spend some time digging into this in order to solve it. I found two serializers used in the client: DefaultRequestResponseSerializer for serializing complex internal types, and DefaultSourceSerializer for serializing the customer payload. The first has many custom converters that allow it to serialize and deserialize union types and handle TermsQuery. Ironically, there is even a unit test for that: test and expected result

By default, requests are serialized with DefaultRequestResponseSerializer, but the CustomJsonWriterConverter&lt;TDocument&gt; converter detects IndexRequest, IndexRequestDescriptor, or CreateRequest by looking for the ICustomJsonWriter interface, and serializes the Document with DefaultSourceSerializer instead of DefaultRequestResponseSerializer, which drops support for the internal converters.

I guess nobody initially anticipated that the percolate query feature would put internal types into the index/create request. Unfortunately, DefaultRequestResponseSerializer is marked internal, as are the converters it uses, making it impossible to reuse. IndexRequest, IndexRequestDescriptor, and CreateRequest are sealed, so you can't override the ICustomJsonWriter implementation. The only option for now appears to be adding custom converters to DefaultSourceSerializer, as @KondzioSSJ4 mentioned.

Possible resolutions: Quick: allow a string/stream payload in IndexRequest, IndexRequestDescriptor, or CreateRequest. There would be no need to struggle with DoRequestAsync; it would simply allow manual serialization. Long term: create the ability to switch back to DefaultRequestResponseSerializer when the field type is Query, either through yet another custom converter or by extending the CustomJsonWriterConverter logic, and avoid switching to DefaultSourceSerializer in the first place.

This requires a decision from the owners, so I can't provide a PR for now, as it would most likely be throwaway work. Waiting for the maintainers' discussion and decision...

IhnatKlimchuk commented 7 months ago

Note: BoolQuery has MinimumShouldMatch as a separate class that extends Union&lt;int?, string&gt;, which means UnionConverter is also missing from DefaultSourceSerializer.
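Following the same pattern as the TermsQueryField converter above, a converter for MinimumShouldMatch could be registered in the DefaultSourceSerializer as well. This is only a sketch: it assumes MinimumShouldMatch exposes the same Match(...) accessor as the other Union types and that int/string constructor overloads exist; verify both against the Elastic.Clients.Elasticsearch version you are on.

```csharp
// Sketch only: assumes MinimumShouldMatch behaves like the other Union types
// (Match(...) accessor, int/string constructor overloads).
internal sealed class MinimumShouldMatchConverter : JsonConverter<MinimumShouldMatch>
{
    public override MinimumShouldMatch? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        return reader.TokenType switch
        {
            // minimum_should_match may be a number (2) or a string ("75%")
            JsonTokenType.Number => new MinimumShouldMatch(reader.GetInt32()),
            JsonTokenType.String => new MinimumShouldMatch(reader.GetString()!),
            _ => throw new JsonException("Expected a number or string for minimum_should_match.")
        };
    }

    public override void Write(Utf8JsonWriter writer, MinimumShouldMatch value, JsonSerializerOptions options)
    {
        value.Match(
            i => writer.WriteNumberValue(i!.Value),
            s => writer.WriteStringValue(s));
    }
}
```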

IhnatKlimchuk commented 7 months ago

In case there are too many issues with the serializer, and your percolate query item is simple (only a query and an id), you can keep a separate instance of ElasticsearchClient with the following hack in the settings setup:

var settings = new ElasticsearchClientSettings(...your config...);

var desiredSerializer = typeof(TransportConfigurationBase<ElasticsearchClientSettings>)
    .GetProperty("UseThisRequestResponseSerializer", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance)
    .GetValue(settings);
typeof(ElasticsearchClientSettingsBase<ElasticsearchClientSettings>)
    .GetField("_sourceSerializer", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance)
    .SetValue(settings, desiredSerializer);

This allows the same serializer to be used for the payload as for queries in search, but you have to double-check that the payload is serialized correctly.

flobernd commented 7 months ago

Hi there! After the big 8.13 release I finally have some time to look into the open issues 🙂

Another possible solution would be to expose a [RequestResponseConverter] attribute similar to the [SourceConverter] one to allow switching from source- to request/response-serialization. However, this won't work if the DefaultSourceSerializer is completely replaced by the user.

rustunooldu commented 1 month ago

Today I ran into the exact same issue. Any plans to get this fixed @flobernd ?

I understand it's not high priority, but I figured I'd bump it since there have been other recent discussions related to Union type conversions.

Edit: Just to be clear, this is not limited to Union types. Percolation in general is broken if the document contains anything more than a simple query. Union, MinimumShouldMatch, and string enums such as TextQueryType are just a few examples that don't serialize properly.

I think your idea of exposing a [RequestResponseConverter] attribute should work well enough. My current solution, inspired by the previous comments, was to wrap the percolator query in a container and write a custom converter for it.

DefaultRequestResponseSerializer is an internal class, but I noticed that it is actually exposed as a property of IElasticsearchClientSettings, so we don't need reflection or anything like that. We can simply use the existing serializer instance and pass it as a constructor argument to our custom converter. Here is the full code:

public class PercolateQueryContainer
{
    public required Query Query { get; set; }
}

public sealed class PercolateQueryContainerConverter(Serializer requestResponseSerializer) : JsonConverter<PercolateQueryContainer>
{
    public override PercolateQueryContainer? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        if (reader.TokenType is JsonTokenType.None or JsonTokenType.Null)
        {
            return null;
        }

        var jsonNode = JsonNode.Parse(ref reader);
        if (jsonNode == null)
        {
            return null;
        }

        var jsonString = jsonNode.ToJsonString(options);
        using var jsonStream = new MemoryStream(Encoding.UTF8.GetBytes(jsonString));
        return requestResponseSerializer.Deserialize<PercolateQueryContainer>(jsonStream);
    }

    public override void Write(Utf8JsonWriter writer, PercolateQueryContainer value, JsonSerializerOptions options)
    {
        var jsonString = requestResponseSerializer.SerializeToString(value, SerializationFormatting.None);
        writer.WriteRawValue(jsonString, true);
    }
}

Usage:

var pool = new SingleNodePool(new Uri($"{clientConfig.Host}:{clientConfig.Port}"));
var settings = new ElasticsearchClientSettings(pool, GetSourceSerializer)
    ...;
var client = new ElasticsearchClient(settings);

private static Serializer GetSourceSerializer(Serializer builtIn, IElasticsearchClientSettings settings)
{
    return new DefaultSourceSerializer(
        settings,
        options =>
        {
            options.Converters.Add(new PercolateQueryContainerConverter(settings.RequestResponseSerializer));
        });
}

So far this seems to work without any issues, but it's obviously a lot of extra code and unnecessary complexity for something that should simply work out of the box.