elastic / elasticsearch-net

This strongly-typed, client library enables working with Elasticsearch. It is the official client maintained and supported by Elastic.
https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/index.html
Apache License 2.0
13 stars 1.15k forks source link

Wrong serialization of TermsQuery, MinimumShouldMatch and TrackTotalHits #8387

Closed andersosthus closed 1 month ago

andersosthus commented 1 month ago

Elastic.Clients.Elasticsearch version: 8.15.9

Elasticsearch version: N/A

.NET runtime version: 8.0

Operating system version: Win 11

Description of the problem including expected versus actual behavior: Term in TermsQuery does not serialize properly. MinimumShouldMatch in BoolQuery does not serialize properly. TrackTotalHits in SearchRequest does not serialize properly.

Steps to reproduce:

using System.Text;
using Elastic.Clients.Elasticsearch;
using Elastic.Clients.Elasticsearch.Core.Search;
using Elastic.Clients.Elasticsearch.QueryDsl;
using Elastic.Transport;

var client = new ElasticsearchClient();

var request = new SearchRequest("indexName")
{
    Query = new BoolQuery
    {
        Should =
        [
            new TermsQuery
            {
                Field = "abcd",
                Term = new TermsQueryField(new FieldValue[]{"a", "b"}),
            },
        ],
        MinimumShouldMatch = MinimumShouldMatch.Fixed(1)
    },
    TrackTotalHits = new TrackHits(false)
};

using var ms = new MemoryStream();
client.Transport.Configuration.SourceSerializer.Serialize(request, ms, SerializationFormatting.Indented);
var jsonPayload = Encoding.UTF8.GetString(ms.ToArray());

Console.WriteLine(jsonPayload);

Expected behavior Expected the following output:

{
  "query": {
    "bool": {
      "should": {
        "terms": {
          "abcd": ["a", "b"]
        }
      },
      "minimum_should_match": 1
    }
  },
  "track_total_hits": false
}

but this is the actual produces JSON:

{
  "query": {
    "bool": {
      "should": {
        "terms": {
          "abcd": {}
        }
      },
      "minimum_should_match": {}
    }
  },
  "track_total_hits": {}
}

Other notes: Using new TrackHits(1) for TrackTotalHits gives the same output.

We've worked around it for now with custom serializers, just reporting it here to make sure you're aware of it.

Sidenote, for TermsQuery, shouldn't Term be renamed to Terms to align semantically since the use case is for one or many terms? Also, thinking about DX, TermsQuery.Term(s) should allow us to just pass in a collection in stead of a TermsQueryField which feels a lot more clunky.

flobernd commented 1 month ago

Hi @andersosthus,

you are trying to serialize "internal" request/response types using the SourceSerializer. The SourceSerializer is exclusively meant to (de-)serialize user defined data (usually specified as a generic type TDocument).

It should work, if you use the client.RequestResponseSerializer instead.

Hint: If you update to the latest version, you can use one of the new serialization overloads:

var jsonPayload = client.RequestResponseSerializer.SerializeToString(request);

This way you don't have to manually create a MemoryStream or handle encoding. Besides that, an optimized shortcut path will be used, if the serializer derives from SystemTextJsonSerializer - which is the case for both, the DefaultRequestResponseSerializer and the DefaultSourceSerializer 🙂

Please let me know, if that answers your question.

flobernd commented 1 month ago

Sidenote, for TermsQuery, shouldn't Term be renamed to Terms to align semantically since the use case is for one or many terms?

Yes, I agree. This is currently incorrectly named in our specification. The specification is used to generate multiple clients and other tools, which means we have to be careful with breaking changes like this one. We will discuss this point internally.

Also, thinking about DX, TermsQuery.Term(s) should allow us to just pass in a collection in stead of a TermsQueryField which feels a lot more clunky.

This is already on the list of upcoming usability improvements. For this to work, we have to generate transient implicit conversion operators.

andersosthus commented 1 month ago

Hi @andersosthus,

you are trying to serialize "internal" request/response types using the SourceSerializer. The SourceSerializer is exclusively meant to (de-)serialize user defined data (usually specified as a generic type TDocument).

It should work, if you use the client.RequestResponseSerializer instead.

Hint: If you update to the latest version, you can use one of the new serialization overloads:

var jsonPayload = client.RequestResponseSerializer.SerializeToString(request);

This way you don't have to manually create a MemoryStream or handle encoding. Besides that, an optimized shortcut path will be used, if the serializer derives from SystemTextJsonSerializer - which is the case for both, the DefaultRequestResponseSerializer and the DefaultSourceSerializer 🙂

Please let me know, if that answers your question.

Thanks @flobernd.

Too many serializers :)

I guess I need to dig a bit deeper to find what's breaking our request. A bunch of assumptions led me to believe it was in the serializer (based on my example above), but that was clearly wrong :)

flobernd commented 1 month ago

@andersosthus Let me know if I can do something else to help you debugging your issue.