elastic / elasticsearch-net

This strongly-typed, client library enables working with Elasticsearch. It is the official client maintained and supported by Elastic.
https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/index.html
Apache License 2.0
13 stars 1.15k forks source link

Default index is not used in query with 8.13.15 client #8215

Open krasheninnik opened 5 months ago

krasheninnik commented 5 months ago

Elastic.Clients.Elasticsearch version: 8.13.15

Elasticsearch version: 8.13.2

.NET runtime version: .NET 6.0.28 and .NET Framework 4.8.9232.0

Operating system version: Windows 10 Pro

Description of the problem including expected versus actual behavior:

I set the DefaultIndex to "features" while configuring ElasticsearchClientSettings, but the request was sent without using this DefaultIndex.

searchResult.DebugInformation: Valid Elasticsearch response built from a successful (200) low-level call on POST: /_search?typed_keys=true.

similar to https://github.com/elastic/elasticsearch-net/issues/8151

Steps to reproduce:

var connectionSettings = new ElasticsearchClientSettings(settings.ReadonlyHost);
connectionSettings.DefaultIndex("features");
connectionSettings.RequestTimeout(TimeSpan.FromMinutes(1));
connectionSettings.EnableHttpCompression();
connectionSettings.Authentication(new BasicAuthentication(settings.Login, settings.Password));

var client = new ElasticsearchClient(connectionSettings);

var searchResult = await client.SearchAsync<JsonElement>(s => s
            .Aggregations(aggs => aggs.Add("ClassName", agg => agg.Terms(dh => dh.Field("ClassId").Size(100)))));

// or
var searchRequest = new SearchRequest()
        {
            Size = 10,
            Query = new TermQuery(new Field("ClassId")) { Value = FieldValue.Long(75) }
        };

var searchResult2 = await client.SearchAsync<JsonElement>(searchRequest); 

Expected behavior I expect http request with index POST: features/_search?typed_keys=true

flobernd commented 5 months ago

Hi @krasheninnik, this is a similar case as described in #8184

Currently this is the expected behavior as stated here.

However, it seems like this is not what users intuitively expect when setting a DefaultIndex or DefaultMapping. I will do some research on how we could achieve the best developer experience in this case, before I decide how the index inferrence should work in the future.

jonyadamit commented 3 months ago

This may well be true for DefaultIndex, but if the user specifies a default index for a specific type, and calls the API with that specific type as the generic argument, then I believe the SDK must use that default index..

flobernd commented 3 months ago

@jonyadamit I agree, that the current behavior might not be optimal and, like stated above, I'm open to improve the default 🙂 I just have not decided how exactly the best default behavior would look like.

Even when the user specifies an index mapping for a specific type, let's say LogEntry => "default_log_index", this creates potentially unwanted behavior in APIs that optionally accept multiple indices (Indices parameter). For example, let's say we use the Search<LogEntry>() method.

The default behavior would be to return all LogEntry items in all indices. There might be rotating log indices like e.g. log_07_2024, log_08_2024, etc. and all of these indices would be searched.

Setting the default index/mapping for LogEntry changes semantics so that only default_log_index would be searched. To explicitly revert the behavior, the user must call .Index(Indices.All) or set Indices to null. In my opinion this is not intuitive.

It's totally fine for APIs that only accept a single index (IndexName parameter), because all of these ones actually require an index to be set. Not setting it will cause an error.

jonyadamit commented 3 months ago

I see what you mean. I am migrating from version 7 to version 8, and the default behavior failed miserably because it tried to deserialize a different type. So I guess the current default behavior doesn't return all LogEntry items in all indices, but rather all items everywhere. This is regarding a simple query such as await client.SearchAsync<LogEntry>(s => s.Query(new MatchAllQuery())).ConfigureAwait(false);. This definitely doesn't seem right. I suppose we can let the user decide the default behavior with another setting. At the minimum, it would be helpful to update the migration guide.

flobernd commented 3 months ago

@jonyadamit

So I guess the current default behavior doesn't return all LogEntry items in all indices, but rather all items everywhere.

That's correct. By default, Elasticsearch will return all matching documents (literally all documents in your match_all example). Elasticsearch used to have a _type meta-data that was used to discriminate different types. This no longer exists out of the box, but I see a lot of users adding custom type fields to their documents. A real-world query would contain at least one condition to check for the correct type term (e.g. log_entry).

As I said, I agree that the current behavior is not ideal either and I'm definitely open to improve it.

jonyadamit commented 3 months ago

Come to think of it, in my opinion at least, If the user specifies a default index, they expect it to be used even with APIs that optionally accept indices. If the user has multiple indices and plan to use them, they will need to specify them anyhow, using wildcards or aliases.. and if they truly want to search all indices then it is very legitimate to specify .Index(Indices.All) or not to set a default index for the type. Just my opinion.