SDKits / ExamineX

Issue tracker for ExamineX
https://examinex.online
5 stars 0 forks source link

Range queries are not added to the $filter expression #87

Closed chrden closed 1 year ago

chrden commented 1 year ago

Packages & versions ExamineX.AzureSearch - 4.1.4 ExamineX.AzureSearch.Umbraco - 4.1.4 ExamineX.AzureSearch.Umbraco.BlobMedia - 4.1.4

(Not using 5.0.0 yet as we haven't upgraded to Umbraco 12)

Bug summary When using the same Lucene Query with Examine and ExamineX, I am receiving different results. With Examine I get 3 results, with ExamineX I get zero results.

I have found that removing the RangeQuery when running with ExamineX returns the correct number of results. I believe the range query is being kept in the Lucene query and not being added to the $filter expression as per the ExamineX documentation and the Azure documentation

Specifics Below I have included my Lucene query when running with both Examine and Examine X. The notable difference is the syntax for the homeId range query

Lucene query - Examine {+(+__IndexType:content) -templateID:0 -__NodeTypeAlias:searchpage -__NodeTypeAlias:error404page -__NodeTypeAlias:robotstxt -__NodeTypeAlias:xmlsitemappage -__NodeTypeAlias:sitemappage +(homeId:[1113 TO 1113]) +(__NodeTypeAlias:articleresourcedetail __NodeTypeAlias:multilingualresourcedetail __NodeTypeAlias:personalstoryresourcedetail __NodeTypeAlias:podcastresourcedetail __NodeTypeAlias:researchreportresourcedetail __NodeTypeAlias:webcastresourcedetail)}

Lucene query - ExamineX +(+x__IndexType:content) -templateID:0 -x__NodeTypeAlias:searchPage -x__NodeTypeAlias:error404Page -x__NodeTypeAlias:robotstxt -x__NodeTypeAlias:xmlSitemapPage -x__NodeTypeAlias:sitemapPage +(homeId ge 1113 and homeId le 1113) +(x__NodeTypeAlias:articleResourceDetail x__NodeTypeAlias:multilingualResourceDetail x__NodeTypeAlias:personalStoryResourceDetail x__NodeTypeAlias:podcastResourceDetail x__NodeTypeAlias:researchReportResourceDetail x__NodeTypeAlias:webcastResourceDetail)

The code that generates this query is the same when using Examine or ExamineX and is as below:

var index = _examineIndexingService.GetIndex(Umbraco.Cms.Core.Constants.UmbracoIndexes.ExternalIndexName);

var query = index.Searcher
        .CreateQuery()
        .NativeQuery($"+{UmbracoExamineFieldNames.CategoryFieldName}:{IndexTypes.Content}");

if (!parameters.AllowRepositoryItems)
{
    query = query.And().GroupedNot(new[] { Constants.ExamineConstants.Fields.Umbraco.TemplateId }, "0");
}

var ignoredDoctypeAliases = new[]
{
    Core.Models.Umbraco.SearchPage.ModelTypeAlias,
    Core.Models.Umbraco.Error404Page.ModelTypeAlias,
    Core.Models.Umbraco.Robotstxt.ModelTypeAlias,
    Core.Models.Umbraco.XmlSitemapPage.ModelTypeAlias,
    Core.Models.Umbraco.SitemapPage.ModelTypeAlias
};

query.And().GroupedNot(new[] { UmbracoExamineFieldNames.ItemTypeFieldName }, ignoredDoctypeAliases);

if (parameters.HomeId > 0)
{
    query.And().RangeQuery<int>(new[] { Constants.ExamineConstants.Fields.HomeId }, parameters.HomeId, parameters.HomeId);
}

if (parameters.DoctypeAliases?.Any() == true)
{
    query.And().GroupedOr(new[] { UmbracoExamineFieldNames.ItemTypeFieldName }, parameters.DoctypeAliases.Select(x => x.ToString()).ToArray());
}

var skip = _examineService.CalculateSkip(parameters.Page, parameters.PageSize, parameters.PageSize);

var results = query.Execute(new QueryOptions((int)skip, (int)parameters.PageSize));

totalItemCount = results.TotalItemCount;

return results;

In case it helps, I have also included examples of two JSON queries that I have used to test this theory in the 'Search Explorer' in Azure Cognitive Search in the Azure Portal.

This example has the range query in the $filter expression and successfully returns the expected 3 results.

{
  "queryType": "full",
  "search": "+(x__NodeTypeAlias:articleResourceDetail x__NodeTypeAlias:multilingualResourceDetail x__NodeTypeAlias:personalStoryResourceDetail x__NodeTypeAlias:podcastResourceDetail x__NodeTypeAlias:researchReportResourceDetail x__NodeTypeAlias:webcastResourceDetail)",
  "filter": "(homeId ge 1113 and homeId le 1113)",
  "select": "nodeName,homeId,x__NodeTypeAlias",
  "count": true
}

This example has the range query included in the search parameter and returns zero results

{
  "queryType": "full",
  "search": "+x__IndexType:content -templateID:0 -x__NodeTypeAlias:searchPage -x__NodeTypeAlias:error404Page -x__NodeTypeAlias:robotstxt -x__NodeTypeAlias:xmlSitemapPage -x__NodeTypeAlias:sitemapPage +(x__NodeTypeAlias:articleResourceDetail x__NodeTypeAlias:multilingualResourceDetail x__NodeTypeAlias:personalStoryResourceDetail x__NodeTypeAlias:podcastResourceDetail x__NodeTypeAlias:researchReportResourceDetail x__NodeTypeAlias:webcastResourceDetail) +(homeId ge 1113 and homeId le 1113)",
  "count": true
}

Hopefully this is enough information to help you diagnose the issue but please let me know if you need anything else from me

Thanks

AndrewBarta-SC commented 1 year ago

Hey Chris, thanks for the info. What is the exact c# syntax you are using for the query?

Range queries are unique to Azure search since Lucene syntax doesn't work directly.

chrden commented 1 year ago

@AndrewBarta-SC Hi, sorry for the late reply!

I have shown the C# syntax above for how I am creating and executing the query, including the RangeQuery. You can see this in this section:

if (parameters.HomeId > 0)
{
    query.And().RangeQuery<int>(new[] { Constants.ExamineConstants.Fields.HomeId }, parameters.HomeId, parameters.HomeId);
}

I have also included the Lucene query output.

Sorry if I have misunderstood, please feel free to ask me anything else as well

Thanks

AndrewBarta-SC commented 1 year ago

Hey @chrden now I need to apologize for the delay!

Thank you for the callout where you are executing the RangeQuery. I'm talking with Shannon later today and aiming to dive into this further.

Will get back to you soon, thanks for your patience!

Shazwazza commented 1 year ago

Hi @chrden,

Behind the scenes ExamineX translates range queries into a filter expression. As you've noted, Lucene and Azure Search have different requirements for range queries but ExamineX takes care of this translation for you.

One thing to note in your query is your use of GroupedNot/Not. This is a nuance of how Lucene works but a "Not" query is actual a post filter. All of your "Not" queries must come last else this can lead to potentially unintended results.

There are several tests for range queries in ExamineX but I've just done some further tests with regards to combining range queries with other queries such as NativeQuery and have detected a bug.

We'll get this fixed asap and get a new patch release out. Will ping back here when the fix is available.

Shazwazza commented 1 year ago

Hi @chrden Version 4.1.5 has been pushed to Nuget with a fix for this. If you can please let us know if this resolves your issue it would be much appreciated.

chrden commented 1 year ago

Hi @Shazwazza Thanks for getting back to me and putting together a patch so quickly, however unfortunately this does not appear to have resolved the issue.

As suggested, I have upgraded to v4.1.5 and moved all GroupedNot/Not queries to be added just before the Execute call. With these two adjustments, I am still seeing the same results as before.

Tests

Below are the Lucene queries produced for some tests I carried out with an explanation of the results. The only difference between the Examine tests and the ExamineX tests is that for Examine I have the following setting ExamineX:Disabled set to true in appsettings.json, this line is commented out for the ExamineX tests

Examine

Examine - Benchmark test Correct results returned: 3 results with the appropriate 'NodeTypeAlias' values

Category: null, LuceneQuery: {+(+__IndexType:content) +(homeId:[1113 TO 1113]) +(__NodeTypeAlias:articleresourcedetail __NodeTypeAlias:multilingualresourcedetail __NodeTypeAlias:personalstoryresourcedetail __NodeTypeAlias:podcastresourcedetail __NodeTypeAlias:researchreportresourcedetail __NodeTypeAlias:webcastresourcedetail) -templateID:0 -__NodeTypeAlias:searchpage -__NodeTypeAlias:error404page -__NodeTypeAlias:robotstxt -__NodeTypeAlias:xmlsitemappage -__NodeTypeAlias:sitemappage}

Examine - no HomeId query Correct results returned: 3 results with the appropriate 'NodeTypeAlias' values

Category: null, LuceneQuery: {+(+__IndexType:content) +(__NodeTypeAlias:articleresourcedetail __NodeTypeAlias:multilingualresourcedetail __NodeTypeAlias:personalstoryresourcedetail __NodeTypeAlias:podcastresourcedetail __NodeTypeAlias:researchreportresourcedetail __NodeTypeAlias:webcastresourcedetail) -templateID:0 -__NodeTypeAlias:searchpage -__NodeTypeAlias:error404page -__NodeTypeAlias:robotstxt -__NodeTypeAlias:xmlsitemappage -__NodeTypeAlias:sitemappage}

Examine - no HomeId query and with Diagnosis tag filter Correct results returned: 1 result returned with the appropriate value of 1159 in the DiagnosisTagIds field

Category: null, LuceneQuery: {+(+__IndexType:content) +(__NodeTypeAlias:articleresourcedetail __NodeTypeAlias:multilingualresourcedetail __NodeTypeAlias:personalstoryresourcedetail __NodeTypeAlias:podcastresourcedetail __NodeTypeAlias:researchreportresourcedetail __NodeTypeAlias:webcastresourcedetail) +(DiagnosisTagIds:lucenematch1159lucenematch) -templateID:0 -__NodeTypeAlias:searchpage -__NodeTypeAlias:error404page -__NodeTypeAlias:robotstxt -__NodeTypeAlias:xmlsitemappage -__NodeTypeAlias:sitemappage}

ExamineX

ExamineX - Benchmark test Incorrect results returned: 'NodeTypeAlias' filters (include and exclude) ignored and all nodes returned

{{ Category: , LuceneQuery: +(+x__IndexType:content) +(homeId ge 1113 and homeId le 1113) +(x__NodeTypeAlias:articleResourceDetail x__NodeTypeAlias:multilingualResourceDetail x__NodeTypeAlias:personalStoryResourceDetail x__NodeTypeAlias:podcastResourceDetail x__NodeTypeAlias:researchReportResourceDetail x__NodeTypeAlias:webcastResourceDetail) -templateID:0 -x__NodeTypeAlias:searchPage -x__NodeTypeAlias:error404Page -x__NodeTypeAlias:robotstxt -x__NodeTypeAlias:xmlSitemapPage -x__NodeTypeAlias:sitemapPage }}

ExamineX - No HomeId query Correct results returned: 3 results with the appropriate 'NodeTypeAlias' values

{{ Category: , LuceneQuery: +(+x__IndexType:content) +(x__NodeTypeAlias:articleResourceDetail x__NodeTypeAlias:multilingualResourceDetail x__NodeTypeAlias:personalStoryResourceDetail x__NodeTypeAlias:podcastResourceDetail x__NodeTypeAlias:researchReportResourceDetail x__NodeTypeAlias:webcastResourceDetail) -templateID:0 -x__NodeTypeAlias:searchPage -x__NodeTypeAlias:error404Page -x__NodeTypeAlias:robotstxt -x__NodeTypeAlias:xmlSitemapPage -x__NodeTypeAlias:sitemapPage }}

ExamineX - No HomeId query and with Diagnosis tag filter Incorrect results returned: No results returned

{{ Category: , LuceneQuery: +(+x__IndexType:content) +(x__NodeTypeAlias:articleResourceDetail x__NodeTypeAlias:multilingualResourceDetail x__NodeTypeAlias:personalStoryResourceDetail x__NodeTypeAlias:podcastResourceDetail x__NodeTypeAlias:researchReportResourceDetail x__NodeTypeAlias:webcastResourceDetail) +(DiagnosisTagIds:lucenematch1159lucenematch) -templateID:0 -x__NodeTypeAlias:searchPage -x__NodeTypeAlias:error404Page -x__NodeTypeAlias:robotstxt -x__NodeTypeAlias:xmlSitemapPage -x__NodeTypeAlias:sitemapPage }}

I have also come across an additional issue. This may not be related so please feel free to ask me to create a new issue if you feel this is more appropriate.

Further information

When carrying out the above tests, I decided to rebuild the External index to see if the index was out of date and that was causing some of the issues I was seeing. When I do this, the index gets to a DocumentCount of 21/22 and then halts. Each time this happens, two log entries appear in the Event Log in Umbraco. Both produce the same error and StackTrace (below), except one has a SourceContext of ExamineX.AzureSearch.AzureSearchIndex and the other has a SourceContext of Examine.BaseIndexProvider :

Azure.RequestFailedException: The request is invalid. Details: parameters : An unexpected 'StartArray' node was found when reading from the JSON reader. A 'PrimitiveValue' node was expected.
Status: 400 (Bad Request)

Content:
{"error":{"code":"","message":"The request is invalid. Details: parameters : An unexpected 'StartArray' node was found when reading from the JSON reader. A 'PrimitiveValue' node was expected."}}

Headers:
Cache-Control: no-cache
Pragma: no-cache
client-request-id: a9f316db-e014-4f9d-a9e7-307e1439e421
x-ms-client-request-id: a9f316db-e014-4f9d-a9e7-307e1439e421
request-id: a9f316db-e014-4f9d-a9e7-307e1439e421
elapsed-time: 496
Preference-Applied: REDACTED
Strict-Transport-Security: REDACTED
Date: Wed, 16 Aug 2023 08:42:28 GMT
Content-Type: application/json; charset=utf-8
Content-Language: REDACTED
Expires: -1
Content-Length: 194

   at Azure.Search.Documents.SearchClient.IndexDocumentsInternal[T](IndexDocumentsBatch`1 batch, IndexDocumentsOptions options, Boolean async, CancellationToken cancellationToken)
   at Azure.Core.Pipeline.TaskExtensions.EnsureCompleted[T](Task`1 task)
   at Azure.Search.Documents.SearchClient.IndexDocuments[T](IndexDocumentsBatch`1 batch, IndexDocumentsOptions options, CancellationToken cancellationToken)
   at ExamineX.AzureSearch.AzureSearchIndex.A[A,a](IEnumerable`1, Func`2, Func`3, CancellationToken, String callerMember)

Apologies for the bad news, however please let me know if there is any other info you would like from me to make investigating this issue easier. I am also happy to have a 1-2-1 call to demo our setup if you believe this may be quicker.

Thanks

Shazwazza commented 1 year ago

Hi, thanks for the info.

I've been able to identify another issue which was due to how queries get formatted when there are range queries. That combined with the negation wasn't working. When ExamineX determines there is a range query, it needs to reformat the entire Lucene query into an OData query since that is the only way you can do full text searches with ranges in Azure Search.

I've also added Debug logging output for the class ExamineX.AzureSearch.AzureSearchResults which will output the OData query when there is a range query present. So you can update your serilog config to allow Debug level logging for that object.

I've published a 4.1.6 version which you can try: https://www.nuget.org/packages/ExamineX.AzureSearch.Umbraco/4.1.6

As for your other error, I can't replicate it so is hard to diagnose. Perhaps it is this issue mentioned here https://stackoverflow.com/questions/41350961/error-while-adding-new-documents-to-azure-search-index ? Where maybe you have a field defined as a single string but there are multiple values being passed up to it?

chrden commented 1 year ago

Hi @Shazwazza

We have an interesting development...

I have managed to resolve the other error I was getting. Thanks for sending that SO post as this helped me to figure out where the issue was. It was because we were setting multiple values against the tag index fields and setting the FieldDefinitionType to be FullTextSortable rather than the default FullTextMultiValue. This appears to resolve the error whilst indexing.

However, unfortunately, I am now getting an exception thrown (see below) regarding $filter. This is happening with v4.1.5 and v4.1.6 and only happens when I have the homeId range query in place, once I comment out that line (see original post) the query succeeds and the results returned are correct (which is progress at least πŸ™Œ)

The interesting this is, when I revert back to v4.1.4, the below exception does not get thrown and the results are correct.

I have to apologise because it seems as though my original issue (when I was using v4.1.4) may have been caused by the indexing error which I have now resolved.

To summarise, the only differences between my original post and now are:

Hopefully the above information and below exception provides you with enough information to be able to replicate the issue.

Please let me know if there is anything else you would like from me.

Thanks

RequestFailedException: Invalid expression: The operand for a binary operator 'GreaterThanOrEqual' is not a single value. Binary operators require both operands to be single values.
Parameter name: $filter
Status: 400 (Bad Request)

Content:
{"error":{"code":"","message":"Invalid expression: The operand for a binary operator 'GreaterThanOrEqual' is not a single value. Binary operators require both operands to be single values.\r\nParameter name: $filter"}}

Headers:
Cache-Control: no-cache
Pragma: no-cache
client-request-id: fda31c85-4de7-4b82-9412-3d0d3d5c5030
x-ms-client-request-id: fda31c85-4de7-4b82-9412-3d0d3d5c5030
request-id: fda31c85-4de7-4b82-9412-3d0d3d5c5030
elapsed-time: 65
Preference-Applied: REDACTED
Strict-Transport-Security: REDACTED
Date: Fri, 18 Aug 2023 10:22:37 GMT
Content-Type: application/json; charset=utf-8
Content-Language: REDACTED
Expires: -1
Content-Length: 218
Shazwazza commented 1 year ago

Hi @chrden

That is an interesting error. I'm wondering if that has to do with the field type that you are trying to do a range query on. I found this SO link here regarding that error https://stackoverflow.com/a/41684246

Perhaps you are trying a range query on a DataType.Collection(DataType.String) type of field?

What is your field type for your "Constants.ExamineConstants.Fields.HomeId" field?

I've also added Debug logging output for the class ExamineX.AzureSearch.AzureSearchResults which will output the OData query when there is a range query present. So you can update your serilog config to allow Debug level logging for that object.

Did you add debug level logging for this? If so, you can see in the logs the converted OData query it is trying to execute.

chrden commented 1 year ago

Hi @Shazwazza,

Apologies for the slightly later than usual reply.

Field type

What we have is the below for setting the FieldDefinitionTypes where:

public class ConfigureIndexOptions : IConfigureNamedOptions<LuceneDirectoryIndexOptions>
{
    public void Configure(string? name, LuceneDirectoryIndexOptions options)
    {
        switch(name)
        {
            case UmbracoIndexes.ExternalIndexName:
                options.FieldDefinitions.AddOrUpdate(new FieldDefinition(Constants.ExamineConstants.Fields.HomeId, AzureSearchFieldDefinitionTypes.Integer));
                options.FieldDefinitions.AddOrUpdate(new FieldDefinition(Constants.ExamineConstants.Fields.UpdateDate.UpdateDateTime, AzureSearchFieldDefinitionTypes.DateTime));
                break;

            case UmbracoIndexes.InternalIndexName:
                break;

            case UmbracoIndexes.MembersIndexName:
                break;
        }
    }

    public void Configure(LuceneDirectoryIndexOptions options)
        => Configure(string.Empty, options);
}

I have tried the above with ExamineX.AzureSearch.AzureSearchFieldDefinitionTypes and Examine.FieldDefinitionTypes.

Regardless of which I use, the data in the index (in Azure Search Service) for these two fields is stored as follows (after a reindex):

"homeId": [
    "1113"
],
"updateDateTime": "2023-09-24T12:00:00Z"

Looking at the value of homeId, I believe you could be correct in saying that the value is being stored as DataType.Collection(DataType.String) and that could be causing the issue. However, as I said previously, this only happens in version 4.1.5 and 4.1.6. Also, looking at other int type values in the index, such as id or parentID, these are stored in the same way in the index.

I am not sure if I am configuring these custom fields incorrectly, however maybe you can suggest something I may also be able to try.

Debugging

Regarding Serilog and debugging, I have added the above line to my appsettings.json (with no override in appsettings.Development.json) so that my Serilog settings now look as follows:

"Serilog": {
    "MinimumLevel": {
        "Default": "Information",
        "Override": {
            "Microsoft": "Warning",
            "Microsoft.Hosting.Lifetime": "Information",
            "System": "Warning",
            "SixLabors.ImageSharp.Web.Middleware.ImageSharpMiddleware": "Warning",
            "Umbraco.Cms.Infrastructure.ModelsBuilder.AutoModelsNotificationHandler": "Warning",
            "Umbraco.Cms.Core.Services.Implement.ContentService": "Warning",
            "Umbraco.Cms.Web.BackOffice.Controllers.AuthenticationController": "Warning",
            "Umbraco.Cms.Infrastructure.Examine.ExamineUmbracoIndexingHandler": "Warning",
            "uSync.BackOffice.Controllers.uSyncDashboardApiController.ImportHandler": "Warning",
            "uSync.BackOffice.Notifications.uSyncApplicationStartingHandler": "Warning",
            "Umbraco.Cms.Core.Runtime.MainDom": "Warning",
            "Umbraco.Cms.Core.Logging.ProfilingLogger": "Warning",
            "Umbraco.Cms.Infrastructure.PublishedCache.PublishedSnapshotService": "Warning",
            "Umbraco.Cms.Infrastructure.HostedServices.ContentVersionCleanup": "Warning",
            "ExamineX.AzureSearch.AzureSearchResults":  "Debug"
        }
    }
},

Unfortunately I am not seeing any Debug entries in my event log.

Is it possible the exception is occurring before the log has had chance to be written?


Please let me know if you need any more information.

Thanks

Shazwazza commented 1 year ago

In your code above, you are configuring LuceneDirectoryIndexOptions. These options aren't part of ExamineX, these are part of Examine and are only for Lucene indexes, not azure search indexes.

Here's the docs on configuring ExamineX/Azure Search indexes: https://examinex.online/v3/examples#default-field-types--analyzers

Shazwazza commented 1 year ago

ExamineX does try to automatically map the normal Examine config over to ExamineX config behind the scenes. I will investigate if there is any issues with that, but otherwise, you can explicitly define the ExamineX fields using the AzureSearchIndexOptions instead as per the docs mentioned.

Shazwazza commented 1 year ago

Also, how/where are you registering your ConfigureIndexOptions class? It might be that singleton registrations are being overwritten in the DI container by different registrations.

Shazwazza commented 1 year ago

When ExamineX tries to auto-map the LuceneDirectoryIndexOptions, it does so based on the ExamineXComposer. Composers are executed in the order they are registered and that order is determine by Umbraco itself when it goes to discover IComposers.

The main reason ExamineX auto-maps registered LuceneDirectoryIndexOptions to AzureSearchIndexOptions is because Umbraco itself configures its 3 managed indexes and configures field types for them. This composition is done at an Umbraco core level, so we know the ExamineX composition will execute after Umbraco's own core composition.

So what I believe is happening in your case, is that you are configuring Lucene options, but this configuration is done after the ExamineXComposer's mapping, so your definitions get ignored.

To resolve this, I would recommend just using the documented way for configuring ExamineX fields https://examinex.online/v3/examples#default-field-types--analyzers.

Alternatively, if you need/want to continue only configuring LuceneDirectoryIndexOptions, then you would need to ensure that your ConfigureIndexOptions is registered in DI before the ExamineXComposer executes. This could be done by registering your class before the call to services.AddUmbraco(...)

chrden commented 1 year ago

Hi @Shazwazza

Thanks for the help.

I have taken your advice and used the method outlined in the ExamineX documentation. I now have the below implementation, however I am getting the same outcome as my last comment. I am still seeing this data for homeId and updateDateTime fields in the index:

"homeId": [
    "1113"
],
"updateDateTime": "2023-09-24T12:00:00Z"

This is what ConfigureServices in my Startup.cs looks like:

public void ConfigureServices(IServiceCollection services)
{
    var config = services
        .AddUmbraco(_env, _config)
        .AddBackOffice()
        .AddWebsite()
        .AddComposers()
        .AddCustomAppsettingSections()
        .AddCustomServices()
        .AddCustomConfigureOptions()
        .AddCustomComponents()
        .AddCustomNotificationHandlers()
        .AddAzureBlobMediaFileSystem();

    services.AddCustomUmbracoClientSideRoutes();

    services.AddCacheBusting();

    config.Build();
}

And this is the 'AddCustomConfigureOptions' extension method:

public static IUmbracoBuilder AddCustomConfigureOptions(this IUmbracoBuilder builder)
{
    builder.Services.Configure<AzureSearchIndexOptions>(UmbracoIndexes.ExternalIndexName, 
        options =>
        {
            options.FieldDefinitions.AddOrUpdate(new FieldDefinition(Constants.ExamineConstants.Fields.HomeId, AzureSearchFieldDefinitionTypes.Integer));
            options.FieldDefinitions.AddOrUpdate(new FieldDefinition(Constants.ExamineConstants.Fields.UpdateDate.UpdateDateTime, AzureSearchFieldDefinitionTypes.DateTime));
        }
    );

    return builder;
}

I am still getting the same error as previously reported and I have also tried upgrading to 4.1.7 to see if that would resolve the issue:

RequestFailedException: Invalid expression: The operand for a binary operator 'GreaterThanOrEqual' is not a single value. Binary operators require both operands to be single values.
Parameter name: $filter
Status: 400 (Bad Request)

Content:
{"error":{"code":"","message":"Invalid expression: The operand for a binary operator 'GreaterThanOrEqual' is not a single value. Binary operators require both operands to be single values.\r\nParameter name: $filter"}}
Shazwazza commented 1 year ago

Hi, are you able to try this query with a basic range query with hard coded values and see how that goes? I am failing to replicate this on my end.

chrden commented 1 year ago

@Shazwazza I have tried to strip it right back as you suggest and here are my findings

Search method This remains the same throughout all of the test cases below.

public IEnumerable<int> Search(SearchParameters parameters, out long totalItemCount)
{
    var skip = _examineService.CalculateSkip(parameters.Page, parameters.PageSize, parameters.PageSize);

    var query = BuildQuery(parameters);

    var results = query.Execute(new QueryOptions((int)skip, (int)parameters.PageSize));

    totalItemCount = results.TotalItemCount; // <<<<< EXCEPTION THROWN HERE

    return results.Select(x => x.Id).Select(int.Parse);
}

Test cases

I am currently on v4.1.7 I have tested the below amendments to the BuildQuery method with both Examine and ExamineX. To ensure this was done correctly

Test case 1

BuildQuery
private IBooleanOperation BuildQuery(SearchParameters parameters)
{
    var index = _examineIndexingService.GetIndex(Umbraco.Cms.Core.Constants.UmbracoIndexes.ExternalIndexName);

    var query = index.Searcher
            .CreateQuery()
            .NativeQuery($"+{UmbracoExamineFieldNames.CategoryFieldName}:{IndexTypes.Content}");

    // 'homeId' field name constant and value pulled from 'parameters'
    query.And().RangeQuery<int>(new[] { Constants.ExamineConstants.Fields.HomeId }, parameters.HomeId, parameters.HomeId); 

    return query;
}
Query output
Examine

{+(+__IndexType:content) +(homeId:[1113 TO 1113])}

ExamineX

{+(+x__IndexType:content) +(homeId ge 1113 and homeId le 1113)}


Test case 2

BuildQuery
private IBooleanOperation BuildQuery(SearchParameters parameters)
{
    var index = _examineIndexingService.GetIndex(Umbraco.Cms.Core.Constants.UmbracoIndexes.ExternalIndexName);

    var query = index.Searcher
            .CreateQuery()
            .NativeQuery($"+{UmbracoExamineFieldNames.CategoryFieldName}:{IndexTypes.Content}");

    // 'homeId' field name and values hardcoded
    query.And().RangeQuery<int>(new[] { "homeId" }, 1113, 1113); 

    return query;
}
Query output
Examine

{+(+__IndexType:content) +(homeId:[1113 TO 1113])}

ExamineX

{+(+x__IndexType:content) +(homeId ge 1113 and homeId le 1113)}


Test case 3

BuildQuery
private IBooleanOperation BuildQuery(SearchParameters parameters)
{
    var index = _examineIndexingService.GetIndex(Umbraco.Cms.Core.Constants.UmbracoIndexes.ExternalIndexName);

    // 'homeId' field name and values hardcoded in native query
    var query = index.Searcher
            .CreateQuery()
            .NativeQuery($"+{UmbracoExamineFieldNames.CategoryFieldName}:{IndexTypes.Content}")
            .And()
            .NativeQuery("homeId:[1113 TO 1113]"); 

    return query;
}
Examine

{+(+__IndexType:content) +homeId:[1113 TO 1113]}

ExamineX

{+(+x__IndexType:content) +()}

ExamineX/Azure Search Exception

Azure.RequestFailedException: 'Failed to parse query string at line 1, column 27. See https://aka.ms/azure-search-full-query for supported syntax.
Status: 400 (Bad Request)
ErrorCode: InvalidRequestParameter

Outcome

In test cases 1 and 2, the same error was produced as has been seen previously.

In this last test case, the additional NativeQuery does not get transformed. I tried multiple methods to get this to work, e.g. including the homeId range query in a single NativeQuery, but the range query is never transformed and outputs nothing. In the posted example, the +() is added and produces the above exception.

Hope this helps you to reproduce it. Let me know if you want me to try anything else

Thanks

Shazwazza commented 1 year ago

Hi,

Thanks for the info.

Yes, NativeQuery will not get translated. The purpose of this abstraction call is to be able to supply a 'native' query to the underlying implementation. Azure Search does not support Lucene based syntax for range queries, under the hood these are done as filters which is why the RangeQuery methods need to be used because this does a lot of translations behind the scenes.

I've run the below locally within a view. Anytime you refresh, it will create a new document with some random data and integer numbers.

This is my Startup.cs code to configure fields:

    services.Configure<AzureSearchIndexOptions>("InternalIndex", options =>
    {
        options.FieldDefinitions.AddOrUpdate(new FieldDefinition("myStringField", AzureSearchFieldDefinitionTypes.FullText));
        options.FieldDefinitions.AddOrUpdate(new FieldDefinition("myIntField", AzureSearchFieldDefinitionTypes.Integer));
    });

This is the logic to index a new item and then search on both a Native query and a Range query. I've structured it in a way that mimics what you are doing as well.


    if (!ExamineManager.TryGetIndex("InternalIndex", out var index)) {
        throw new InvalidOperationException("No index found");
    }

    var random = new Random();

    index.IndexItems(new[] { ValueSet.FromObject(Guid.NewGuid().ToString(), "content", "content",
                        new {
                            nodeName = $"my name {Guid.NewGuid().ToString()}",
                            bodyText = "lorem ficus",
                            myStringField = "header 1",
                            myIntField = random.Next(10, 100),
                            path = "-1,123,456",
                            hideMe = "1" }) });

    var query = index.Searcher
        .CreateQuery()
        .NativeQuery($"+{UmbracoExamineFieldNames.CategoryFieldName}:{IndexTypes.Content}");

    var query2 = query.And()
        .RangeQuery<int>(new[] { "myIntField" }, 20, 40);

    var results = query2.Execute();

This results in no errors and will eventually show some results depending on if those random integers fall within that range.

Are you able to make that work?

chrden commented 1 year ago

Hi @Shazwazza

We have good news! πŸŽ‰ And some confusing news πŸ€”

I managed to get the code you sent through working. So that obviously meant it was something else in my code or index that was the issue.

I deleted the index and give it a rebuild and found that the homeId value was successfully stored as an int rather than an array of string as previously posted. I think I must not have rebuilt the index after I made a change to AddCustomConfigureOptions on 30th August (here: https://github.com/SDKits/ExamineX/issues/87#issuecomment-1699103502). This then meant the code was working, no more exceptions and my search results we correct πŸ™Œ

However...that got me thinking, I wondered if the way that the AzureSearchIndexOptions was being configured was making the difference and wanted to test this before committing my work. This is where things get confusing.

In the process of trying to determine if rebuilding the index was the only thing I needed and running more tests I managed to revert the homeId value back to an array of string when trying to configure the index options in the way that the Umbraco docs state (https://docs.umbraco.com/umbraco-cms/reference/searching/examine/indexing) by using a composer, etc. I now cannot seem to get the value to be stored as an int again, even after reverting all code changes, deleting and rebuilding the index πŸ€”

The code I have now that I've run multiple times looks like this. I believe this follows the ExamineX documentation.

Startup.cs

public void ConfigureServices(IServiceCollection services)
{
    var config = services
        .AddUmbraco(_env, _config)
        .AddBackOffice()
        .AddWebsite()
        .AddComposers()
        .AddCustomAppsettingSections()
        .AddCustomServices()
        .AddCustomConfigureOptions()
        .AddCustomComponents() // <--- Here is where the value for 'homeId' is being set
        .AddCustomNotificationHandlers()
        .AddAzureBlobMediaFileSystem();

    services.AddCustomUmbracoClientSideRoutes();

    services.AddCacheBusting();

    config.Build();
}
public static IUmbracoBuilder AddCustomConfigureOptions(this IUmbracoBuilder builder)
{
    //builder.Services.ConfigureOptions<ConfigureIndexOptions>();
    builder.Services.Configure<AzureSearchIndexOptions>(UmbracoIndexes.ExternalIndexName, 
        options =>
        {
            options.FieldDefinitions.AddOrUpdate(new FieldDefinition(Constants.ExamineConstants.Fields.HomeId, AzureSearchFieldDefinitionTypes.Integer));
            options.FieldDefinitions.AddOrUpdate(new FieldDefinition(Constants.ExamineConstants.Fields.UpdateDate.UpdateDateTime, AzureSearchFieldDefinitionTypes.DateTime));
        }
    );

    return builder;
}

public static IUmbracoBuilder AddCustomComponents(this IUmbracoBuilder builder)
{
    builder.Components().Append<ExternalIndexComponent>();
    builder.Components().Append<InternalIndexComponent>();

    return builder;
}
public class ExternalIndexComponent : IComponent
{
    private readonly IExamineIndexingService _examineIndexingService;
    private readonly ILogger<ExternalIndexComponent> _logger;

    public ExternalIndexComponent(IExamineIndexingService examineIndexingService, ILogger<ExternalIndexComponent> logger)
    {
        _examineIndexingService = examineIndexingService;
        _logger = logger;
    }

    public void Initialize()
    {
        var index = _examineIndexingService.GetIndex(UmbracoIndexes.ExternalIndexName);

        if (index is not BaseIndexProvider indexProvider)
        {
            throw new InvalidOperationException("Could not cast index");
        }

        indexProvider.TransformingIndexValues += TransformingIndexValues;
    }

    private void TransformingIndexValues(object? sender, IndexingItemEventArgs e)
    {
        try
        {
            if (e.ValueSet.Category == "content")
            {
                SetHomeNodeId(e);
                SetResourceTypeSynonyms(e);
                SetUpdateDate(e);

                _examineIndexingService.SetGeolocationProperties(e);
                _examineIndexingService.SetSearchKeywords(e);
                _examineIndexingService.SetServiceFinderCategoryNames(e);
                _examineIndexingService.SetTagNames(e);
            }
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Error transforming values for the index");
        }
    }

    private void SetHomeNodeId(IndexingItemEventArgs args)
    {
        try
        {
            var updatedValues = args.ValueSet.Values.ToDictionary(x => x.Key, x => (IEnumerable<object>)x.Value);

            if (updatedValues.ContainsKey(Constants.ExamineConstants.Fields.HomeId))
            {
                updatedValues.Remove(Constants.ExamineConstants.Fields.HomeId);
            }

            if (!args.ValueSet.Values.ContainsKey("path"))
            {
                return;
            }

            var path = args.ValueSet.Values.First(x => x.Key == "path");
            var split = path.Value.First()?.ToString()?.Split(',');

            if (split?.Length <= 2)
            {
                return;
            }

            int.TryParse(split?.Skip(2).First(), out int homeId);

            if (homeId <= 0)
            {
                return;
            }

            updatedValues.Add(Constants.ExamineConstants.Fields.HomeId, new[] { (object)homeId });

            args.SetValues(updatedValues);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Error setting '{fieldName}' field for node ID: {id}", Constants.ExamineConstants.Fields.HomeId, args.ValueSet.Id);
        }
    }
        ...

With this code, I have deleted and rebuilt the index multiple times but the index looks like the following for the homeId field:

"homeId": [
    "1113"
]

Please can you see if there is anything I'm missing?

This is now driving me crazy...we were so close!

chrden commented 1 year ago

@Shazwazza Minor update. This appears to be somewhat intermittent.

Right after I posted that comment, I did another rebuild of the index and the value was correctly stored against the homeId field as an int. Another rebuild to be sure, and it went back to an array of string.

Very weird 🀷

Shazwazza commented 1 year ago

Thanks for the update @chrden. This must be due to a race condition of Umbraco's ordering of composers https://docs.umbraco.com/umbraco-cms/implementation/composing

Since you are configuring the lucene settings with a composer, you might want to try ensuring that your composer is run before ExamineX's using [ComposeBefore(typeof(ExamineXComposer))]

ExamineX's composer calls into an extension method that then configures the AzureSearchIndexOptions based on any configured LuceneDirectoryIndexOptions. Potentially, in some cases your composer is loaded before or after the ExamineX one which could mean that your configured LuceneDirectoryIndexOptions are being executed after ExamineX configures its AzureSearchIndexOptions which means that your configured changes aren't executed yet and are lost during ExamineX's wrapping of the LuceneDirectoryIndexOptions.

Let me know if that works for you :)

Alternatively, you can configure ExamineX fields by following the docs here https://examinex.online/v3/examples#change-default-analyzers and using services.Configure<AzureSearchIndexOptions> directly instead of configuring the LuceneDirectoryIndexOptions.

sandervandepas-iodigital commented 1 year ago

Hi @Shazwazza @AndrewBarta-SC

We ran into the same problem but then with a default Umbraco field called parentId. It is a StringCollection instead of an integer.

We fixed it by using the code below and set the fieldDefinitionType by ourselve:

public class ConfigureDefaultIndexOptions : IConfigureNamedOptions<LuceneDirectoryIndexOptions>
{
    public void Configure(string name, LuceneDirectoryIndexOptions options)
    {
        if (name.Equals(Cms.Core.Constants.UmbracoIndexes.ExternalIndexName) || name.Equals(Cms.Core.Constants.UmbracoIndexes.InternalIndexName))
        {
            options.FieldDefinitions.AddOrUpdate(new FieldDefinition("parentID", FieldDefinitionTypes.Integer));
        }
    }

    public void Configure(LuceneDirectoryIndexOptions options) => {};
}
[ComposeBefore(typeof(ExamineXComposer))]
public class CultureSearchComposer : IComposer
{
    public void Compose(IUmbracoBuilder builder)
    {
        builder.Services.ConfigureOptions<ConfigureDefaultIndexOptions>();
        builder.Components().Append<CultureSearchComponent>();
    }
}

We only have a simple search query like below:

var criteria = searcher.CreateQuery(IndexTypes.Content);
 var examineQuery = criteria
                     .NodeTypeAlias(VacancyPage.ModelTypeAlias)
                     .And().ParentId(overviewPage.Id)
                     .And().Field($"__Published_{culture}", "y");

 searchResults = examineQuery.Execute(new QueryOptions(0, 2000));

It feels a bit weird that we have to add this custom code for a standard Umbraco field that needs to be an integer. How can this parentId field be a StringCollection and not an integer?

Isn't this something that needs to be fixed in ExamineX (by adding a ComposeAfter attribute so that it always runs after the Umbraco composers) instead of our custom code?

Grtz Sander

Shazwazza commented 12 months ago

The order that composers are loaded by Umbraco can be random due to the nature of its type scanning. I understand Umbraco docs say to use composers to adjust configuration for Examine indexes but since you are not shipping a package, my personal advice would be to compose your own website using standard .NET composition within the Startup.ConfigureServices method - but this is my own preference. The reason I prefer this is because then you know exactly when your code is going to execute. i.e. you can configure your index options before you configure Umbraco in the services - meaning any composers will run after your configured options.

If you are configuring Examine fields based on the LuceneDirectoryIndexOptions and you are using a composer, then you are correct in that your composer must run before the ExamineX one does.

The best that can be done is to update our documentation regarding this. There's no way that I know of that we can force the ExamineX composer to load last since ExamineX cannot know about your own composers or their types.

jbreuer commented 12 months ago

Hi @Shazwazza,

I'm working together with @sandervandepas-iodigital on the same project. Somehow the fix of Sander with ConfigureDefaultIndexOptions didn't work. We're not sure why, but we now fixed it like this in Startup.cs:

services.AddUmbraco(_env, _config)
        .AddBackOffice()
        .AddWebsite()
        .AddComposers()
        .Build();

var examineIndexes = new List<string> { "ExternalIndex", "InternalIndex" };

foreach (var examineIndex in examineIndexes)
{
    services.Configure<AzureSearchIndexOptions>(examineIndex, options =>
    {
        foreach (var fieldDefinition in UmbracoFieldDefinitionCollection.UmbracoIndexFieldDefinitions)
        {
            options.FieldDefinitions.AddOrUpdate(fieldDefinition);
        }
    });
}

This made sure that the FieldDefinitions for standard Umbraco fields are always correct.

I still think this is something that should be fixed in ExamineX. Our situation is different from @chrden because it's not a custom field like homeId, but parentID is a default Umbraco field. This means that if we setup ExamineX following the docs and we don't do any custom configuration or custom composers the default parentID field is still sometimes a StringCollection instead of an interger which will break the searching. This gave us some headache because on test and acc the field type was an interger and on prod it was a StringCollection which broke the search. Even though we configured everything the same. I'm not sure if it's possible to fix this, but it's probably a situation which more people will run into.

Shazwazza commented 12 months ago

@jbreuer ExamineX does ensure that all Umbraco based fields are set correctly. It maps all definitions from the LuceneDirectoryIndexOptions over to the AzureSearchIndexOptions. This has been the case since the very beginning of ExamineX and hasn't been an issue with anyone using ExamineX to-date. What version of Umbraco are you using?

Shazwazza commented 12 months ago

@jbreuer I've validated this today with the latest version of ExamineX and Umbraco 12.2. The sequence hasn't changed since Umbraco 9:

chrden commented 11 months ago

@Shazwazza Just a quick note. I think this is now working! πŸ™Œ

I didn't make any changes from my last post so it needs a bit more testing but after a few attempts at deleting and rebuilding the index, the homeId value is consistently being set as an int rather than an array of string.

RE: your previous comment, I wasn't actually using a Composer to configure the LuceneDirectoryIndexOptions any more by that point and had instead migrated that code to an extension method which was being called in ConfigureServices to configure the 'AzureSearchIndexOptions' as below:

public static IUmbracoBuilder AddCustomConfigureOptions(this IUmbracoBuilder builder)
{
    builder.Services.Configure<AzureSearchIndexOptions>(UmbracoIndexes.ExternalIndexName, 
        options =>
        {
            options.FieldDefinitions.AddOrUpdate(new FieldDefinition(Constants.ExamineConstants.Fields.HomeId, AzureSearchFieldDefinitionTypes.Integer));
            options.FieldDefinitions.AddOrUpdate(new FieldDefinition(Constants.ExamineConstants.Fields.UpdateDate.UpdateDateTime, AzureSearchFieldDefinitionTypes.DateTime));
        }
    );

    return builder;
}

I have tested this method as well as following the ExamineX documentation more verbosely by configuring the AzureSearchIndexOptions in the ConfigureServices method directly as below, and both appear to be working correctly consistently:

public void ConfigureServices(IServiceCollection services)
{
    var config = services
        .AddUmbraco(_env, _config)
        .AddBackOffice()
        .AddWebsite()
        .AddComposers()
        .AddCustomAppsettingSections()
        .AddCustomServices()
        //.AddCustomConfigureOptions() <-- commented out to prevent duplication
        .AddCustomComponents()
        .AddCustomNotificationHandlers()
        .AddAzureBlobMediaFileSystem();

    services.Configure<AzureSearchIndexOptions>(UmbracoIndexes.ExternalIndexName, options =>
    {
        options.FieldDefinitions.AddOrUpdate(new FieldDefinition(ExamineConstants.Fields.HomeId, AzureSearchFieldDefinitionTypes.Integer));
        options.FieldDefinitions.AddOrUpdate(new FieldDefinition(ExamineConstants.Fields.UpdateDate.UpdateDateTime, AzureSearchFieldDefinitionTypes.DateTime));
    });

    services.AddCustomUmbracoClientSideRoutes();

    services.AddCacheBusting();

    config.Build();
}

As I said, no changes made and this started working so I'm still a little concerned about the order at which this is firing and that it might eventually revert back to being stored as an array of string so I will keep my eye on it and let you know if anything changes.

Thanks a lot for your persistence with me on this, much appreciated!