SDKits / ExamineX

Issue tracker for ExamineX
https://examinex.online
5 stars 0 forks source link

Unexpected result on filtering index #92

Closed pariashiri closed 9 months ago

pariashiri commented 9 months ago

Hello, I have been following this [article] (https://shazwazza.com/post/filtering-fields-dynamically-with-examine/) to filter indexes, I made some changes to make it work in Umbraco 10.8.

private void TransformingIndexValues(object sender, IndexingItemEventArgs e)
{
    if (_fieldNames.TryGetValue(e.Index.Name, out var fields))
    {
        //Check if we should ignore this doc by content type
        if (fields.contentTypes.Length > 0 && !fields.contentTypes.Contains(e.ValueSet.ItemType))
        {
            e.Cancel = true;
        }
        else
        {
            var values = e.ValueSet.Values.ToDictionary(x => x.Key, x => x.Value.ToList());

            values.RemoveAll(x =>
            {
                if (fields.internalFields.Contains(x.Key)) return false;
                if (fields.fieldPrefixes.Any(f => x.Key.StartsWith(f))) return false;
                return true;
            });

            e.SetValues(values.ToDictionary(x => x.Key, x => (IEnumerable<object>)x.Value));
        }
    }
}

As you can see, we set e.cancel to true for the published items which we want to skip being added to the index. This partially works, but it is only excluding 50% of the items that we expect to be excluded, the other 50% are still managing to get through to the index. Could you please investigate the logic which uses this Cancel flag, as it seems to be skipping over half of the ones which should be cancelled, so we believe there is a bug in your logic.

my versions: ExamineX.AzureSearch.Umbraco 4.1.7 Umbraco.Cms 10.8.0-rc

Regards

Shazwazza commented 9 months ago

Hi, there are tests for this in our codebase which all pass. These tests test both cancellation and field filters. The field filters in these tests are opposite of that in the blog article but perhaps might help you in debugging?

[Test]
public void Can_Cancel_IndexingItemEventArgs()
{
    var waitHandle = new ManualResetEventSlim();

    void Handler(object sender, IndexingItemEventArgs args)
    {
        args.Cancel = true;
        waitHandle.Set();
    }

    AzureIndex.TransformingIndexValues += Handler;

    try
    {
        AzureIndex.IndexItem(new ValueSet(NewId(), "content",
            new Dictionary<string, IEnumerable<object>>
            {
                {"blah1234", new List<object>(new object[] { "Hello" })}
            }));

        waitHandle.Wait();

        var s = GetDelayedSearcher();
        var result = s.CreateQuery().Field("blah1234", "Hello").Execute();

        Assert.AreEqual(0, result.TotalItemCount);
    }
    finally
    {
        AzureIndex.TransformingIndexValues -= Handler;
    }
}

[Test]
public void Can_Filter_Fields_IndexingItemEventArgs()
{
    var waitHandle = new ManualResetEventSlim();

    var filteredFields = new[] { "toBeRemoved" };
    var filteredPrefixes = new[] { "remove" };

    void Handler(object sender, IndexingItemEventArgs e)
    {
        var values = e.ValueSet.Values.ToDictionary(x => x.Key, x => x.Value.ToList());

        values.RemoveAll(x =>
        {
            if (filteredFields.Contains(x.Key))
            {
                return true;
            }

            if (filteredPrefixes.Any(f => x.Key.StartsWith(f)))
            {
                return true;
            }

            return false;
        });

        e.SetValues(values.ToDictionary(x => x.Key, x => (IEnumerable<object>)x.Value));
    }

    AzureIndex.TransformingIndexValues += Handler;
    AzureIndex.CreatingOrUpdatingIndex += (sender, args) => waitHandle.Set();

    var origCount = AzureIndex.FieldReferences.Fields.Count;

    try
    {
        AzureIndex.IndexItem(new ValueSet(NewId(), "content",
            new Dictionary<string, IEnumerable<object>>
            {
                {"toBeRemoved", new List<object>(new object[] { "Hello" })},
                {"notRemoved", new List<object>(new object[] { "World" })},
                {"removedByPrefix", new List<object>(new object[] { "Yay" })}
            }));

        waitHandle.Wait();

        Thread.Sleep(10000);

        Assert.AreEqual(origCount + 1, AzureIndex.FieldReferences.Fields.Count);
    }
    finally
    {
        AzureIndex.TransformingIndexValues -= Handler;
    }
}
Harigon commented 9 months ago

Hi @Shazwazza, your test only covers cancelling one of them. Would you mind creating a test which cancels 10 items from a total list of 10, and I believe you will recreate our bug of only 5 items (50%) being cancelled.

Shazwazza commented 9 months ago

@Harigon thanks! 🙏 you are absolutely correct and have found a bug. I'll get this patched asap and get a release out.

pariashiri commented 9 months ago

Great! we are waiting for the fix.

Shazwazza commented 9 months ago

This is now resolved in 4.1.8 https://github.com/SDKits/ExamineX/releases/tag/v4.1.8 please let us know if this resolves your issue. The updated tests shows this is working correctly now. 5.0.2 will be released alongside this with this fix applied too.