Kentico / xperience-by-kentico-lucene

Xperience by Kentico search integration with the latest 4.8 beta version of Lucene.NET
MIT License

Reusable content entries are deleted from the index when it is rebuilt #62

Closed bluemodus-mwills closed 1 week ago

bluemodus-mwills commented 1 month ago

Summary

Rebuilding an index removes all reusable content items from an index. This means that even though it’s possible to create an indexing strategy that includes reusable content, it is not truly supportable. Reusable content will be indexed when it is created or updated, but if an admin clicks Rebuild in the Lucene index management UI, the indexed content will be removed.

To Reproduce

// A changed reusable item is queued for indexing as itself.
public override async Task<IEnumerable<IIndexEventItemModel>> FindItemsToReindex(IndexEventReusableItemModel changedItem)
{
    return await Task.FromResult(result: new List<IIndexEventItemModel>() { changedItem });
}

// Reusable items are mapped to their own Lucene document; all other items use the base mapping.
public override async Task<Document?> MapToLuceneDocumentOrNull(IIndexEventItemModel item)
{
    if (item is IndexEventReusableItemModel eventReusableItemModel)
    {
        var indexDocument = new Document()
        {
            // Content type name for facet
            new FacetField(SC.FIELD_CONTENT_TYPE_NAME, item.ContentTypeName)
        };
        // Placeholder title, stored for display and added as a sortable doc value
        var title = $"Title for {item.ContentTypeID}";
        indexDocument.AddTextField(SC.FIELD_TITLE, title, Field.Store.YES);
        indexDocument.AddSortedDocValuesField(SC.FIELD_TITLE, new BytesRef(title));
        return indexDocument;
    }
    return await base.MapToLuceneDocumentOrNull(item);
}

Expected behavior

When rebuilding the index, reusable content items would not be removed.

Library Version

Version 0.3.1 (but I'm looking at the latest code)

Code observations

It looks like DefaultLuceneClient is missing code in the RebuildInternal method to enumerate the reusable content items and queue them for indexing.
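
To illustrate the observation, here is a rough sketch of what a rebuild would need to collect. It is only a sketch under assumptions: the types `IIndexableItem`, `WebPageItem`, and `ReusableItem` and the method `CollectItemsForRebuild` are hypothetical stand-ins, not the library's real classes or the actual body of RebuildInternal.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in types for illustration only; NOT the library's real models.
public interface IIndexableItem { string Id { get; } }
public record WebPageItem(string Id) : IIndexableItem;
public record ReusableItem(string Id) : IIndexableItem;

public static class RebuildSketch
{
    // Builds the list of items a rebuild should queue for indexing.
    // The behavior reported in this issue corresponds to skipping the second AddRange.
    public static List<IIndexableItem> CollectItemsForRebuild(
        IEnumerable<WebPageItem> webPages,
        IEnumerable<ReusableItem> reusableItems)
    {
        var queue = new List<IIndexableItem>();
        queue.AddRange(webPages);       // web page items
        queue.AddRange(reusableItems);  // reusable content items
        return queue;
    }
}
```

With one web page and one reusable item as input, the returned list holds both, so a rebuilt index would still contain the reusable item.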

liparova commented 1 month ago

Thank you for notifying us of the issue. We will proceed to troubleshoot and keep you updated on our analysis.

bluemodus-mwills commented 1 month ago

Thank you, Eva

bkapustik commented 1 month ago

Hello, I do not think that it makes sense to index the reusable items. The whole point of indexed items is that you can use them for searching on a live site. Reusable items can only be searched and viewed if they are shown on a site, where they are visible at the URL of a WebPageItem. So indexing both reusable items and WebPageItems would result in the same page being indexed multiple times. It does make sense to trigger reindexing when a reusable item is edited, because the edit can change the pages that use it. Basically, when a reusable item changes, the FindItemsToReindex method should return all the WebPageItems that reference the changed reusable item.
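
A minimal sketch of that lookup, assuming the page-to-reusable-item references are already available as an in-memory map. The `ReindexSketch` class, the `FindPagesToReindex` method, and the map itself are hypothetical and only illustrate the idea; in the library the result would be returned from FindItemsToReindex as item models rather than IDs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReindexSketch
{
    // referencesByPage is a hypothetical map: web page ID -> IDs of the reusable items it references.
    // Returns the IDs of every page that references the changed reusable item, in ordinal order.
    public static List<string> FindPagesToReindex(
        IReadOnlyDictionary<string, IReadOnlyCollection<string>> referencesByPage,
        string changedReusableItemId)
    {
        return referencesByPage
            .Where(pair => pair.Value.Contains(changedReusableItemId))
            .Select(pair => pair.Key)
            .OrderBy(id => id, StringComparer.Ordinal)
            .ToList();
    }
}
```

A reusable item that no page references yields an empty list, so under this approach it would never reach the index on its own.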

bluemodus-mwills commented 1 month ago

@bkapustik, in general I agree with you. Most reusable content items should not be indexed as separate stand-alone items, but I gave you a scenario that makes sense and is common: a reusable item that is an asset, like a PDF file, that should appear as a unique item in search results regardless of whether or where it's linked on the site.

Imagine a reusable content item with fields like this:

Type: Resource
Fields:

Such an item would need to be indexed independently of which web page links to it, and it should stay in the index even if a page that linked to it was archived. Customers often want PDF files to appear directly in search results.

This would not lead to the same page being indexed multiple times, because the PDF and whatever page that links to it would be separate items in the index with separate URLs for viewing.

Additionally, the documentation states that reusable items are indexable, so this appears to be something that is supposed to work but currently only works halfway.

DavidSlavik commented 1 month ago

Hello,

The scenario @bluemodus-mwills describes here makes sense to me. At the same time, I would like to say that the index should include every type of content configured for indexing, which I believe a reusable content item is. After that, it is up to each project how the search results are designed, for example whether a result renders a link to a page or a link to download or display a PDF file.

It would make even more sense if we also process this one: #63

@bkapustik, please don't forget that we also have the Algolia and Azure search integrations, where we should offer consistent behavior if we implement this.

bluemodus-mwills commented 1 month ago

Thank you, @DavidSlavik

bkapustik commented 1 week ago

Hello, the functionality is implemented in PR #71.