AlreadyClosedException in search

benjaminc commented 3 years ago

We had an issue yesterday with search that caused two extended search outages for a client. Anytime a end-user of the site tried to use a particular search (it is an Umbraco site, but a custom data index using LuceneIndex with the default GetSearcher() call), they would get an error. Reviewing the logs we see that the error is always Lucene.Net.Store.AlreadyClosedException: this IndexReader is closed. It appears to have started around a time of high CPU activity, when a query came in at the same time an index rebuild had been triggered.

After that point, all searches failed with the above error message for roughly the next 6 hours. There was then a period of roughly 1 hour where all the searches succeeded. After that was another roughly 5 hours where all the searches failed again before they restarted the site. After restarting the site, the problem has not yet resurfaced. The site had been running fine for months prior without an issue, and the most recent code deploy was more than a week prior.

Hosting: Single node, Azure Web App Umbraco: v8.14.0 Examine: 1.2.1

Stack Trace:

Lucene.Net.Store.AlreadyClosedException: this IndexReader is closed
   at Lucene.Net.Index.IndexReader.EnsureOpen() in d:\Lucene.Net\FullRepo\trunk\src\core\Index\IndexReader.cs:line 204
   at Lucene.Net.Index.IndexReader.IncRef() in d:\Lucene.Net\FullRepo\trunk\src\core\Index\IndexReader.cs:line 163
   at Examine.LuceneEngine.LuceneSearchResults.SkipEnumerable.GetEnumerator() in C:\Users\Shannon\Documents\_Projects\Examine\Examine\src\Examine\LuceneEngine\LuceneSearchResults.cs:line 251
   at Examine.SearchResultsBase.GetEnumerator() in C:\Users\Shannon\Documents\_Projects\Examine\Examine\src\Examine\SearchResultsBase.cs:line 129
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)

Please let me know if there is any additional information I can provide to help resolve this. I have spent a few hours digging through the Examine code, and the Umbraco IndexRebuilder class. I've posted my observations below if they help any, but I imagine you've spent far more time in this code so feel free to ignore these.

I can see that the IndexReader used comes from the IndexSearcher instance on the SearchContext in the LuceneSearchResults, and that the SearchContext is constructed on request by the LuceneSearcher class by passing in the _searcher field value. I also saw that the LuceneSearcher class does try to ensure the validity of its _reader field by calling ReaderReopener.ScheduleReopen() method from the ValidateSearcher() method, and that the _searcher.IndexReader field can't be reset within the Lucene code, and appears to always be set to the _reader field value in the Examine code.

My guess is that there is a race condition going on somewhere that was triggered by the combination of high CPU at the same time as a query and an index rebuild. My best guesses for culprits are:

The LuceneSearcher._searcher.IndexReader field doesn't point to the LuceneSearcher._reader field. This would mean that the reader status check in the ReaderReopener.MaybeReopen() method could see the _reader status as valid, but the _searcher.IndexReader would actually be invalid. I can't see directly where it would happen, but there are two separate places where these fields are set, with different locks around them, so possibly the compiler re-ordered the assignments for optimization and one set _reader then _searcher while the other set _searcher then _reader and there was a context switch right between the setting of the two fields?
The ReaderReopener._timer field wasn't null, but wasn't firing for some reason. This would cause the ReaderReopener.ScheduleReopen() method to simply do nothing, or try to change the timer interval, rather than directly try to check and reopen the reader. Again, I don't have a really convincing way of getting into this state.
The double-checked locking in the ValidateSearcher() method may be causing the _reader field to return null before entering the lock but then return not-null after entering the lock, thus leaving it unvalidated as to whether or not it is closed. While this could explain the symptoms, it would require that one thread is setting the value right as another thread is trying to read it, consistently for 6 hours, then not doing it for an hour, then consistently again for another 6 hours. This seems highly unlikely, but is theoretically possible.

I don't have any other good options. I've thought of several ways to get into a scenario where the _searcher field points to a closed reader, but the calls to reopen the reader should catch and fix that. These are the only ways I can think of where the reopen logic wouldn't catch a problem, or where the reopen logic itself wouldn't be called to catch a problem.

I'm also available to chat in the Umbraco Collaborators Slack or in the Umbraco Community Slack as Benjamin in either place if you want to pick my brain, or look deeper at code that I can't paste here in a public forum.

Shazwazza commented 3 years ago

How are you setting up this custom index?

Are you resolving a searcher from the manager every time you use it or have you pinned a searcher by accident to the application root?

benjaminc commented 3 years ago

We are using a class extending Umbraco.Examine.LuceneIndexCreator to create the index as part of a component, registered with a composer. We have a populator class that extends Umbraco.Examine.IndexPopulator to populate the index. There is an event that triggers when new data is added to the data source, and we use that to rebuild the index when new data is available. We then have a singleton search service registered in DI that uses an injected IExamineManager instance to get the index, create the searcher, and run the query.

This is the code for my search component, which triggers the creator, as well as rebuilding.

    public class CustomSearchComponent : IComponent
    {
        private readonly IExamineManager _examineManager;
        private readonly IndexRebuilder _indexRebuilder;
        private readonly CustomIndexCreator _customIndexCreator;
        private readonly ILogger _logger;

        public CustomSearchComponent (IExamineManager examineManager, IndexRebuilder indexRebuilder, CustomIndexCreator customIndexCreator, ILogger logger)
        {
            _examineManager = examineManager;
            _indexRebuilder = indexRebuilder;
            _customIndexCreator= customIndexCreator;
            _logger = logger;
        }

        public void Initialize()
        {
            var indices = _customIndexCreator.Create().ToList();

            indices.ForEach(i => _examineManager.AddIndex(i));

            RepositoryEvents.Saved += RebuildIndex;
        }

        private void RebuildIndex(object sender, SaveEventArgs e)
        {
            try
            {
                _indexRebuilder.RebuildIndex(IndexName);
            }
            catch (Exception ex)
            {
                _logger.Error<CustomSearchComponent>(ex, "Could not rebuild the custom search index");
            }
        }
    }

This is the CustomIndexCreator class referenced above:

    public class CustomIndexCreator : LuceneIndexCreator
    {
        public override IEnumerable<IIndex> Create()
        {
            var index = new LuceneIndex("CustomSearchIndex",
                CreateFileSystemLuceneDirectory("CustomSearchIndex"),
                new FieldDefinitionCollection(
                    new FieldDefinition("Public", FieldDefinitionTypes.Raw),
// Additional fields
                    new FieldDefinition("CompleteSearchField", FieldDefinitionTypes.FullText)
                ),
                new StandardAnalyzer(Version.LUCENE_30)
            );

            return new[] { index };
        }
    }

Finally, this is the relevant portion of the search service, which actually gets the index and searcher, and then creates and executes the query:

                var searcher = _examineManager.TryGetIndex("CustomSearchIndex", out var s) ? s.GetSearcher() : null;
                if (searcher == null) return Enumerable.Empty<ISearchResult>();

                var qry = searcher.CreateQuery().Field("Public", "1");
// Code to add search terms and boosting to the qry variable

                var results = qry.Execute();
                return results.ToList();

Shazwazza commented 3 years ago

Hi,

The thing that is probably causing issues is that you are rebuilding your index all the time. Index rebuilding should be avoided at all costs and only done when necessary. Indexes should just stay in sync with your data source. You can just use the IndexItems method to add/update index items with an ID.

Trying to do hot rebuilds with active searches may be problematic if your site is under load. Examine isn't designed for that type of operation to be done unless absolutely required.

benjaminc commented 3 years ago

K, we can definitely switch to using that for add/update. Is there a method we can use for removing items from the index as something is deleted from the source data?

Shazwazza commented 3 years ago

Yes DeleteFromIndex, see https://github.com/Shazwazza/Examine/blob/dev/src/Examine.Test/Index/LuceneIndexTests.cs#L142 and https://github.com/Shazwazza/Examine/blob/dev/src/Examine/IIndex.cs. I'll add to docs https://shazwazza.github.io/Examine/indexing.html

Shazwazza / Examine

AlreadyClosedException in search #240