apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide [LUCENE-10482] #11518

Open asfimport opened 2 years ago

asfimport commented 2 years ago

I was experimenting with the taxonomy index and DirectoryTaxonomyReaders in my day job, where we were trying to replace the index underneath a reader asynchronously and then call doOpenIfChanged on it.

It turns out that the taxonomy index uses its own index-based counter (the taxonomyIndexEpoch) to determine whether the index was opened in write mode after the last time it was written; if not, it directly tries to reuse the previous taxoArrays it had created. This logic fails in a scenario where both the old and the new index were opened just once but the index itself is completely different in the two cases.

In such a case, it would be good to give the user the flexibility to tell the DTR to recreate its taxoArrays, ordinalCache and categoryCache (not refreshing these arrays causes it to fail in various ways). Luckily, such a constructor already exists! But it is private today! The idea here is to allow subclasses of DTR to use this constructor.
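To make the failure mode concrete, here is a minimal toy model in plain Java. It is not Lucene code: `ToyIndex`, `ToyReader`, `reopen` and `reopenWithFreshArrays` are hypothetical names invented for this sketch, and the real reader does more (for example, it extends reused arrays as new ordinals appear). It only illustrates why comparing epochs is not enough when the index underneath the reader is swapped for a different one, and how a constructor that accepts "no cached arrays" sidesteps the problem.

```java
import java.util.ArrayList;
import java.util.List;

class EpochReuseSketch {
  /** Hypothetical stand-in for an on-disk taxonomy index: an epoch plus its categories. */
  static final class ToyIndex {
    final long epoch;
    final List<String> categories;

    ToyIndex(long epoch, List<String> categories) {
      this.epoch = epoch;
      this.categories = categories;
    }
  }

  /** Hypothetical stand-in for a reader that caches arrays derived from the index. */
  static final class ToyReader {
    final ToyIndex index;
    final List<String> cachedArrays;

    ToyReader(ToyIndex index) {
      this(index, null);
    }

    /** Plays the role of the constructor discussed above: null means "rebuild from the index". */
    ToyReader(ToyIndex index, List<String> arrays) {
      this.index = index;
      this.cachedArrays = arrays != null ? arrays : new ArrayList<>(index.categories);
    }

    /** Epoch-driven reopen: reuses the cached arrays whenever the epochs match. */
    ToyReader reopen(ToyIndex current) {
      if (current.epoch == index.epoch) {
        return new ToyReader(current, cachedArrays); // assumes it is still the same index
      }
      return new ToyReader(current);
    }

    /** What a subclass could do once the constructor is accessible: always start fresh. */
    ToyReader reopenWithFreshArrays(ToyIndex current) {
      return new ToyReader(current, null);
    }
  }

  public static void main(String[] args) {
    // Two unrelated indexes that were each opened for writing exactly once.
    ToyIndex oldIndex = new ToyIndex(1, List.of("a", "b"));
    ToyIndex newIndex = new ToyIndex(1, List.of("x", "y", "z"));

    ToyReader reader = new ToyReader(oldIndex);
    System.out.println("epoch-driven: " + reader.reopen(newIndex).cachedArrays);
    System.out.println("fresh arrays: " + reader.reopenWithFreshArrays(newIndex).cachedArrays);
  }
}
```

In this model the epoch-driven reopen keeps serving the old index's arrays, while the explicit variant rebuilds them from the new index.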

Curious to see what other folks think about this idea.


Migrated from LUCENE-10482 by Gautam Worah (@gautamworah96), updated Apr 19 2022
Pull requests: https://github.com/apache/lucene/pull/762

asfimport commented 2 years ago

Gautam Worah (@gautamworah96) (migrated from JIRA)

Submitting a patch soon

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 10ebc099c846c7d96f4ff5f9b7853df850fa8442 in lucene's branch refs/heads/main from Gautam Worah https://gitbox.apache.org/repos/asf?p=lucene.git;h=10ebc099c84

LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide (#762)

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 5c2cbd712590fe98d101f30b528bee976af173ee in lucene's branch refs/heads/branch_9x from Gautam Worah https://gitbox.apache.org/repos/asf?p=lucene.git;h=5c2cbd71259

LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide (#762) (#813)

asfimport commented 2 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks @gautamworah96 – can we resolve this now?

asfimport commented 2 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hey, the commit breaks all builds on Windows because the test file name contains a ":" (it looks like it comes from the timestamp in the filename). Please fix or revert!

Why do we need Instant.now() in the filename at all? createTempDir creates a unique, test-dependent directory anyway. You just need to pass the test's name and you're done. Uniqueness is ensured by the test framework. The other problem with Instant.now() is non-reproducibility!
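As a rough illustration of the point above, the following standalone sketch uses only the JDK. `Files.createTempDirectory` stands in for the test framework's createTempDir here, and `TempDirNaming`, `hasWindowsReservedChar` and `uniqueDirFor` are hypothetical names made up for this example. It shows that the string form of `Instant.now()` always carries a ":" (which Windows rejects in file names), and that a plain test-name prefix already yields distinct directories.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;

class TempDirNaming {
  /** True if the name contains a character that Windows does not allow in file names. */
  static boolean hasWindowsReservedChar(String name) {
    for (char c : name.toCharArray()) {
      if ("<>:\"/\\|?*".indexOf(c) >= 0) {
        return true;
      }
    }
    return false;
  }

  /** Creates a fresh directory; the JDK appends a random suffix to the given prefix. */
  static Path uniqueDirFor(String testName) {
    try {
      return Files.createTempDirectory(testName);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    // ISO-8601 instants look like 2022-04-19T12:34:56.789Z, so ':' is always present.
    String stamped = "taxo-" + Instant.now();
    System.out.println(stamped + " has reserved char: " + hasWindowsReservedChar(stamped));

    // Same prefix twice, two different directories, and no timestamp needed.
    Path first = uniqueDirFor("testAlternateDirectory");
    Path second = uniqueDirFor("testAlternateDirectory");
    System.out.println(first.getFileName() + " vs " + second.getFileName());

    Files.delete(first);
    Files.delete(second);
  }
}
```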

asfimport commented 2 years ago

Gautam Worah (@gautamworah96) (migrated from JIRA)

Got it. My apologies for introducing this error. I had switched to using Instant.now() during testing because, when the single test case failed midway, it was not able to delete the temporary folder. Subsequent runs of the test would then fail because a folder with that name already existed. Submitting a fix right now.

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit d322be52f2407419c9fff4afee84ce4c87fe018d in lucene's branch refs/heads/main from Gautam Worah https://gitbox.apache.org/repos/asf?p=lucene.git;h=d322be52f24

LUCENE-10482 Bug Fix: Don't use Instant.now() as prefix for the temp dir name (#814)

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 766c08e475ba31e2f5b7e1cf491cdacbe276ab67 in lucene's branch refs/heads/branch_9x from Gautam Worah https://gitbox.apache.org/repos/asf?p=lucene.git;h=766c08e475b

LUCENE-10482 Bug Fix: Don't use Instant.now() as prefix for the temp dir name (#814)

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit c38870585542dd86c051c8978e944e39a386f8ec in lucene's branch refs/heads/main from Michael McCandless https://gitbox.apache.org/repos/asf?p=lucene.git;h=c3887058554

LUCENE-10482: Ignore this test for now

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 39a8c7d1369fc1d6f4232bc34cc87e8fe9cd925e in lucene's branch refs/heads/branch_9x from Michael McCandless https://gitbox.apache.org/repos/asf?p=lucene.git;h=39a8c7d1369

LUCENE-10482: Ignore this test for now

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit fb76d0b104ef843790848531cf14707e2059e079 in lucene's branch refs/heads/main from Michael McCandless https://gitbox.apache.org/repos/asf?p=lucene.git;h=fb76d0b104e

LUCENE-10482, #11557: hrmph, put the @Ignore in the right place

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 2fa3a36899f4560ffb593449d6778307aa232e35 in lucene's branch refs/heads/branch_9x from Michael McCandless https://gitbox.apache.org/repos/asf?p=lucene.git;h=2fa3a36899f

LUCENE-10482, #11557: hrmph, put the @Ignore in the right place