apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.45k stars 973 forks source link

How to run tests with the Panama Vector implementation #13344

Open ChrisHegarty opened 1 month ago

ChrisHegarty commented 1 month ago

I previously used to do:

export CI=true; ./gradlew :lucene:core:test

but this is not enough, in the output we see:

WARNING: Vector bitsize and/or integer vectors enforcement; using default vectorization provider outside of testMode

because either or both of tests.vectorsize or tests.forceintegervectors is set.

ChrisHegarty commented 1 month ago

/cc @uschindler

uschindler commented 1 month ago

Could it be the ";" in your command line?

uschindler commented 1 month ago

wah you're right. It's not the export.

The reason for that is; During testing we set arbitrary bit sizes, but this will slow you down immense so it disables the panama provider explicit. This was added by Robert at some point.

The current code is fine, but theres no way to run all tests with the default provider.....

Let me think about this. We have to find some way to execute tests "under real conditions".

ChrisHegarty commented 1 month ago

Might be that we allow the default sizes/values to be set, so e.g. 128 bit on Mac, etc. without disabling the Panama provider.

uschindler commented 1 month ago

I think there should be the option to run this with defaults.

uschindler commented 1 month ago

I think we must do some magic to not set those: https://github.com/apache/lucene/blob/40cae087f71a875478309abfc4b1ab1e7b027cab/gradle/testing/randomization.gradle#L108-L113

This is all bad. The problem is that those settings are static and we can't have a per-instance seeting, so it affects both test mode and normal mode. As running all tests would be incredibly slow with wrong settings (non-default), we use default provider for all tests except those with use testMode.

uschindler commented 1 month ago

see https://github.com/apache/lucene/pull/12681

ChrisHegarty commented 1 month ago

Those settings are really only useful when running the test which compares that the Panama and default implementations emit the same results. If we could somehow just limit setting them to that test!!

uschindler commented 1 month ago

Those settings are really only useful when running the test which compares that the Panama and default implementations emit the same results. If we could somehow just limit setting them to that test!!

That's exactly the problem. According to Robert's testing, the static final constants were better for performance.

uschindler commented 1 month ago

No idea yet, still thinking about it!

ChrisHegarty commented 1 month ago

I really wanna spawn a new JVM with those options for a small set of tests, rather than have them infect all other running tests. For example, in the JDK test harness, jtreg, this is trivial - just add othervm to cause it to spawn a new VM for a specific test.

uschindler commented 1 month ago

There are two possibilities how to do this: Spawn the tests that use testMode in a child VM or we add a separate sourceset with tests that run with hardcoded bitsizes.

Maybe for the few tests we can use one of the "child spawner" test examples which do this already.

ChrisHegarty commented 1 month ago

I wonder if we can just avoid some of this complexity.

For example, locally I can run all the tests with the Panama Vector implementation, by doing this:

$ export CI=true; ./gradlew :lucene:core:test -Ptests.vectorsize="" -Ptests.forceintegervectors=false

( setting an empty string for vectorsize results in the default vector size for the platform )

What we want to support is to allow for testing Panama Vector with different sizes during development of the code, so basically this:

for bits in 128 256 512; do ./gradlew -p lucene/core test --tests TestVectorUtilSupport -Dtests.vectorsize=$bits; done

If we just flip the defaults of these properties, vectorsize to and forceintegervectors to false, then all tests will use Panama Vector as appropriate - same as the "real world", while not hurting the development time usage.

So the concrete proposal is to just set the default value of vectorsize to and forceintegervectors to false.

uschindler commented 1 month ago

We can change the defaults, sure. I am just afraid that Robert will be annoyed. He wants to randomly test all bit sizes.

By the way the empty string is allowed because of this: https://github.com/apache/lucene/blob/30da7dabe09f81802a4ae9161c2c24ab03ff8d0f/lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java#L50

My quickest idea would be to create dynamically 3 test Gradle tasks with correct config (similar like we do for beasting). I can try it out. Those tasks would just run the testMode=true tests in a separate task like beasting also does.

ChrisHegarty commented 1 month ago

By the way the empty string is allowed because of this:

Yes, exactly. I'm exploiting this, which is a fine way to say "default" :-)

My quickest idea would be to create dynamically 3 test Gradle tasks with correct config (similar like we fo for beating). I can try it out. Those tasks would just run the testMode=true tests in a separate task like beasting also does.

That would be awesome. So flip the properties back to defaults "real world" for the majority of tests, and add a few targeted gradle tasks that run TestVectorUtilSupport. Sounds good to me.

@uschindler I'll leave this to you since you seem to be more familiar with the infrastructure. But I'm happy to "have a go at it" if you want.

ChrisHegarty commented 1 month ago

@uschindler your idea above is clearly preferable, but another alternative is to just add to the set of values in vectorsize, that will result in enabling Panama Vector randomly. Not great, but better than the current state of affairs if your idea does not work out. ( And it maintains the different set of developer workflows )

uschindler commented 1 month ago

Hi, sorry for being late. I am busy at moment, but as quick fix I applied locally the following:

 gradle/testing/randomization.gradle | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/gradle/testing/randomization.gradle b/gradle/testing/randomization.gradle
index 44ae964cab0..f3d25d0c4e2 100644
--- a/gradle/testing/randomization.gradle
+++ b/gradle/testing/randomization.gradle
@@ -105,12 +105,8 @@ allprojects {
           [propName: 'tests.LUCENE_VERSION', value: baseVersion, description: "Base Lucene version."],
           [propName: 'tests.bwcdir', value: null, description: "Data for backward-compatibility indexes."],
           // vectorization related
-          [propName: 'tests.vectorsize',
-             value: { ->
-               RandomPicks.randomFrom(new Random(projectSeedLong), ["128", "256", "512"])
-             },
-             description: "Sets preferred vector size in bits."],
-          [propName: 'tests.forceintegervectors', value: "true", description: "Forces use of integer vectors even when slow."],
+          [propName: 'tests.vectorsize', value: null, description: "Sets preferred vector size in bits."],
+          [propName: 'tests.forceintegervectors', value: false, description: "Forces use of integer vectors even when slow."],
       ]
     }
   }

This restores the old defaults, but still hsows all parameters in documentation.

uschindler commented 1 month ago

P.S.: I had the same idea to enable it randomly by adding another item to the random list.

Next to that, maybe the best would be to have another option to do two things:

This would allow to run all tests under real conditions, e.g. like gradlew test -Ptests.useVectorizationDefaults=true or something like that.

ChrisHegarty commented 1 month ago

I opened the above PR #13351, which implements your suggestion @uschindler. We can consider that independently, of the possibility of adding specific test tasks for different bit sizes of TestVectorUtilSupport - which I would love to have eventually.

uschindler commented 1 month ago

Hi, thats a fine PR but it still has some problem, so I'd tend to move away from "devlopers" using CI=true and instead use another setting. Will explain in the PR.