Closed cxzl25 closed 7 months ago
BTW, there is no official Apache Hive release with this patch
release-4.0.0-alpha-1 on Mar 30, 2022
No, that's 'alpha`. Nobody uses it seriously, @cxzl25 .
No, that's 'alpha`. Nobody uses it seriously.
Thanks for your comment, let me close this PR first.
I'm wondering how you did verify the claim's PR? Did you see the actual perf benefits in a way
I wrote a simple test that generated a random string and tested insert and lookup.
main insert:00:00:00.689
main lookup:00:00:01.124
PR insert:00:00:00.628
PR lookup:00:00:01.055
@Test
public void testLen() {
int maxSize = 200000;
Random gen = new Random();
String[] strings = new String[maxSize];
for (int i = 0; i < maxSize; i++) {
strings[i] = RandomStringUtils.random(Math.abs(gen.nextInt(1000)));
}
byte[][] values = getByteArrays(strings);
StopWatch mainSW = new StopWatch();
// load set
mainSW.start();
CuckooSetBytes main = new CuckooSetBytes(strings.length);
main.fastLookup = false;
for (byte[] v : values) {
main.insert(v);
}
mainSW.split();
System.out.println("main insert:" + mainSW);
// test that the values we added are there
for (byte[] v : values) {
assertTrue(main.lookup(v, 0, v.length));
}
mainSW.stop();
System.out.println("main lookup:" + mainSW);
StopWatch prSW = new StopWatch();
prSW.start();
CuckooSetBytes pr = new CuckooSetBytes(strings.length);
pr.fastLookup = true;
for (byte[] v : values) {
pr.insert(v);
}
prSW.split();
System.out.println("PR insert:" + prSW);
for (byte[] v : values) {
assertTrue(pr.lookup(v, 0, v.length));
}
prSW.stop();
System.out.println("PR lookup:" + prSW);
}
Thank you, @cxzl25 . Please add the above code and result into the PR description.
Merged to main/2.0 for Apache ORC 2.0.0. Thank you again!
What changes were proposed in this pull request?
Add boundary conditions on "length" with the min/max length stored in the hashes.
Why are the changes needed?
https://issues.apache.org/jira/browse/HIVE-24205
How was this patch tested?
GA
Was this patch authored or co-authored using generative AI tooling?
No