aws / event-ruler

Event Ruler is a Java library that allows matching many thousands of Events per second to any number of expressive and sophisticated rules.
Apache License 2.0

Ensure numeric matching respects precision as described in our documentation #166

Closed baldawar closed 1 week ago

baldawar commented 1 week ago

Issue #, if available: https://github.com/aws/event-ruler/issues/163

Description of changes:

As reported in issue 163, Ruler today ignores precision, which causes false matches when events or rules contain high-precision numbers. This change moves away from using double for the arithmetic adjustments within ComparableNumber.

Along the way, the API for generating comparable numbers changes from taking a Double to taking a String. This allows more accurate rule matching for numbers with 6+ digits without compromising performance.
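
To illustrate the general failure mode (not the exact case from issue 163), here is a minimal, self-contained sketch. The class and method names are hypothetical and are not part of Event Ruler; the sketch only shows why routing a numeric string through `double` can make two distinct values look equal, while an exact decimal comparison of the original strings keeps them apart.

```java
import java.math.BigDecimal;

// Hypothetical demo class; not part of the Event Ruler API.
public class PrecisionDemo {

    // Compare two decimal strings after converting each one to a double.
    // Values that differ beyond double precision collapse to the same double.
    static boolean equalAsDouble(String a, String b) {
        return Double.parseDouble(a) == Double.parseDouble(b);
    }

    // Compare two decimal strings exactly. compareTo ignores trailing zeros,
    // so "1.50" and "1.5" are treated as the same number.
    static boolean equalAsDecimal(String a, String b) {
        return new BigDecimal(a).compareTo(new BigDecimal(b)) == 0;
    }

    public static void main(String[] args) {
        String ruleValue = "9007199254740992";  // 2^53
        String eventValue = "9007199254740993"; // 2^53 + 1, not representable as a double

        System.out.println(equalAsDouble(ruleValue, eventValue));  // prints true (a false match)
        System.out.println(equalAsDecimal(ruleValue, eventValue)); // prints false
    }
}
```

Keeping the value as a string until it is encoded avoids the rounding step entirely, which is the motivation given above for the API change.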

A number of our tests needed to be changed or fixed as a result of this change; that work is done. We've added additional test cases to help catch precision issues in the future.

Benchmark / Performance (for source code changes):

/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/bin/java -Dvisualvm.id=30211868140673 -ea -Didea.test.cyclic.buffer.size=1048576 -javaagent:/Applications/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=50554:/Applications/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8 -classpath /Applications/IntelliJ IDEA.app/Contents/lib/idea_rt.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/junit/lib/junit5-rt.jar:/Applications/IntelliJ IDEA.app/Contents/plugins/junit/lib/junit-rt.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/jre/lib/ext/cldrdata.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/jre/lib/ext/dnsns.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/jre/lib/ext/jaccess.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/jre/lib/ext/jfxrt.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/jre/lib/ext/localedata.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/jre/lib/ext/nashorn.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/jre/lib/ext/sunec.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/jre/lib/ext/sunjce_provider.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/jre/lib/ext/sunpkcs11.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/jre/lib/ext/zipfs.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/jre/lib/jfxswt.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/jre/lib/management-agent.jar:/Library/Ja
va/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/lib/ant-javafx.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/lib/dt.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/lib/javafx-mx.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/lib/jconsole.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/lib/packager.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/lib/sa-jdi.jar:/Library/Java/JavaVirtualMachines/amazon-corretto-8.jdk/Contents/Home/lib/tools.jar:/Volumes/Unix/workspaces/event-ruler/target/test-classes:/Volumes/Unix/workspaces/event-ruler/target/classes:/Users/baldawar/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.17.1/jackson-databind-2.17.1.jar:/Users/baldawar/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.17.1/jackson-annotations-2.17.1.jar:/Users/baldawar/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.17.1/jackson-core-2.17.1.jar:/Users/baldawar/.m2/repository/com/google/code/findbugs/jsr305/3.0.2/jsr305-3.0.2.jar:/Users/baldawar/.m2/repository/junit/junit/4.13.2/junit-4.13.2.jar:/Users/baldawar/.m2/repository/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar com.intellij.rt.junit.JUnitStarter -ideVersion5 -junit4 software.amazon.event.ruler.Benchmarks
High NameState Reuse Memory Benchmark
Before: 254.8 (1)
After: 361.5 (223380)
Per rule: 139043 (290)
Reading citylots2
Read 213068 events
EXACT events/sec: 204676.3
WILDCARD events/sec: 138897.0
PREFIX events/sec: 226909.5
PREFIX_EQUALS_IGNORE_CASE_RULES events/sec: 216973.5
SUFFIX events/sec: 227880.2
SUFFIX_EQUALS_IGNORE_CASE_RULES events/sec: 227151.4
EQUALS_IGNORE_CASE events/sec: 189057.7
NUMERIC events/sec: 111905.5
ANYTHING-BUT events/sec: 127585.6
ANYTHING-BUT-IGNORE-CASE events/sec: 112913.6
ANYTHING-BUT-PREFIX events/sec: 128122.7
ANYTHING-BUT-SUFFIX events/sec: 111379.0
ANYTHING-BUT-WILDCARD events/sec: 137997.4
COMPLEX_ARRAYS events/sec: 35546.9
PARTIAL_COMBO events/sec: 51132.2
COMBO events/sec: 20205.6
Reading citylots2
Read 213068 events
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 13579
Events/sec: 15691.0
 Rules/sec: 109837.0
Low NameState Reuse Memory Benchmark
Before: 1779.7 (1)
After: 1239.9 (2625460)
Per rule: -702800 (3418)
Before: 1861.7 (1)
After: 985.6 (3254415)
Per rule: -2190 (8)
Turning JSON into field-lists...
Finding Rules...
Lines: 213068, Msec: 4100
Events/sec: 51967.8
Reading lines...
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 1605
Events/sec: 132752.6
 Rules/sec: 483485143.9
Before: 2045.9 (1)
After: 662.6 (4469583)
Per rule: -3458 (11)
Reading citylots2
Read 213068 events
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Matched: 52527
Lines: 213068, Msec: 20022
Events/sec: 10641.7
Reading lines...
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 12431
Events/sec: 17140.1
 Rules/sec: 119980.4
DEEP EXACT events/sec: 9090.9

Process finished with exit code 0

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

timbray commented 1 week ago

Rishi, should we have a close look now or do you want to polish some more first?

baldawar commented 1 week ago

Hey @timbray, this isn't ready for review yet. Still polishing. I didn't realize GitHub sends out notifications even for draft PRs.

timbray commented 1 week ago

No prob. One request: when you think it's stable, it would be useful to include any new language in README.md or wherever that states the constraints on numeric values. Formerly: +/-5B, 6 fractional digits. Or maybe it doesn't change?
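
For reference, the formerly documented constraints mentioned here (+/-5B, 6 fractional digits) could be checked with something like the sketch below. This is only an illustration: the class and method names are hypothetical, treating the 5 billion bound as inclusive is an assumption, and the limits that apply after this PR may differ.

```java
import java.math.BigDecimal;

// Hypothetical helper; not part of the Event Ruler API.
public class NumericLimitsCheck {

    // Assumption: the +/-5B bound is inclusive.
    private static final BigDecimal MAX_ABS = new BigDecimal("5000000000");
    private static final int MAX_FRACTION_DIGITS = 6;

    // Returns true if the text parses as a decimal number within +/-5 billion
    // and has at most 6 significant fractional digits.
    static boolean withinFormerLimits(String text) {
        BigDecimal value;
        try {
            value = new BigDecimal(text);
        } catch (NumberFormatException e) {
            return false; // not a decimal number at all
        }
        if (value.abs().compareTo(MAX_ABS) > 0) {
            return false; // magnitude out of range
        }
        // Trailing zeros do not count as fractional digits.
        return value.stripTrailingZeros().scale() <= MAX_FRACTION_DIGITS;
    }

    public static void main(String[] args) {
        System.out.println(withinFormerLimits("1.234567"));   // prints true
        System.out.println(withinFormerLimits("1.2345678"));  // prints false
        System.out.println(withinFormerLimits("5000000001")); // prints false
    }
}
```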

baldawar commented 1 week ago

Alright, this one is ready for some scrutiny, @timbray.

timbray commented 1 week ago

BTW, Quamina probably won't follow this path, because unfortunately Go doesn't have a built-in BigDecimal, and the benefit of having 6 rather than 5 decimal digits is smaller than the cost of accepting an uncontrolled external dependency. I would hope that some future version of Go gets good decimal support, because I like the approach in this PR.

baldawar commented 1 week ago

> In earlier versions of this PR, you had remarked that there was one controversial part, where you fell back to parsing hex versions of numbers in the data. I think that is now gone? I didn't see it.

Left a comment here: https://github.com/aws/event-ruler/pull/166/files#r1668969004.

> One optimization that Quamina does makes a big difference.

It's there, but implemented as a counter: https://github.com/aws/event-ruler/blob/ccafd48aee587ec45239ba630c0a679dd837278b/src/main/software/amazon/event/ruler/ByteMachine.java#L115

baldawar commented 1 week ago

I didn't realize hex numbers aren't legal JSON. I had only looked at the kinds of numbers Java supports and missed checking whether they are part of the JSON spec.

Let me remove this bit and associated tests for now.
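
For anyone following along, the mismatch can be seen in a small sketch. Java's own parsers accept hexadecimal forms, while the JSON number grammar (RFC 8259, section 6) allows only decimal digits with an optional fraction and exponent. The class and method names below are hypothetical and are not taken from this PR.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the Event Ruler API.
public class JsonNumberCheck {

    // JSON number grammar per RFC 8259: optional minus, an integer part with
    // no leading zeros, an optional fraction, and an optional exponent.
    private static final Pattern JSON_NUMBER =
            Pattern.compile("-?(0|[1-9][0-9]*)(\\.[0-9]+)?([eE][+-]?[0-9]+)?");

    static boolean isJsonNumber(String text) {
        return JSON_NUMBER.matcher(text).matches();
    }

    public static void main(String[] args) {
        // Java accepts hexadecimal integers and hexadecimal floating-point strings.
        System.out.println(Long.decode("0x1F"));             // prints 31
        System.out.println(Double.parseDouble("0x1.8p1"));   // prints 3.0

        // Neither form is a legal JSON number.
        System.out.println(isJsonNumber("0x1F"));            // prints false
        System.out.println(isJsonNumber("0x1.8p1"));         // prints false
        System.out.println(isJsonNumber("1.5e3"));           // prints true
    }
}
```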