apache / jena

Apache Jena
https://jena.apache.org/
Apache License 2.0
1.1k stars 647 forks source link

Slight performance improvements to org.apache.jena.atlas.lib.cache.CacheSimple #2738

Closed arne-bdt closed 23 hours ago

arne-bdt commented 4 days ago

Change

Slight performance improvements to org.apache.jena.atlas.lib.cache.CacheSimple:

Downside: The fixed cache size must always be a power of 2. If the given size is already a power of two it will be used as fixed size for the cache, otherwise the next larger power of two will be used. (e. g. minimumSize = 10 results in 16 as fixed size for the cache)

Benchmark                                           (param0_GraphUri)  (param1_Cache)  Mode  Cnt  Score   Error  Units
TestCaches.createAndFillCacheByGet      ../testing/BSBM/bsbm-5m.nt.gz          Simple  avgt   15  0,206 ± 0,009   s/op
TestCaches.createAndFillCacheByGet      ../testing/BSBM/bsbm-5m.nt.gz  Jena510.Simple  avgt   15  0,228 ± 0,006   s/op
TestCaches.createAndFillCacheByPut      ../testing/BSBM/bsbm-5m.nt.gz          Simple  avgt   15  0,257 ± 0,009   s/op
TestCaches.createAndFillCacheByPut      ../testing/BSBM/bsbm-5m.nt.gz  Jena510.Simple  avgt   15  0,293 ± 0,013   s/op
TestCaches.getFromFilledCache           ../testing/BSBM/bsbm-5m.nt.gz          Simple  avgt   15  0,277 ± 0,008   s/op
TestCaches.getFromFilledCache           ../testing/BSBM/bsbm-5m.nt.gz  Jena510.Simple  avgt   15  0,290 ± 0,007   s/op
TestCaches.getIfPresentFromFilledCache  ../testing/BSBM/bsbm-5m.nt.gz          Simple  avgt   15  0,226 ± 0,006   s/op
TestCaches.getIfPresentFromFilledCache  ../testing/BSBM/bsbm-5m.nt.gz  Jena510.Simple  avgt   15  0,262 ± 0,009   s/op

Is the downside of the possibly larger cache size a problem or does it outweigh the benefits?

Are you interested in contributing a pull request for this task?

Yes

afs commented 3 days ago

On the smaller points:

  • removed Objects.requireNonNull() - checks --> because they throw a NullPointerException anyway

Yes - there will be an NPE anyway although it'll be from a different place, making it look like an implementation bug. Having Objects.requireNonNull() makes the contract a little clearer.

I thought Objects.requireNonNull() was very low cost - the argument is very likely in a register or the near CPU cache and a test for null is a machine code instruction that is executed early in the CPU pipeline.

removed already deprecated BiConsumer<K, V> dropHandler), which is used nowhere

I take you mean not exposed by CacheFactory? I Does this make a difference? It's one == null test.

That said, a slotted cache is probable not appropriate for caching items that need drop handling.

arne-bdt commented 2 days ago

Benchmarks showed no signifficant difference with Objects.requireNonNull() - checks. I add the Objects.requireNonNull code again (even one more) and javadoc comments to the whole Cache interface for all parameters and mentioned the not null constraints.

The drop handler was not part of the interface, not used anywhere and there were no tests for it, so I decided to remove it. (I did not measure the exact difference)

My motivation for this Issue/PR came from my work on GraphMem2, where I found, that the simpler bit operations on caches with a sizes which are a power of two can make a difference.

afs commented 2 days ago

Objects.requireNonNull has a JDK @ForceInline declaration.

arne-bdt commented 2 days ago

Objects.requireNonNull has a JDK @ForceInline declaration.

From now on, I will never remove them again. Instead, I will use them wherever appropriate. :-)

afs commented 2 days ago

I think we should - it is an opportunity to have a meaningful error message :-)

It does grate a little when the test is applied twice because an API contract operation calls another API contract operation.