Make CacheBuilder Cache stats collection optional

GoogleCodeExporter commented 9 years ago

It would be great to expose CacheBuilder#disableStats as a public method so we 
can disable stats completely when its not needed.

Original issue reported on code.google.com by kimchy on 11 Jan 2012 at 1:27

GoogleCodeExporter commented 9 years ago

Is this actually a performance problem for you?

Original comment by wasserman.louis on 11 Jan 2012 at 5:11

GoogleCodeExporter commented 9 years ago

Yes, it adds up. Also, all my stats are based on LongAdder from jsr166e, and I 
have a whole stats package built, so would love to be able to plug my own.

Original comment by kimchy on 11 Jan 2012 at 5:15

GoogleCodeExporter commented 9 years ago

Cont: Plug my own, i.e. have my own implementation for StatsCounter.

Original comment by kimchy on 11 Jan 2012 at 5:16

GoogleCodeExporter commented 9 years ago

I think it's highly unlikely that we'll provide either of these features, but I 
should explain why.

With regards to disabling stats, I find it almost impossible to believe that 
the overhead of maintaining cache stats isn't dominated by hash table lookups, 
concurrency details, and computing cache values.  This isn't a very expensive 
operation.  If you can provide profiling data to support this claim, then 
please let us know, but we'd need numbers to consider this change.

For the overwhelming majority of Cache users, maintaining stats is either 
definitely good or doesn't hurt, and correctly maintaining stats isn't a 
trivial thing.  Providing a way to plug your own stats manipulation would 
tremendously complicate the API, all to provide a feature that is a _bad idea_ 
for 90% of Cache users.  If you absolutely need to do your own stats work, you 
could probably design a CacheLoader wrapper that'd time cache load operations 
and the like, or figure out a ForwardingCache-based decorator that'd do what 
you need.

Original comment by wasserman.louis on 11 Jan 2012 at 8:36

Changed state: Triaged
Added labels: package-cache

GoogleCodeExporter commented 9 years ago

Original comment by wasserman.louis on 11 Jan 2012 at 8:36

Added labels: Package-Cache, Type-Enhancement
Removed labels: package-cache

GoogleCodeExporter commented 9 years ago

grr, the usual "prove is problematic and we will fix it" answer. You know the 
overhead of AtomicLong (writing to volatile field compared to reading in CHM 
lookup with CHM that hardly changes), and if I can remove it, its great. The 
get I do is in a very tight loop over a large number of hits. If you want, I 
can provide a test that shows the overhead of writing to volatile field 
compared to just reading from one, but honestly, I am now concerned that guava 
people don't know it...

Besides, the AbstractCache#StatsCounter interface is marked as @Beta, which is 
public, so I assumed it is planned to be exposed?

Original comment by kimchy on 11 Jan 2012 at 8:48

GoogleCodeExporter commented 9 years ago

No, the overhead of AtomicLong is well-known, although I'm not sure we 
necessarily have to suffer it.  What would you say if we used 
http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/jsr166e/LongAdder.java?v
iew=markup instead?  From the doc: "This class is usually preferable to 
AtomicLong when multiple threads update a common sum that is used for purposes 
such as collecting statistics, not for fine-grained synchronization control.  
Under low update contention, the two classes have similar characteristics. But 
under high contention, expected throughput of this class is significantly 
higher, at the expense of higher space consumption."

The AbstractCache#StatsCounter interface is already exposed, but I'm not sure 
we're planning to do anything more with it other than let Cache implementors 
make use of it.

Original comment by wasserman.louis on 11 Jan 2012 at 9:06

GoogleCodeExporter commented 9 years ago

I pointed to LongAdder in my previous comment, its part of jsr166e which will 
be included in Java 8, but I personally use it to compute all the stats I do in 
my app. You can't really use it unless you embed the class in guava, but I 
really don't understand the resistance to allow to simply disable the stats, 
even if you don't plan to allow for custom stats collectors.

Original comment by kimchy on 11 Jan 2012 at 9:35

GoogleCodeExporter commented 9 years ago

I'd personally like to hear e.g. Fry's opinion here, but here's my logic:

We like to make it as hard as possible for Guava tools to get misused.
For a large majority of Cache users, it's good to have stats around: because if 
suddenly hit rate goes way down in production, something's catastrophically 
wrong; because it's a good way to examine system health generally, etc.

If there's an obvious way to turn stats off, some programmers might turn it off 
even though it's not *actually* a good idea for them.

I dunno.  I'm curious to hear other Guava team members' opinions.

Original comment by wasserman.louis on 11 Jan 2012 at 9:44

GoogleCodeExporter commented 9 years ago

Sorry, don't buy this reasoning. If you don't *use* the stats in your code 
(expose them somehow, JMX, rest), then you don't use it. Nobody will go to a 
live system and dynamically load a class to the JVM that might introspect the 
Cache in realtime to suddenly check if there are problems based on stats that 
were never monitored to begin with. If they are monitored / enabled, then fine. 
Thats the default. If not, let me disable them.

Original comment by kimchy on 11 Jan 2012 at 9:52

GoogleCodeExporter commented 9 years ago

I think I'm getting more convinced, but I'd like to hear other opinions.

Original comment by wasserman.louis on 11 Jan 2012 at 10:43

GoogleCodeExporter commented 9 years ago

I added disableStats at the suggestion of Tim Peierls, but Kevin suggested that 
we wait to make it public until anyone actually needed it. While I'm personally 
onboard with making it public it would be hugely helpful to have a simple 
Caliper benchmark that demonstrates the performance gain this would provide to 
your application.

Original comment by fry@google.com on 12 Jan 2012 at 4:59

GoogleCodeExporter commented 9 years ago

Here is a simplified test of what I do. Basically, I have a mostly read cache 
(usually size bound) with a tight loop of many iterations. Writing to a 
volatile field within that tied loop is, well, problematic. 
https://gist.github.com/1597555.

Original comment by kimchy on 12 Jan 2012 at 6:02

GoogleCodeExporter commented 9 years ago

Cool. Would you be up for creating a trivial Caliper benchmark for that? You've 
already done most of the work, and that would be a huge selling point for this.

Original comment by fry@google.com on 12 Jan 2012 at 6:12

GoogleCodeExporter commented 9 years ago

No matter how clear it is that this new CacheBuilder method would benefit you 
in your case, the bar is very high for adding more methods to CacheBuilder.  We 
would have to be convinced that enough typical users of CacheBuilder would 
actually notice this issue costing them a significant amount of time (relative 
to the other expenses of their request), then discover the disableStats() 
method, and call it.  I'm skeptical.

Original comment by kevinb@google.com on 13 Jan 2012 at 5:40

GoogleCodeExporter commented 9 years ago

@kevin: Still don't understand the heavy implications of allowing to disable 
stats. You are going to make the Stats counting interface public at one stage, 
why not that? 

Its funny, you work so hard to make read fast by doing volatile reads, and then 
add something like AtomicLong as an afterthought. Lets see what happens when 
you suggest to Doug Lea to add just a volatile write in CHM (and I am not 
talking about compare and swap here). It all adds up.

Original comment by kimchy on 13 Jan 2012 at 7:32

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

As I've always understood it, Guava is _extremely_ sensitive to adding features 
that will either go unused or will be used by only a tiny fraction of people.

I'm not sure what you mean by making the stats counting interface public -- is 
there something that's not exposed now that you expect to be exposed? -- but 
CacheBuilder is already pretty conceptually heavy: when you're picking 
CacheBuilder settings, there are a lot of options to choose from with pretty 
major effects.

All of the current CacheBuilder options get used quite a lot, mind you: if you 
look through the CacheBuilder settings, you can see that they're each useful to 
a large class of users.  (The one possible exception might be weak values.)  
For someone new to the Guava cache implementation, it takes a lot of effort to 
read through the CacheBuilder Javadoc, figure out what all the various options 
do, and which of those options is actually right for them.

That's the reason for the extremely high bar: because adding more features to 
CacheBuilder adds even more conceptual weight, makes the learning curve more 
difficult, and gets in the way of users trying to figure out what's right for 
them.  If a tool doesn't get used by many users at all, it just gets in the way 
for everybody else.

I've always understood that these things are at least as important to the Guava 
team as performance.

Original comment by wasserman.louis on 14 Jan 2012 at 1:03

GoogleCodeExporter commented 9 years ago

I feel being able to disable stats could be a useful attribute in performance 
sensitive situations. Switching to the JSR166 LongAdder could also be quite 
useful. 

I can't see a reason for adding your own stats counter though, is there a 
particular metric which is not already included in the stats counter? How about 
an alternative where the specifics of which stats we wanted to record could be 
configured. This would let you tune your recorded stats and performance without 
people adding their own (usually poorer) implementations.

Just my 2 cents

Original comment by emily@soldal.org on 18 Jan 2012 at 11:56

GoogleCodeExporter commented 9 years ago

I'm strongly in favor of switching to the JSR 166 LongAdder, but I still share 
Kevin's concerns that disableStats doesn't meet the high bar we really need to 
have for CacheBuilder methods.

> We would have to be convinced that enough typical users of CacheBuilder would 
actually notice this issue costing them a significant amount of time (relative 
to the other expenses of their request), then discover the disableStats() 
method, and call it.  I'm skeptical.

Original comment by wasserman.louis on 18 Jan 2012 at 6:41

GoogleCodeExporter commented 9 years ago

I'm also sympathetic with the request. I hadn't realized that this was a 
hotspot in the cache that all get()s hit with volatile writes. Even if for many 
users this is not a problem /today/, this is going to be a problem with higher 
concurrency levels, it simply won't scale.

But (I see this is already suggested) I would prefer simply providing a 
scalable implementation, instead of an opt-out API. If we can avoid to stretch 
an API just for the performance issues of today (which may be different for 
tomorrow), we should, and it seems we can.

kimchy, what's your concurrency level anyway?

Original comment by jim.andreou on 19 Jan 2012 at 10:00

GoogleCodeExporter commented 9 years ago

Note that stats are currently accumulated _per segment_.

Original comment by fry@google.com on 19 Jan 2012 at 10:12

GoogleCodeExporter commented 9 years ago

@jim: My concurrency level is as many cores the machine has, 4 in my testcase 
(obviously, the more cores, the worse it will be because of cpu cache havoc), 
the use case is iterating over a very large size and doing get for each in a 
tight loop. Its the best way to suddenly see how much volatile writes affect 
perf :). I have an interesting discussion on the concurrency mailing list when 
Joda-Time moved to have volatile fields on the mutable date, sadly, could not 
convince them to not do it.

As for the best way to solve it (from my view). Its simply to allow to plug in 
your own stats collector (can be noop). In the future, there will be a 
LongAdder in Java 8, and god knows what we will have in the even more remote 
future (personal wish: cpu affinity). Especially since the stats collection 
interface is marked as @Beta.

Being able to (publicly) disable it is also fine, because I actually have a 
different level of stats that I use outside of it (as explained before, based 
on a custom build of LongAdder).

Original comment by kimchy on 19 Jan 2012 at 10:15

GoogleCodeExporter commented 9 years ago

That's...a very good point that I hadn't noticed.  I'm still going to 
experiment with benchmarking a JSR166e implementation, but that makes me much 
less worried about high AtomicLong contention.

Original comment by wasserman.louis on 19 Jan 2012 at 10:17

GoogleCodeExporter commented 9 years ago

Which goes back to my initial plea to produce some Caliper stats from a 
real-world example. I see no other way to do anything more than guess at what 
the true cost is...

Original comment by fry@google.com on 19 Jan 2012 at 10:19

GoogleCodeExporter commented 9 years ago

I'm not actually sure whether or not LongAdder beats AtomicLong if it's being 
done on a per-segment basis.

But I'm definitely with fry: we can't estimate whether or not it's worth 
messing with the API without a real-world benchmark to show the quantitative 
improvement.

Original comment by wasserman.louis on 19 Jan 2012 at 10:39

GoogleCodeExporter commented 9 years ago

Oh, sorry, I didn't check the source. (Charles, this was what I was trying to 
ask previously, misunderstood the answer I guess).

If these are per segment, why are they volatile writes instead of regular 
writes ("occasionally" flushed to the global counters, like the occasional 
cleanups we do)? Can't this be reworked?

I'm still out of touch with this code so excuse me if I again miss something 
big :)

Original comment by jim.andreou on 19 Jan 2012 at 11:14

GoogleCodeExporter commented 9 years ago

Are you sure its segment based? There is a segments stats, but global stats is 
still used, for example, here: 
http://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/comm
on/cache/LocalCache.java#3952.

Even with segment based stats, it does not change the fact that there is a 
volatile write (compare and swap actually) happening. I know, it might sound 
anal, its "just" a volatile write. But with the current design, and working so 
hard (much appreciated!) to do get with just volatile read, I don't understand 
the "ease" of mind to add a counter.

Original comment by kimchy on 19 Jan 2012 at 11:32

GoogleCodeExporter commented 9 years ago

Hmmm.  That does look at least partially global.

But seriously, please, do benchmarks!

Original comment by wasserman.louis on 19 Jan 2012 at 11:42

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

Ah, I see, that was silly. A get() in a segment is just a volatile read on the 
entry, the segment isn't locked, so if the writes were regular, that would be 
racy.

But... oh well. The original design of CHM was all about avoiding volatile 
writes in get(), and this somewhat defeats it. It should still be slightly 
better than just locking the whole segment per any kind of access, but not by 
much.

Original comment by jim.andreou on 20 Jan 2012 at 12:11

GoogleCodeExporter commented 9 years ago

We all agree that this is abstractly BAD. But, we're not going to do anything 
more than talk until we have some NUMBERS. How bad is it in real applications?

Original comment by fry@google.com on 20 Jan 2012 at 3:10

GoogleCodeExporter commented 9 years ago

I ran a simple single-threaded Caliper benchmark which showed no difference 
between running with and without stats.

I'm going to file this away as an academic request until proven otherwise.

Original comment by fry@google.com on 30 Jan 2012 at 9:11

Changed state: WontFix

GoogleCodeExporter commented 9 years ago

...I am reminded that I have a multithreaded application using CacheBuilder, 
myself.  I will attempt to work it into a Caliper microbenchmark.

Original comment by wasserman.louis on 30 Jan 2012 at 9:13

GoogleCodeExporter commented 9 years ago

(An *actual* application!  Who would have thought that I would ever write 
anything but libraries!?)

Original comment by wasserman.louis on 30 Jan 2012 at 9:14

GoogleCodeExporter commented 9 years ago

(raving disbelief)

Original comment by fry@google.com on 31 Jan 2012 at 12:29

GoogleCodeExporter commented 9 years ago

Doug's multithreaded benchmarks of CHM come to mind, not sure if they are 
available somewhere. Applying them to Cache should be useful, regardless of 
this particular issue. A single-threaded test can't illuminate much, and 
writing more useful benchmarks is a pain.

Original comment by andr...@google.com on 31 Jan 2012 at 5:15

GoogleCodeExporter commented 9 years ago

We were wrong here. We're gonna reverse course. Message to guava-discuss 
forthcoming.

Original comment by fry@google.com on 3 Apr 2012 at 2:09

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

Fix to disable stats by default will be pushed soon.

Original comment by fry@google.com on 6 Apr 2012 at 3:05

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

Breaking the API is not cool :(

Why didn't you want to add LongAdder to Guava and use that by default (+ a knob 
in CacheBuilder to allow disabling stats if needed, but leave them on by 
default)?

Original comment by tsuna...@gmail.com on 6 Apr 2012 at 4:05

GoogleCodeExporter commented 9 years ago

We also plan to migrate to LongAdder, but that only helps throughput and not 
latency.

Original comment by fry@google.com on 6 Apr 2012 at 4:08

GoogleCodeExporter commented 9 years ago

I think the benchmarks demonstrate that turning them off by default is the only 
long-term solution, even if that causes some one-time pain for users.  =/

Original comment by wasserman.louis on 6 Apr 2012 at 4:28

GoogleCodeExporter commented 9 years ago

We've cherry picked this into the release 12 branch.  It will be in rc2.

Original comment by cpov...@google.com on 6 Apr 2012 at 4:42

Changed title: Make CacheBuilder Cache stats collection optional
Added labels: Milestone-Release12

GoogleCodeExporter commented 9 years ago

This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:14

Added labels: MigratedToGitHub

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:09

Added labels: Restrict-AddIssueComment-Commit

chen870647924 / guava-libraries

Make CacheBuilder Cache stats collection optional #863