google-code-export / objectify-appengine

Automatically exported from code.google.com/p/objectify-appengine
MIT License

Need method(s) to obtain caching statistics from objectify #107

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
It would be useful to provide a method that reports cache-usage statistics, 
for both the session cache and memcache, on a per-entity basis. This would 
make it possible to profile datastore reads and further optimize the app to 
reduce datastore-related bills.

Original issue reported on code.google.com by sunnymax...@gmail.com on 18 Sep 2011 at 11:35

GoogleCodeExporter commented 9 years ago
I agree. Would you like to take a first pass at adding this?

Original comment by latch...@gmail.com on 18 Sep 2011 at 4:47

GoogleCodeExporter commented 9 years ago
Sure, I'll put something together and post it for code review in a couple of 
days...

Original comment by sunnymax...@gmail.com on 18 Sep 2011 at 4:54

GoogleCodeExporter commented 9 years ago
Awesome! We welcome contributions. If you want, you can discuss how you 
might go about implementing it on the mailing list first, and I'm sure Jeff 
(and I) will have some constructive feedback.

Original comment by latch...@gmail.com on 18 Sep 2011 at 4:56

GoogleCodeExporter commented 9 years ago
I would start with a hook in CachingAsyncDatastoreService such that any time a 
hit or a miss happens, a method gets called.

Then in your local code, derive a new service object that overrides this method 
and tracks the statistics, either in memcache or in RAM.  Then you can 1) 
create it in a derived ObjectifyFactory and 2) create a servlet (or whatever) 
that renders the appropriate data.
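The hook-and-override pattern described above could look something like the following minimal sketch. Note that the hook names (onCacheHit / onCacheMiss) and the stand-in base class are hypothetical, not the real CachingAsyncDatastoreService API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the caching service: it would invoke these
// hooks whenever a cache lookup hits or misses for a given entity Kind.
class CachingServiceBase {
    protected void onCacheHit(String kind) {}
    protected void onCacheMiss(String kind) {}
}

// Local derived service: overrides the hooks and tracks per-Kind counts
// in memory (RAM), one of the two storage options mentioned above.
class StatsTrackingService extends CachingServiceBase {
    final ConcurrentHashMap<String, AtomicLong> hits = new ConcurrentHashMap<>();
    final ConcurrentHashMap<String, AtomicLong> misses = new ConcurrentHashMap<>();

    @Override protected void onCacheHit(String kind) {
        hits.computeIfAbsent(kind, k -> new AtomicLong()).incrementAndGet();
    }

    @Override protected void onCacheMiss(String kind) {
        misses.computeIfAbsent(kind, k -> new AtomicLong()).incrementAndGet();
    }
}
```

A servlet (or any other endpoint) could then read the two maps from the factory-created instance and render them.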

Original comment by lhori...@gmail.com on 18 Sep 2011 at 7:30

GoogleCodeExporter commented 9 years ago
Thanks

I was thinking along similar lines, but a little differently (more tightly 
coupled within Objectify, but simpler for users):

1. Create a wrapper around the MemcacheService used in 
CachingAsyncDatastoreService, so that the memcache layer itself tracks the 
statistics whenever we do a "get" in memcache.

2. The CachingAsyncDatastoreService constructor still accepts the normal 
memcache, and the wrapping is done inside the constructor, so existing code is 
not impacted.

3. In case this statistics collection has a non-trivial penalty, we can add an 
option to disable statistics collection even when caching is enabled.

4. Currently, my thinking is focused only on global caching in memcache.

Now, coming to the actual statistics collection:

5. Upon every get, if stats collection is enabled, it will sort results by 
entity type, check hit/miss, and update the stats record accordingly.

6. Stats will be stored on a per-entry basis in memcache itself, since we 
don't want to lose the stats due to instance shutdown...

Please let me know if this sounds good, and more importantly whether I'm 
missing something important that would require a more decoupled implementation 
like the one proposed above...
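Step 1 above is essentially a decorator around the cache's get path. A minimal sketch of the idea, using a hypothetical SimpleCache interface in place of the much larger real MemcacheService surface:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical minimal cache interface standing in for MemcacheService.
interface SimpleCache {
    Object get(String key);
    void put(String key, Object value);
}

// Plain in-memory backing store, for illustration only.
class MapCache implements SimpleCache {
    private final Map<String, Object> map = new HashMap<>();
    public Object get(String key) { return map.get(key); }
    public void put(String key, Object value) { map.put(key, value); }
}

// Decorator: counts hits and misses on every get, delegates everything else.
// This is what the constructor would wrap around the user-supplied memcache.
class CountingCache implements SimpleCache {
    final AtomicLong hits = new AtomicLong();
    final AtomicLong misses = new AtomicLong();
    private final SimpleCache delegate;

    CountingCache(SimpleCache delegate) { this.delegate = delegate; }

    public Object get(String key) {
        Object value = delegate.get(key);
        if (value != null) hits.incrementAndGet();
        else misses.incrementAndGet();
        return value;
    }

    public void put(String key, Object value) { delegate.put(key, value); }
}
```

One caveat with a wrapper at this layer: by the time the call reaches memcache, the key is just a String, so attributing hits and misses to an entity Kind takes extra work (this comes up later in the thread).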

Original comment by sunnymax...@gmail.com on 19 Sep 2011 at 2:10

GoogleCodeExporter commented 9 years ago
What happens if memcache dies?

Original comment by latch...@gmail.com on 19 Sep 2011 at 2:18

GoogleCodeExporter commented 9 years ago
My thoughts:

 * Creating a wrapper around memcacheservice will let you track hits, but won't let you track misses.  To track hit rates, you need to know both.

 * Putting the data in memcache may be overkill.  All frontend instances are peers, so sampling from one instance should be a perfectly valid measurement.  The main value of using memcache would be the ability to easily reset the statistics, but I'm not sure it's compelling enough to add the overhead of yet another RPC to every datastore fetch.

 * If we track the hit/miss rates in internal fields, we can just run it permanently and expose a simple API that gives you hit rate as a percentage for any given Kind.  If we store stats in memcache, we need to let developers enable/disable it because there is a real performance impact.

Original comment by lhori...@gmail.com on 19 Sep 2011 at 3:11

GoogleCodeExporter commented 9 years ago
Thanks for feedback. 

If I understand correctly, a cache "get" is first done to see which items are 
available in the cache, and for those that are not available, a datastore 
"get" is done. So the cache hits are the # of items returned by the cache get, 
and the misses are the total items queried minus the cache hits, for that 
particular access? Please let me know if this is NOT the case! Here's the code 
I'm referring to, from CachingAsyncDatastoreService:

"
Map<Key, Entity> soFar = this.getFromCache(keys);

            Set<Key> stillNeeded = new HashSet<Key>();
            for (Key getKey: keys)
                if (!soFar.containsKey(getKey))
                    stillNeeded.add(getKey);

            // Maybe we need to fetch some more
            Future<Map<Key, Entity>> pending = null;
            if (!stillNeeded.isEmpty())
            {......
"

As I mentioned, I was only proposing for memcache backed caching, i.e. global 
caching, so stats will be evaluated and stored only if global caching is 
enabled. In that case, since we already use memcache, I thought we can use it 
for stats storage as well. 

However, yes, I understand we'd need some work on the clients' side to handle 
the case when memcache gets killed. So, to simplify, I agree we can just save 
the stats in local fields and update them upon every "get", leaving it up to 
the user to query the stats (ideally immediately after the get); if any form 
of persistence of the stats data is needed, the user can handle it himself...

The only issue would be that if an instance gets killed due to an error during 
a get, we lose the previous stats (this can be handled at the user's end by 
other forms of persistence).

Eventually, in practice, cache stats would not be needed in production, so we 
should keep an option to disable them (maybe disabled by default) to eliminate 
the overhead, however small, in a production environment.

Please let me know your further comments, and if we agree, I'll start working...

P.S: I'm based in India, so we have a time zone difference...

Original comment by sunnymax...@gmail.com on 20 Sep 2011 at 2:43

GoogleCodeExporter commented 9 years ago
You're right, you can get cache hit rates just fine with the MemcacheService 
interface.  Unfortunately, what you're missing is the Kind for each get() since 
by the time it gets to memcache the keys have been converted to String.  This 
is necessary because memcache has some odd problems using Key for a key (see 
EvilMemcacheServiceBugTests).

Here's what I suggest:  We can modify CachingAsyncDatastoreService to pass Key 
in as a key again.  But the MemcacheService wrapper must now be smart enough to 
convert Keys to String and back again internally.  This wrapper will always 
need to be in place.

As far as where to store stats... if you store it internally in a map, we just 
need one API: how to fetch the stats.  If you store it in memcache, we need two 
APIs:  how to fetch stats, and how to turn on/off statistics.  Since the only 
meaningful value you get from stats is a percentage, and since each instance is 
a reasonable sample for the whole cluster, there doesn't seem to be much value 
in storing the stats in memcache.

These stats are really only meaningful in production, so I'd much rather have 
sampled stats that are available all the time than complete stats that require 
deployments and thus aren't much use for diagnosing performance problems in 
real time. Are there any benefits to storing this data in memcache? I can't 
think of any.

Original comment by lhori...@gmail.com on 20 Sep 2011 at 3:55

GoogleCodeExporter commented 9 years ago
Ok, got it. Thanks for explaining in detail.

Based on the discussion, I think we all agree that storing stats data in 
memcache doesn't add much value and does add significant overhead, so let's 
just store it in local fields in MemCacheWrapper.

Now, as far as "stringified" keys are concerned, can't we use 
KeyFactory.stringToKey inside MemCacheWrapper to get the Key back, and 
otherwise work with the stringified keys themselves? That would avoid any 
changes outside CachingAsyncDatastoreService...

Let me know if I'm still missing a point; sorry, I've only looked a little into 
the implementation and so far have only been a user of Objectify.

Original comment by sunnymax...@gmail.com on 20 Sep 2011 at 4:38

GoogleCodeExporter commented 9 years ago
Sure, you could use stringToKey, but each translation is protocol-buffer 
manipulation, and while it's cheap compared to the RPC, it does have a real 
cost.  Also, if you're really trying to create a generic memcache interface, 
you have no way to know whether the String key passed to a get() is really a 
Key or not.

My advice is actually not to bother trying to put it behind the real 
memcacheservice interface, but rather create a stripped down interface with 
just the three methods that Ofy uses - getAll(), putAll(), deleteAll().  Make 
it take Key (rather than Object) as the key param, and have it both 1) track 
stats and 2) stringify the keys.

If you really want to get fancy about layers, you can make this stripped-down 
memcacheservice interface a real interface, then separate out the stats 
tracking and the key stringification into separate objects.
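The stripped-down interface described above could look something like this sketch. All names here (KeyCache, EntityKey, StatsKeyCache) are illustrative stand-ins, not the API Objectify actually shipped, and EntityKey plays the role of the datastore Key:

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicLong;

// The three methods Ofy needs, taking entity keys rather than Object.
interface KeyCache {
    Map<EntityKey, Object> getAll(Collection<EntityKey> keys);
    void putAll(Map<EntityKey, Object> values);
    void deleteAll(Collection<EntityKey> keys);
}

// Minimal stand-in for the datastore Key.
class EntityKey {
    final String kind;
    final long id;
    EntityKey(String kind, long id) { this.kind = kind; this.id = id; }
    // Stringification layer: memcache wants String keys, not Key objects.
    String stringify() { return kind + "(" + id + ")"; }
}

// One implementation doing both jobs: stringifies keys for the backing
// store and tracks per-Kind hit/miss counts. (For "fancy layers", the
// stats tracking and stringification could be split into separate objects.)
class StatsKeyCache implements KeyCache {
    private final Map<String, Object> backing = new HashMap<>();
    final Map<String, AtomicLong> hits = new HashMap<>();
    final Map<String, AtomicLong> misses = new HashMap<>();

    public Map<EntityKey, Object> getAll(Collection<EntityKey> keys) {
        Map<EntityKey, Object> found = new HashMap<>();
        for (EntityKey key : keys) {
            Object value = backing.get(key.stringify());
            Map<String, AtomicLong> counter = (value != null) ? hits : misses;
            counter.computeIfAbsent(key.kind, k -> new AtomicLong()).incrementAndGet();
            if (value != null)
                found.put(key, value);
        }
        return found;
    }

    public void putAll(Map<EntityKey, Object> values) {
        for (Map.Entry<EntityKey, Object> e : values.entrySet())
            backing.put(e.getKey().stringify(), e.getValue());
    }

    public void deleteAll(Collection<EntityKey> keys) {
        for (EntityKey key : keys)
            backing.remove(key.stringify());
    }
}
```

Because the interface still sees typed keys, the Kind is known at counting time, which avoids the string-key attribution problem entirely.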

Or if you want to just create any solution, I will alter it afterwards.  I'm a 
big fan of "make something that works, then make it good".  But you're on the 
right track so far!

Original comment by lhori...@gmail.com on 20 Sep 2011 at 7:10

GoogleCodeExporter commented 9 years ago
Thanks again. I think I'd rather get a skeleton working and put it up for 
review. I'm working on that now and will get back in a couple of days (no 
memcache for storing stats, and I found a simple way to handle both the 
session and the global cache...)

Original comment by sunnymax...@gmail.com on 21 Sep 2011 at 4:20

GoogleCodeExporter commented 9 years ago
Hello

I've posted a first set of code for review:

http://codereview.appspot.com/5085047

Please post your comments and let's take it forward.

Original comment by sunnymax...@gmail.com on 22 Sep 2011 at 1:46

GoogleCodeExporter commented 9 years ago
I just started tearing down and rebuilding the entire memcache system.  Since 
I'm in the code and changing internal APIs, I will include this functionality 
in the process.

I see you were tracking session-cache misses.  I do not plan to include this 
behavior... the session cache is a bit different.  Although I can include 
enough API to make this possible.

Original comment by lhori...@gmail.com on 27 Sep 2011 at 1:36

GoogleCodeExporter commented 9 years ago
Ok, great! Looking forward to caching in next release.

Original comment by sunnymax...@gmail.com on 27 Sep 2011 at 3:41

GoogleCodeExporter commented 9 years ago
The new cache has methods for tracking stats and even a servlet to render the 
info.  Is this good enough?

Original comment by lhori...@gmail.com on 4 Oct 2011 at 3:02