Closed by GoogleCodeExporter 9 years ago
I agree. Would you like to take a first pass at adding this?
Original comment by latch...@gmail.com
on 18 Sep 2011 at 4:47
Sure, I'll try something and post it for code review in a couple of days...
Original comment by sunnymax...@gmail.com
on 18 Sep 2011 at 4:54
Awesome! We welcome contributions. If you want, you can discuss how you
might go about implementing it on the mailing lists first; I'm sure Jeff
(and I) will have some constructive feedback.
Original comment by latch...@gmail.com
on 18 Sep 2011 at 4:56
I would start with a hook in CachingAsyncDatastoreService such that any time a
hit or a miss happens, a method gets called.
Then in your local code, derive a new service object that overrides this method
and tracks the statistics, either in memcache or in RAM. Then you can 1)
create it in a derived ObjectifyFactory and 2) create a servlet (or whatever)
that renders the appropriate data.
Original comment by lhori...@gmail.com
on 18 Sep 2011 at 7:30
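The hook idea above can be sketched as follows. Note that the real CachingAsyncDatastoreService has no such hook; "CachingServiceBase", "onCacheEvent", and "lookup" are hypothetical names used only to illustrate the override pattern being proposed.

```java
// Minimal sketch of the proposed hook pattern. All class and method names
// here are hypothetical stand-ins, not the real Objectify API.
class CachingServiceBase {
    // Proposed hook: called once per key on every cache lookup.
    protected void onCacheEvent(String kind, boolean hit) {
        // no-op by default; derived services override this
    }

    // Stand-in for a real cache lookup, so the hook can be exercised.
    void lookup(String kind, boolean foundInCache) {
        onCacheEvent(kind, foundInCache);
    }
}

// The derived service a user would create (and wire up in a derived
// ObjectifyFactory) to track statistics in instance memory.
class StatsTrackingService extends CachingServiceBase {
    int hits, misses;

    @Override
    protected void onCacheEvent(String kind, boolean hit) {
        if (hit) hits++; else misses++;
    }
}
```

A servlet could then simply read the counters off the service instance to render the stats.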
Thanks
I was thinking along similar lines, but slightly differently (more tightly
coupled within Objectify, but simpler for users):
1. Create a wrapper around the MemcacheService used in
CachingAsyncDatastoreService, so that the memcache wrapper itself tracks the
statistics whenever we do a "get" in memcache.
2. The CachingAsyncDatastoreService constructor still accepts the normal
memcache, and the wrapping is done inside the constructor, so that existing
code is not impacted.
3. In case this statistics collection has a non-trivial penalty, we can add an
option to disable statistics collection even when caching is enabled.
4. Currently, my thinking is focused only on global caching in memcache.
Now coming to the actual statistics collection:
5. Stats will be stored on a per-entry basis in memcache itself, since we don't
want to lose the stats due to instance shutdown...
6. Upon every get, if stats collection is enabled, it'll sort by entity type,
check hit/miss, and update the stats record accordingly.
Please let me know if this sounds good, and more importantly whether I'm
missing something important that would actually require a more decoupled
implementation like the one proposed above...
Original comment by sunnymax...@gmail.com
on 19 Sep 2011 at 2:10
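Step 1 of the proposal above (a counting wrapper around the cache service) might look roughly like this. "SimpleCache" is a stand-in for the real MemcacheService interface, which isn't available outside App Engine; only the delegate-and-count pattern is the point.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Stand-in for the real MemcacheService; hypothetical, for illustration only.
interface SimpleCache {
    Map<String, Object> getAll(Set<String> keys);
}

// Wrapper that delegates to the underlying cache and counts hits/misses
// as a side effect of each get, as the proposal describes.
class StatsCountingCache implements SimpleCache {
    private final SimpleCache delegate;
    long hits, misses;

    StatsCountingCache(SimpleCache delegate) {
        this.delegate = delegate;
    }

    @Override
    public Map<String, Object> getAll(Set<String> keys) {
        Map<String, Object> found = delegate.getAll(keys);
        hits += found.size();                  // keys the cache answered
        misses += keys.size() - found.size();  // keys that fell through
        return found;
    }
}
```

Wrapping could happen inside the CachingAsyncDatastoreService constructor, as step 2 suggests, so callers keep passing the plain cache.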
What happens if memcache dies?
Original comment by latch...@gmail.com
on 19 Sep 2011 at 2:18
My thoughts:
* Creating a wrapper around memcacheservice will let you track hits, but won't let you track misses. To track hit rates, you need to know both.
* Putting the data in memcache may be overkill. All frontend instances are peers, so sampling from one instance should be a perfectly valid measurement. The main value of using memcache would be the ability to easily reset the statistics, but I'm not sure it's compelling enough to add the overhead of yet another RPC to every datastore fetch.
* If we track the hit/miss rates in internal fields, we can just run it permanently and expose a simple API that gives you hit rate as a percentage for any given Kind. If we store stats in memcache, we need to let developers enable/disable it because there is a real performance impact.
Original comment by lhori...@gmail.com
on 19 Sep 2011 at 3:11
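The "internal fields + one simple API" option in the last bullet could be shaped like this sketch: per-Kind counters kept in instance memory, exposed as a hit-rate percentage. All names here are hypothetical; only the shape of the single read API is the point.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical per-Kind stats holder living in a frontend instance's RAM.
class KindStats {
    // per Kind: index 0 = hits, index 1 = misses
    private final ConcurrentHashMap<String, AtomicLongArray> counts =
            new ConcurrentHashMap<>();

    void record(String kind, boolean hit) {
        counts.computeIfAbsent(kind, k -> new AtomicLongArray(2))
              .incrementAndGet(hit ? 0 : 1);
    }

    /** Hit rate for a Kind as a percentage; 0 if the Kind was never seen. */
    double hitRatePercent(String kind) {
        AtomicLongArray c = counts.get(kind);
        if (c == null) return 0.0;
        long hits = c.get(0);
        long total = hits + c.get(1);
        return total == 0 ? 0.0 : 100.0 * hits / total;
    }
}
```

Since every instance is a peer, the percentage sampled on one instance is a reasonable estimate for the cluster, with no extra RPC per fetch.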
Thanks for the feedback.
If I understand correctly, a cache "get" is done first to see which items are
available in the cache, and for those that aren't available, a datastore "get"
is done. The cache hit count is the number of items returned by the cache get,
and the miss count is the total items queried minus the hit count, for this
particular access? Please let me know if this is NOT the case! Here's the code
I'm referring to, from CachingAsyncDatastoreService:
"
Map<Key, Entity> soFar = this.getFromCache(keys);
Set<Key> stillNeeded = new HashSet<Key>();
for (Key getKey: keys)
if (!soFar.containsKey(getKey))
stillNeeded.add(getKey);
// Maybe we need to fetch some more
Future<Map<Key, Entity>> pending = null;
if (!stillNeeded.isEmpty())
{......
"
As I mentioned, I was only proposing this for memcache-backed caching, i.e.
global caching, so stats would be evaluated and stored only if global caching
is enabled. In that case, since we already use memcache, I thought we could use
it for stats storage as well.
However, yes, I can understand we'll need some work on the client side to
handle the case when memcache gets killed, so, to simplify, I agree we can
just save to local fields and update them upon every "get". We'll leave it up
to users to query the stats (ideally immediately after a get), and if they
need any form of persistence of the stats data, they can handle it
themselves... The only issue would be that if an instance gets killed due to
an error during a get, we lose the previous stats (this can be handled on the
user's end with other forms of persistence).
Eventually, in practice, cache stats would not be needed in production, so we
should keep an option to disable collection (maybe disabled by default) to
eliminate the overhead, however small, in a production environment.
Please let me know your further comments, and if we agree, I'll start working...
P.S.: I'm based in India, so we have a time zone difference...
Original comment by sunnymax...@gmail.com
on 20 Sep 2011 at 2:43
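The per-call accounting implied by the snippet quoted above falls out directly: hits are the entries the cache returned (soFar), misses are the keys still needed from the datastore (stillNeeded). A self-contained illustration, with plain Strings standing in for datastore Keys:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper mirroring the quoted CachingAsyncDatastoreService
// logic, reduced to the hit/miss arithmetic for a single get() call.
class GetAccounting {
    static long[] tally(Map<String, String> cacheContents, List<String> keys) {
        // what the cache answered ("soFar" in the quoted code)
        Map<String, String> soFar = new HashMap<>();
        for (String k : keys)
            if (cacheContents.containsKey(k))
                soFar.put(k, cacheContents.get(k));

        // what must go to the datastore ("stillNeeded" in the quoted code)
        Set<String> stillNeeded = new HashSet<>();
        for (String k : keys)
            if (!soFar.containsKey(k))
                stillNeeded.add(k);

        // { hits, misses } for this one get() call
        return new long[] { soFar.size(), stillNeeded.size() };
    }
}
```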
You're right, you can get cache hit rates just fine with the MemcacheService
interface. Unfortunately, what you're missing is the Kind for each get() since
by the time it gets to memcache the keys have been converted to String. This
is necessary because memcache has some odd problems using Key for a key (see
EvilMemcacheServiceBugTests).
Here's what I suggest: We can modify CachingAsyncDatastoreService to pass Key
in as a key again. But the MemcacheService wrapper must now be smart enough to
convert Keys to String and back again internally. This wrapper will always
need to be in place.
As far as where to store stats... if you store it internally in a map, we just
need one API: how to fetch the stats. If you store it in memcache, we need two
APIs: how to fetch stats, and how to turn on/off statistics. Since the only
meaningful value you get from stats is a percentage, and since each instance is
a reasonable sample for the whole cluster, there doesn't seem to be much value
in storing the stats in memcache.
These stats are really only meaningful in production, so I'd much rather have
sampled stats that are available all the time rather than complete stats that
require deployments and thus aren't much use for diagnosing performance
problems in realtime. Are there any benefits to storing this data in memcache?
I can't think of any.
Original comment by lhori...@gmail.com
on 20 Sep 2011 at 3:55
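The Key-to-String boundary described above might look like this. "FakeKey" and the encode()/decode() helpers are stand-ins for the datastore Key type and KeyFactory.keyToString()/stringToKey(); the point is that callers deal only in Keys while the underlying cache only ever sees Strings.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Stand-in for com.google.appengine.api.datastore.Key; hypothetical.
final class FakeKey {
    final String kind;
    final long id;

    FakeKey(String kind, long id) { this.kind = kind; this.id = id; }

    @Override public boolean equals(Object o) {
        return o instanceof FakeKey
            && ((FakeKey) o).kind.equals(kind)
            && ((FakeKey) o).id == id;
    }
    @Override public int hashCode() { return Objects.hash(kind, id); }
}

// Wrapper that stringifies keys internally, as the comment above proposes.
class KeyStringifyingCache {
    private final Map<String, Object> backing = new HashMap<>();

    // Stand-ins for KeyFactory.keyToString() / stringToKey().
    private static String encode(FakeKey k) { return k.kind + ":" + k.id; }
    private static FakeKey decode(String s) {
        int i = s.indexOf(':');
        return new FakeKey(s.substring(0, i), Long.parseLong(s.substring(i + 1)));
    }

    void put(FakeKey key, Object value) { backing.put(encode(key), value); }

    Map<FakeKey, Object> getAll(Collection<FakeKey> keys) {
        Map<FakeKey, Object> out = new HashMap<>();
        for (FakeKey k : keys) {
            Object v = backing.get(encode(k));
            // decode() shows the String-back-to-Key direction the wrapper
            // must handle internally
            if (v != null) out.put(decode(encode(k)), v);
        }
        return out;
    }
}
```

Because the wrapper sees real Keys on the way in, it can read the Kind for stats before stringifying.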
Ok, got it. Thanks for explaining in detail.
Based on the discussion, I think we all agree that storing stats data in
memcache may not be a big value add, and is rather a significant overhead, so
let's just store the stats in local fields in MemCacheWrapper.
Now, as far as "stringified" keys are concerned, can we not use
KeyFactory.stringToKey inside MemCacheWrapper to get the Key, and thus work
with the stringified keys themselves? That would avoid any changes outside of
CachingAsyncDatastoreService...
Let me know if I'm still missing a point; sorry, I've only looked a little
into the implementation so far and have only been a user of Objectify.
Original comment by sunnymax...@gmail.com
on 20 Sep 2011 at 4:38
Sure, you could use stringToKey, but each translation is protocol buffer
manipulation, and while it's cheap compared to the RPC, it does have a real
cost. Also, if you're really trying to create a generic memcache interface,
you have no way to know whether the String passed to get() is really a Key or
not.
My advice is actually not to bother trying to put it behind the real
memcacheservice interface, but rather create a stripped down interface with
just the three methods that Ofy uses - getAll(), putAll(), deleteAll(). Make
it take Key (rather than Object) as the key param, and have it both 1) track
stats and 2) stringify the keys.
If you really want to get fancy about layers, you can make this stripped-down
memcacheservice interface a real interface, then separate out the stats
tracking and the key stringification into separate objects.
Or if you want to just create any solution, I will alter it afterwards. I'm a
big fan of "make something that works, then make it good". But you're on the
right track so far!
Original comment by lhori...@gmail.com
on 20 Sep 2011 at 7:10
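The stripped-down interface suggested above (just the three operations Ofy uses, keyed by Key rather than Object) could be sketched like this. "KeyedCache" and "MapKeyedCache" are hypothetical names; stats tracking and key stringification could then be layered on as separate decorators implementing the same interface, per the "fancy about layers" option.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stripped-down cache interface: only getAll/putAll/deleteAll,
// with a typed key parameter instead of Object.
interface KeyedCache<K, V> {
    Map<K, V> getAll(Collection<K> keys);
    void putAll(Map<K, V> values);
    void deleteAll(Collection<K> keys);
}

// A trivial in-memory implementation, just to show the contract.
class MapKeyedCache<K, V> implements KeyedCache<K, V> {
    private final Map<K, V> map = new HashMap<>();

    @Override public Map<K, V> getAll(Collection<K> keys) {
        Map<K, V> out = new HashMap<>();
        for (K k : keys)
            if (map.containsKey(k))
                out.put(k, map.get(k));
        return out;
    }

    @Override public void putAll(Map<K, V> values) { map.putAll(values); }

    @Override public void deleteAll(Collection<K> keys) {
        for (K k : keys) map.remove(k);
    }
}
```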
Thanks again. I think I'd rather get a skeleton working and put it up for
review. I'm working on it now and will get back in a couple of days (no
memcache for storing stats, and I found a simple way to handle both the
session and global caches...)
Original comment by sunnymax...@gmail.com
on 21 Sep 2011 at 4:20
Hello,
I've posted the first set of code for review:
http://codereview.appspot.com/5085047
Please post your comments and let's take it forward.
Original comment by sunnymax...@gmail.com
on 22 Sep 2011 at 1:46
I just started tearing down and rebuilding the entire memcache system. Since
I'm in the code and changing internal APIs, I will include this functionality
in the process.
I see you were tracking session cache misses. I do not plan to include this
behavior... the session cache is a bit different. Although I can include
enough API to make this possible.
Original comment by lhori...@gmail.com
on 27 Sep 2011 at 1:36
Ok, great! Looking forward to the new caching in the next release.
Original comment by sunnymax...@gmail.com
on 27 Sep 2011 at 3:41
The new cache has methods for tracking stats and even a servlet to render the
info. Is this good enough?
Original comment by lhori...@gmail.com
on 4 Oct 2011 at 3:02
Original issue reported on code.google.com by
sunnymax...@gmail.com
on 18 Sep 2011 at 11:35