As an alternative, including sFlow instrumentation has minimal overhead
(approximately the cost of adding one more performance counter), but provides
full details about keys and operations, allowing top keys, missed keys, etc. to
be monitored.
sFlow reduces the overhead on the Memcached server by exporting a random sample
of memcache operations to a remote collector. The collector receives samples
from every server in the cluster and calculates top keys, etc. This architecture
is extremely scalable and flexible. Only the collector needs to be
modified in order to calculate additional statistics (such as top keys with a
particular prefix or operation). Since sFlow is also used to monitor
network traffic and server performance, the sFlow collector can put the
information together to provide a comprehensive view of cluster performance.
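To make the collector side concrete, here is a toy sketch (hypothetical code: it assumes each UDP datagram carries one sampled key as plain text, whereas a real sFlow collector decodes XDR-encoded flow samples):

/* topkeys_collector.c - toy sketch of collector-side aggregation. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_KEYS 1024
#define KEY_LEN  256

static struct { char key[KEY_LEN]; unsigned long hits; } tab[MAX_KEYS];
static int nkeys;

/* Tally one sampled key (a linear scan is fine for a sketch). */
static void count(const char *key) {
    for (int i = 0; i < nkeys; i++)
        if (strcmp(tab[i].key, key) == 0) { tab[i].hits++; return; }
    if (nkeys < MAX_KEYS) {
        strncpy(tab[nkeys].key, key, KEY_LEN - 1);
        tab[nkeys++].hits = 1;
    }
}

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port   = htons(6343),  /* sFlow port */
                                .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    char buf[KEY_LEN];
    unsigned long n = 0;
    for (;;) {
        ssize_t len = recv(fd, buf, sizeof(buf) - 1, 0);
        if (len <= 0) continue;
        buf[len] = '\0';
        count(buf);
        if (++n % 1000 == 0) {              /* periodic top-key report */
            int top = 0;
            for (int i = 1; i < nkeys; i++)
                if (tab[i].hits > tab[top].hits) top = i;
            printf("top key after %lu samples: %s (%lu hits)\n",
                   n, tab[top].key, tab[top].hits);
        }
    }
}

The point is that a new metric - say, counting only keys with a given prefix - changes only this program; the servers are untouched.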
The patch needed to add sFlow to Memcached is in the ticket below and could easily be
ported to the 1.6 branch:
http://code.google.com/p/memcached/issues/detail?id=157
For more information on how sFlow works, and the type of data you can get, see:
http://blog.sflow.com/search/label/Memcache
Original comment by peter.ph...@gmail.com
on 15 Apr 2011 at 2:19
Neil came by at the last hackathon, and I think we talked through how an engine
similar to bucket engine[1] could be implemented to provide the sFlow
extensibility. I don't know what the current thoughts are on issue 157, but I
think we were suggesting the engine as the best path.
This whole thing is different from TOP_KEYS. TOP_KEYS works with the existing
protocol and clients without needing extra or external tools.
[1] https://github.com/membase/bucket_engine
Original comment by ingen...@gmail.com
on 15 Apr 2011 at 6:05
It's worth thinking about why measurements like TOP_KEYS are important. This
type of measurement is there to help improve performance. If the TOP_KEYS
function kills performance by 50%, it's hard to see the justification for
turning it on since the feature provides limited data and is difficult to
manage in an operational setting (it can't easily be enabled, disabled or
reconfigured).
It is convenient to have the measurements calculated by the server and made
available through the memcache protocol, but that convenience comes at a huge
cost. Shifting the analysis away from the servers means that you get a great
deal more flexibility, with minimal overhead on the server. For example, in
addition to reporting TOP_KEYS, you can analyze sFlow data to report on top
missed keys - very helpful for improving cache hit rates. Calculating
additional metrics using sFlow involves no additional work on the servers,
whereas each time you add an additional metric like TOP_KEYS on the server you
cut performance by an additional 50%.
In any case, it is likely that you would use an external application to analyze
the performance metrics and produce charts and reports regardless of whether
memcache or sFlow is used to transport the metrics.
What is the overhead of inserting an extra engine in the chain? My concern
would be that the cost of adding the instrumentation as a module might be high
- reducing the value of the instrumentation. The optimal location for sFlow
would be in the protocol engine where the counters are updated, since the sFlow
hook in the performance path essentially involves maintaining one additional
counter.
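For a sense of scale, the kind of countdown sampler described here costs one decrement and one branch per operation. A minimal sketch (hypothetical names, not the actual patch from issue 157):

/* countdown_sample.c - 1-in-N countdown sampling on a hot path. */
#include <stdio.h>
#include <stdlib.h>

static unsigned int skip;           /* ops remaining until next sample */

/* Randomizing the gap around N keeps the expected rate at 1-in-N while
   avoiding phase-locking with periodic request patterns. */
static unsigned int next_skip(unsigned int n) {
    return 1 + (unsigned int)(rand() % (2 * n - 1));
}

/* The entire hot-path cost: one decrement and one branch. */
static inline int should_sample(unsigned int n) {
    if (--skip == 0) {
        skip = next_skip(n);
        return 1;
    }
    return 0;
}

int main(void) {
    const unsigned int n = 5000;
    unsigned long samples = 0, ops = 10000000UL;
    skip = next_skip(n);
    for (unsigned long i = 0; i < ops; i++)
        if (should_sample(n)) samples++;
    printf("%lu ops, %lu samples (~1 in %lu)\n",
           ops, samples, samples ? ops / samples : 0);
    return 0;
}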
Original comment by peter.ph...@gmail.com
on 15 Apr 2011 at 7:25
Hi all,
Yes, I showed up for the hackathon but was too lazy to stay all night and
actually do the work :)
I guess I was hesitating because it wasn't clear if anyone was going to try it.
I didn't want to write an sFlow engine-shim just to commit it to the void.
If there is a real desire to see this problem solved, and there is consensus
that sFlow's "random-sampling with immediate forwarding" approach is the best
way to do it, then I'm happy to go ahead. It certainly seems like there is
now a clearer understanding of the need to do this without impacting
performance, so perhaps the time is right?
So to summarize, the questions are:
(1) "If I write this will you test it?" and
(2) "if it works great, will you bundle it with the default download?"
It may help to know that there are a number of freeware tools out there that
can receive and process sFlow in various ways, and there are also a number of
other sFlow agents that are free and open-source. Here are some examples:
http://ganglia.sourceforge.net
http://mod-sflow.googlecode.com
http://host-sflow.sourceforge.net
http://www.inmon.com/technology/sflowTools.php
(and of course there is also overwhelming support for this approach on the
network equipment side:
http://sflow.org/products/network.php)
Anticipating where this may lead, I think the big carrot is that down the line
you may find you can remove the top-keys code from the default engine and clean
up the critical path a little.
Thoughts?
Neil
Original comment by neil.mck...@gmail.com
on 15 Apr 2011 at 8:51
It's hard to predict what the adoption will be. Having something entirely
optional, such that the code isn't even in the binary unless someone is
interested in it, is great, though.
Original comment by dsalli...@gmail.com
on 16 Apr 2011 at 1:11
Having sflow support would be nice (probably pretty great, even), but as an
option to folks who want sflow. I think it's orthogonal to this thing.
topkeys has the usability right, but the math wrong.
sflow has the math (closer), but the usability wrong (for this particular need
we have).
I imagined a similar feature a few years ago after talking with a former NDB
guy and looking at varnish, which would somewhat intrinsically allow sflow and
avoid tcpdump... It's stupid to describe it here, but in short it's that
pattern where you write logs into a ringbuffer and allow listeners to stream
the logs (+ a runtime tunable sampling ratio). So "topkeys" ends up being
closer to what varnishlog/varnishhist/varnishtop are. I love those utilities so
much I wanted them for memcached, but not enough to actually write the feature
myself... yet...
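For concreteness, the pattern looks roughly like this (a hypothetical single-threaded sketch; a real version needs locking or per-thread rings):

/* logring.c - ring-buffer logging with streaming listeners. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RING_SLOTS 4096
#define ENTRY_LEN  128

static char ring[RING_SLOTS][ENTRY_LEN];
static unsigned long head;            /* total entries ever written */
static unsigned int sample_ratio = 1; /* runtime tunable: log 1-in-N ops */

static void ring_log(const char *msg) {
    if (rand() % sample_ratio) return;        /* sampled out */
    strncpy(ring[head % RING_SLOTS], msg, ENTRY_LEN - 1);
    head++;                                   /* old entries get overwritten */
}

/* A listener streams from its own cursor; if it falls more than one
   ring behind, it skips forward and loses entries (like varnishlog). */
static const char *ring_next(unsigned long *cursor) {
    if (*cursor == head) return NULL;         /* caught up */
    if (head - *cursor > RING_SLOTS)
        *cursor = head - RING_SLOTS;          /* overrun: drop the gap */
    return ring[(*cursor)++ % RING_SLOTS];
}

int main(void) {
    char buf[ENTRY_LEN];
    unsigned long cursor = 0;
    sample_ratio = 2;                         /* log roughly half the ops */
    for (int i = 0; i < 20; i++) {
        snprintf(buf, sizeof(buf), "get key%d", i);
        ring_log(buf);
    }
    for (const char *e; (e = ring_next(&cursor)); )
        puts(e);
    return 0;
}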
So all I can do is grandstand at someone who's already written a feature that's
partway there? Please understand that I do feel pretty stupid pushing back on
this already working thing. However, it smells like a customer feature and we
*know* there are better ways of doing this. Is the only reason to keep it
exactly the way it is because it's already done and you have customers who rely
on it?
Remember it's very very hard to change something like this after the fact. I'd
rather be in varnish's position at the end of the day.
Original comment by dorma...@rydia.net
on 16 Apr 2011 at 7:19
If I have understood correctly, I think you might want both features. A
ring-buffer that clients can connect to and stream back, with server-side
filtering and sampling, might be great for troubleshooting a single node.
However the sFlow feature is aimed more at continuous operational monitoring of
a whole cluster. If you have 1000 nodes in your cluster you wouldn't really
want to open 1000 TCP connections from the monitoring client, so UDP logging
makes more sense (especially if you are sampling anyway - lost packets are just
an unintended adjustment to the sampling rate). With UDP logging it's more
natural to pack the fields into a structure than to send a delimited ASCII
string, so that's how you end up with the XDR encoding that sFlow uses. It's
easy enough to turn it back into an ASCII stream at the client if you want to.
That's what the "sflowtool" freeware does. I guess you could think of the UDP
as an efficient mechanism to multiplex all the samples together from all the
nodes.
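To make the packing point concrete (this is not the sFlow v5/XDR wire format, and all names are hypothetical; it just shows fixed network-order fields instead of a delimited ASCII line):

/* udp_sample.c - packing sample fields into a binary UDP record. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

struct mc_sample {                /* hypothetical record layout */
    uint32_t sampling_rate;       /* 1-in-N in effect when sampled */
    uint32_t duration_us;         /* operation response time */
    uint32_t op;                  /* e.g. 1=get, 2=set */
    uint32_t status;              /* e.g. 0=hit, 1=miss */
    uint32_t key_len;
    char     key[64];             /* XDR would pad to a 4-byte boundary */
};

static int send_sample(int fd, const struct sockaddr_in *collector,
                       uint32_t rate, uint32_t us, uint32_t op,
                       uint32_t status, const char *key) {
    struct mc_sample s;
    memset(&s, 0, sizeof(s));
    s.sampling_rate = htonl(rate);     /* network byte order, like XDR */
    s.duration_us   = htonl(us);
    s.op            = htonl(op);
    s.status        = htonl(status);
    s.key_len       = htonl((uint32_t)strlen(key));
    strncpy(s.key, key, sizeof(s.key) - 1);
    /* UDP: fire and forget; a lost datagram just nudges the effective
       sampling rate, as noted above. */
    return (int)sendto(fd, &s, sizeof(s), 0,
                       (const struct sockaddr *)collector,
                       sizeof(*collector));
}

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in c = { .sin_family = AF_INET, .sin_port = htons(6343) };
    inet_pton(AF_INET, "127.0.0.1", &c.sin_addr);
    return send_sample(fd, &c, 5000, 120, 1, 1, "user:42") < 0;
}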
I'm persuaded that there is at least some curiosity here, so I'll have a go
at adding the engine shim. (Just finishing a module for nginx first...).
Dustin convinced me that the engine shim will only add one indirect function
call to the critical path. That's a lot more cycles than the "if(--skip==0)"
we had before, but my guess is that it still won't noticeably impact
performance(?) Perhaps we can revisit that in due course.
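The trade-off looks roughly like this (hypothetical interface, not the actual engine API):

/* shim_hook.c - indirect call versus inline countdown. */
#include <stdio.h>

/* Hypothetical per-operation hook exposed by an engine shim. */
struct engine_ops {
    void (*on_op)(void *cookie, const char *key, int nkey);
};

static void sflow_on_op(void *cookie, const char *key, int nkey) {
    (void)cookie;
    /* the countdown test and sample capture would live here */
    printf("hook saw op on key %.*s\n", nkey, key);
}

static struct engine_ops sflow_shim = { .on_op = sflow_on_op };

/* Per-operation cost of the shim design: one indirect call, versus the
   single decrement-and-branch of the inline "if(--skip==0)" test. */
int main(void) {
    sflow_shim.on_op(NULL, "user:42", 7);
    return 0;
}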
Original comment by neil.mck...@gmail.com
on 17 Apr 2011 at 9:29
OK! When you have a moment please try this:
https://github.com/sflow-nhm/memcached
./configure --enable-sflow
I forked from the "engine" branch, and added sFlow support. There is no
additional locking in the critical path, so this should have almost no impact
on performance, provided the 1-in-N sampling rate is chosen sensibly. (Please test
and confirm!)
In addition to cluster-wide top-keys, missed-keys, etc., you also get
microsecond-resolution response-time measurements, the value-size in bytes for
each sampled operation, and the layer-4 socket. So the sFlow collector may
choose to correlate results by client IP/subnet/country, as well as by cluster
node or any function of the sampled keys.
The logic is best described by the daemon/sflow_mc.h file, where the steps are
captured as macros so that they can be inserted in the right places in
memcached.c with a minimal source-code footprint. The sflow_sample_test() function is
called at the beginning of each ascii or binary operation, and it tosses a
coin to decide whether to sample that operation. If so, it just records the
start time. At the end of the transaction, if the start_time was set then the
sFlow sample is encoded and submitted to be sent out.
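Paraphrasing that flow with hypothetical names (the real steps are the macros in daemon/sflow_mc.h):

/* sample_flow.c - coin toss at op start, encode and submit at op end. */
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

/* Stubs standing in for the real machinery. */
static int sflow_sample_test(void) { return 1; }   /* pretend: always sample */

static uint64_t now_usec(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000 + tv.tv_usec;
}

static void sflow_encode_and_submit(const char *key, int nkey,
                                    int status, uint64_t us) {
    printf("sample: key=%.*s status=%d duration=%luus\n",
           nkey, key, status, (unsigned long)us);
}

struct conn_sflow { uint64_t start_us; };  /* nonzero => op was sampled */

/* At the start of each ascii/binary operation: coin toss, note the time. */
static void op_begin(struct conn_sflow *s) {
    if (sflow_sample_test())
        s->start_us = now_usec();
}

/* At the end of the transaction: if sampled, encode and submit. */
static void op_end(struct conn_sflow *s, const char *key, int nkey, int status) {
    if (s->start_us) {
        sflow_encode_and_submit(key, nkey, status, now_usec() - s->start_us);
        s->start_us = 0;
    }
}

int main(void) {
    struct conn_sflow s = { 0 };
    op_begin(&s);
    op_end(&s, "user:42", 7, 0);
    return 0;
}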
To configure for 1-in-5000, edit /etc/hsflowd.auto to look like this:
rev_start=1
polling=30
sampling.memcache=5000
agentIP=10.211.55.4
collector=127.0.0.1 6343
rev_end=1
inserting the correct IP address for agentIP.
If you compile and run "sflowtool" from the sources, you should see the ASCII
output:
http://www.inmon.com/technology/sflowTools.php
For more background and a simple example, see here:
http://blog.sflow.com/2010/10/memcached-missed-keys.html
The periodic sFlow counter-export is not working yet (that's what the
polling=30 setting is for). I think the default-engine needs to implement the
.get_stats_block API call before that will work. Let me know if you want me to
try adding that.
Best Regards,
Neil
P.S. I did try to do this as an engine-shim, but the engine protocol is
really a different, internal protocol. There was not a 1:1 correspondence
with the standard memcached operations.
Original comment by neil.mck...@gmail.com
on 20 May 2011 at 2:43
Trond's pulled this from engine-pu. We'll do it better/externaler next time.
Original comment by dsalli...@gmail.com
on 28 Sep 2011 at 8:16
Original issue reported on code.google.com by
dorma...@rydia.net
on 15 Apr 2011 at 7:23