KerwinMa / memcached

Automatically exported from code.google.com/p/memcached

Instrumenting memcached with sFlow #157

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
This patch is a proposed enhancement against the central repo that adds sFlow 
monitoring.

This approach addresses the same problem space as issue 109:
http://code.google.com/p/memcached/issues/detail?id=109
However, with sFlow all the analysis is moved off the server, leaving only the 
essential minimum. Alternative approaches such as tcpdump, SystemTap, and 
DTrace do not scale well in a production environment.

For background on this, see:
http://blog.sflow.com/2010/09/memcached.html

If you compile with this patch, you can run with "memcached -u nobody -o 
sflow=on", and it will pick up configuration from /etc/hsflowd.auto, such as:

sampling=400
polling=20
agentIP=10.0.0.112
collector=10.0.0.111 6343

"agentIP" should be the IP of the server,  and "collector" should be
the IP of an sflow collector such as "sflowtool":
http://www.inmon.com/bin/sflowtool-3.17.tar.gz

(This config file is generated automatically if you install hsflowd. hsflowd is 
the host sFlow daemon from host-sflow.sourceforge.net which
contributes an sFlow feed of server performance stats.)
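For illustration, a reader of those key=value lines might look roughly like the C sketch below. The struct and function names here are hypothetical, not taken from the patch or from hsflowd:

```c
/* Hypothetical sketch: reading "key=value" settings of the kind shown
 * above from /etc/hsflowd.auto. Names are illustrative only. */
#include <stdio.h>

typedef struct {
    unsigned sampling; /* sample 1-in-N transactions, e.g. 400 */
    unsigned polling;  /* counter-export interval in seconds, e.g. 20 */
} sflow_settings;

/* Parse one line; returns 1 if the line matched a known numeric key.
 * (agentIP and collector parsing is omitted from this sketch.) */
static int sflow_parse_line(const char *line, sflow_settings *cfg) {
    unsigned v;
    if (sscanf(line, "sampling=%u", &v) == 1) { cfg->sampling = v; return 1; }
    if (sscanf(line, "polling=%u",  &v) == 1) { cfg->polling  = v; return 1; }
    return 0;
}
```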

Neil

NOTES:

(1).  Provided the sampling-rate is set appropriately, the overhead
should be roughly equivalent to adding one extra stats counter.  The
critical path is just a decrement-and-test on a per-thread counter (no
locking).
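The decrement-and-test idea can be sketched in C as below. The names and the randomized skip are illustrative assumptions, not the patch's exact code; randomizing the countdown is the standard sFlow trick to avoid phase-locking with periodic workloads:

```c
/* Sketch of a lock-free per-thread sampling test: each worker thread
 * keeps its own countdown, so the hot path is a single
 * decrement-and-test with no shared state. Illustrative only. */
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t skip;          /* transactions left until next sample */
    uint32_t sampling_rate; /* e.g. 400 => sample roughly 1 in 400 */
} sflow_sampler;

/* Reset the countdown with a randomized skip around the mean rate. */
static void sflow_reset(sflow_sampler *s) {
    s->skip = (s->sampling_rate > 0)
        ? 1 + (uint32_t)(rand() % (2 * s->sampling_rate - 1))
        : 0;
}

/* Called once per transaction on the critical path.
 * Returns 1 when this transaction should be sampled. */
static int sflow_take_sample(sflow_sampler *s) {
    if (s->skip == 0) return 0;   /* sampling disabled */
    if (--s->skip == 0) {
        sflow_reset(s);
        return 1;
    }
    return 0;
}
```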

(2).  The changes are all within "#ifdef SFLOW" except for (3) below.

(3).  I changed memcached.h:struct conn->request_addr to be of type
struct sockaddr_storage so it would work for IPv6 too. 
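As a small illustration of why struct sockaddr_storage covers both families (the helper below is hypothetical, not code from the patch):

```c
/* struct sockaddr_storage is large and aligned enough to hold either an
 * IPv4 or an IPv6 address, so one field serves both families. This
 * helper (illustrative only) reads the peer port either way. */
#include <netinet/in.h>
#include <sys/socket.h>

static int peer_port(const struct sockaddr_storage *ss) {
    switch (ss->ss_family) {
    case AF_INET:
        return ntohs(((const struct sockaddr_in *)ss)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *)ss)->sin6_port);
    default:
        return -1; /* unknown address family */
    }
}
```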

Original issue reported on code.google.com by neil.mck...@gmail.com on 27 Sep 2010 at 4:49


GoogleCodeExporter commented 9 years ago
It would be nice if you could publish the patch through a git tree; that would 
make it easier for us to review.

(In addition I think you should add a configure option to enable / disable the 
feature instead of forcing the users to manually edit Makefile.am)

Original comment by trond.no...@gmail.com on 28 Sep 2010 at 2:26

GoogleCodeExporter commented 9 years ago
Done.  See http://github.com/sflow/memcached

./configure --enable-sflow

Thanks,
Neil

Original comment by neil.mck...@gmail.com on 28 Sep 2010 at 5:47

GoogleCodeExporter commented 9 years ago
Hello Folks,

is this patch going to be added any time soon?

Regards, Stefan.

Original comment by s.schles...@ixolit.com on 12 Jul 2011 at 10:05

GoogleCodeExporter commented 9 years ago
I really think this can exist as a plugin without having to modify the source 
tree.  I'd rather not drop all of this in directly if it can be avoided.

Original comment by dsalli...@gmail.com on 12 Jul 2011 at 11:12

GoogleCodeExporter commented 9 years ago
Any news here? It would be very interesting to integrate official memcached 
with sflow. 
Thank you very much.

Original comment by Rosso.Gi...@gmail.com on 26 Jun 2012 at 8:16

GoogleCodeExporter commented 9 years ago
FYI,  there is a talk on this at Velocity tomorrow:
http://velocityconf.com/velocity2012/public/schedule/detail/23487

If you would like to try memcached+sFlow, please just let me know the 
particular version of memcached that you want to run, and I can make sure 
there is a corresponding build with the sFlow feature. It's not hard to add, 
and most users stick with the same version for a while, so I think it's quite 
sustainable to do it this way for now(?). Perhaps eventually the memcached 
experts will take it on, but in the meantime more users can only help.

Neil

Original comment by neil.mck...@gmail.com on 26 Jun 2012 at 9:20

GoogleCodeExporter commented 9 years ago
@Neil
Thanks for your answer. We are using the latest version; 1.4.10 has problems 
with big slabs (which we use). So if you could just give us a patch or the full 
source for 1.4.13, I would be very glad.
Thanks.

PS Sorry for the "personal post", but I didn't know how to contact you directly.

Original comment by Rosso.Gi...@gmail.com on 29 Jun 2012 at 9:05

GoogleCodeExporter commented 9 years ago
I pulled in the 1.4.13 changes (with no conflicts), and posted a download 
tarball on this page:

https://github.com/sflow/memcached/downloads

sFlow is enabled by default.  So the steps are just:

tar xvzf memcached-1.4.13-sflow.tar.gz
cd memcached-1.4.13-sflow
./autogen.sh
./configure
make
make install
memcached -u nobody

It will pick up the config that is shared by hsflowd (via /etc/hsflowd.auto).
http://host-sflow.sourceforge.net

Please let me know how it goes:  neil.mckee.ca@gmail.com

Neil

Original comment by neil.mck...@gmail.com on 29 Jun 2012 at 8:16

GoogleCodeExporter commented 9 years ago
Testing the water again....

If I were to strip out all the unnecessary sFlow-library fluff from this patch, 
it might boil down to fewer than 1000 lines of code (in sflow.h and sflow.c, 
with just a few macros in memcached.c). That would make it easier to 
vet/critique/improve -- and perhaps ultimately sign off on? If there's an 
even chance, then I'll do the work. Let me know.

Neil

Original comment by neil.mck...@gmail.com on 2 Sep 2013 at 6:19

GoogleCodeExporter commented 9 years ago
I've been using 1.4.13 with sFlow for almost a year now, and it looks fine. 
What is blocking sFlow's inclusion (@Neil)?
Thank you.

Original comment by Rosso.Gi...@gmail.com on 1 Oct 2013 at 8:13

GoogleCodeExporter commented 9 years ago
@Rosso

I have no new information to share. I suspect the committers are focused on 
stability, so anything that adds new code is on the back burner. I recently 
offered to shrink the sFlow patch down to bare essentials in case that helps, 
but have had no reply yet.

So in the meantime, if you need the sFlow feature ported to 1.4.15, then let 
me know.

---000---

As an aside,  the version you are using also contains an experimental "stats 
htwalk" command:

stats htwalk <start-offset> <N> <max-bytes>

It simply walks along the top of the hash table buckets, pulling out up to N 
associations and sending back some details on them:

<record-number> <bucket-number> <secs-since-set> <secs-to-expire> <value-bytes> <key>
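The walk itself can be sketched in C as below, assuming a chained hash table. The item layout, names, and (absent) locking are simplifying assumptions for illustration, not the command's actual code:

```c
/* Illustrative sketch of a bucket walk like "stats htwalk": starting at
 * a bucket offset, collect up to n items from the chains. A real
 * implementation would hold the appropriate hash/cache locks while
 * walking, and would format each item into the record shown above. */
#include <stddef.h>

typedef struct ht_item {
    struct ht_item *h_next; /* next item in the same bucket chain */
    const char *key;
} ht_item;

static size_t htwalk(ht_item **buckets, size_t nbuckets,
                     size_t start_offset, size_t n,
                     const ht_item **out) {
    size_t found = 0;
    for (size_t b = start_offset; b < nbuckets && found < n; b++)
        for (const ht_item *it = buckets[b]; it && found < n; it = it->h_next)
            out[found++] = it;
    return found;
}
```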

The value of this measurement is that you can run it against one node and it 
approximates a random sample of the keys in the entire cluster. It therefore 
complements the sFlow transaction sampling: while the sFlow sampling can tell 
you the hot keys and missed keys that are active now, this measurement allows 
you to survey the "dark matter" of keys that might be occupying the cache even 
if they are not actively being get or set. The measurement is light enough 
that you could run it in production with no impact, and repeat it several 
times against different nodes, or with different values of <start-offset>, to 
get a larger N.

The <max-bytes> is limited to 2000000, and there are no filtering options. 
That puts an upper bound on the number of cycles it can possibly take.

Does this sound useful? If you want to try it, please test it thoroughly in 
the lab first. I haven't looked at it in a while and I'm not confident that I 
got the mutex locking right. All possible disclaimers, etc.

Neil

Original comment by neil.mck...@gmail.com on 11 Oct 2013 at 5:53