Comment written by Alecco Locco on 11/04/2008 17:43:55
Hi.
Good stuff!
I don't understand why you needed to patch libevent to set ports manually; I just used evhttp_connection_set_local_address() in a round-robin fashion and that took care of ephemeral port assignment. You can also bind/connect manually and pass the file descriptor to the HTTP layer.
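Roughly what I mean (just a sketch, untested; the target host/port, the /test/1 URI, and the number of 127.0.0.X source addresses are placeholders, not taken from the article):

```c
/* Sketch only: round-robin the local source address so each 127.0.0.X gets
 * its own ephemeral port range. Target host/port and URI are placeholders. */
#include <stdio.h>
#include <event.h>
#include <evhttp.h>

#define NUM_SRC_ADDRS 10          /* 127.0.0.1 .. 127.0.0.10 */
#define NUM_CONNS     10000

static void on_response(struct evhttp_request *req, void *arg) {
    /* a long-held comet response would be handled here */
}

int main(void) {
    event_init();
    for (int i = 0; i < NUM_CONNS; i++) {
        char src[16];
        snprintf(src, sizeof(src), "127.0.0.%d", (i % NUM_SRC_ADDRS) + 1);
        struct evhttp_connection *evcon = evhttp_connection_new("127.0.0.1", 8000);
        evhttp_connection_set_local_address(evcon, src);  /* source IP round robin */
        evhttp_make_request(evcon, evhttp_request_new(on_response, NULL),
                            EVHTTP_REQ_GET, "/test/1");
    }
    event_dispatch();
    return 0;
}
```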
Something that I did need to modify/recompile is the NEVENT define, setting a maximum of 32,000 for epoll_create().
Cheers.
Alecco
Comment written by RJ on 11/04/2008 18:01:17
Alecco - did you try that with the code I posted, or something else? When I tried it, ephemeral ports were never reissued - is there some other setting you used to force it to keep issuing the same ports again?
I could have connected the socket first and passed in the file descriptor, yes - that didn't occur to me. That would have meant no need to change libevent. Although, seeing as it already has a way to set the local address, I figured adding a way to set the local port made sense.
Comment written by Alecco Locco on 11/04/2008 18:16:28
Hi RJ!
My client tester code is similar-ish so far, but uses 127.0.0.X as source addresses (no ifconfig at this stage). I didn't have any problems with the ports assigned.
I'm more stuck on the 1M connected sockets; I couldn't make Linux handle more than 400k connected sockets in total.
Oh, another thing I don't yet understand is the aio-X sysctl settings, though they shouldn't affect this. But I'm keeping an eye on that too...
Have you checked your maximum reached fd number on server and client? It gets tricky to see the number of active connections and so far I track the maximum number for an fd.
Dunno if you saw my post:
http://aleccolocco.blogspot...
There is a very plain socketpair() tester to find limits on connected sockets.
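It's along these lines (a minimal sketch of the idea, not the exact code from the post; you have to raise the process fd limit first or that is all you end up measuring):

```c
/* Keep creating socketpairs until the kernel says no, then report how far we got. */
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>

int main(void) {
    long pairs = 0;
    int sv[2];
    while (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0)
        pairs++;
    printf("created %ld socketpairs (%ld connected sockets) before failing: %s\n",
           pairs, pairs * 2, strerror(errno));
    return 0;
}
```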
Regretfully I gotta run now, but will sure try your code tonight :)
Cheers!
Alecco
Comment written by Alecco Locco on 11/04/2008 18:19:40
Edit:
"so far I track the maximum number for an fd"
->
"so far I track number of open connections with the maximum number of fd assigned"
(In *nix it is standard to assign the lowest possible fd number for any new file descriptor.)
Alecco
Comment written by iw on 11/04/2008 18:20:05
Hi Richard,
On qlc.hrl, you can use...
-include_lib("stdlib/include/qlc.hrl").
Great series,
ian
Comment written by Gunnar Kriik on 11/05/2008 00:14:33
This is probably one of the most interesting article series I've read in a decade or so. Outstanding work!
Thanks a lot for sharing this!
Comment written by Martin Tyler on 11/05/2008 01:16:53
Hi, nice work on the number of connections. As I mentioned on Alecco's blog, I never tested more than 30,000 connections on Liberator as we didn't really need more - but I always wanted to try it out.
What are your message rates out to the clients? And are you testing the latency of the messages getting to the clients?
I'd be interested in the difference between the libevent version and the erlang version with regards to cpu and latency too.
Martin
Comment written by Jon Gretar on 11/05/2008 02:08:54
Hi.
First... Probably the most interesting set of Erlang blog posts I have seen so far in the universe.
Secondly... Shouldn't it be "nodes(hidden)." instead of "nodes(true)."? :)
Comment written by ral on 11/05/2008 02:12:31
instead of libevent, check out the newer libev, which has a libevent compatibility layer
Comment written by RJ on 11/05/2008 04:29:35
Martin: for the test in this post I was sending 16,666 msg/sec, and all those messages were being delivered, but I wasn't measuring latency. Anecdotally, it was near instantaneous, but I wasn't really optimizing for low latency. I suppose if it wasn't fast enough to cope it would consume exponentially more memory and die, which hasn't happened so far.
Jon: thanks, corrected it to nodes(hidden).
Regarding libev, I am aware of it, but libevent is better documented so it was no contest. Also, libevent is battle tested - we do gajillions of memcache requests every second, and memcached uses libevent - so it's good enough for me.
Comment written by Martin Tyler on 11/05/2008 12:12:36
RJ: I see - so it was 16,666 messages 'in' and 'out' to clients? Yes, I would imagine it was fairly instantaneous at that rate or you would have seen side effects like you said.
Do you have plans to push the message rate up, or is this the kind of profile you are targeting already?
We did once have a project with this kind of profile, outside of our target audience of finance - but sadly it didn't get off the ground, otherwise I would have tested these kinds of numbers too.
Comment written by RJ on 11/05/2008 13:23:19
Martin: I suppose with finance and some other use-cases you need to be sending messages to all users at once.. I am more concerned with the social-network usage patterns for now tho - where someone creates an event and it is syndicated to a bunch of interested people. My example assumes that a million Last.fm users would connect and receive a message whenever one of their friends played a song.
Now you've mentioned it I can't help wondering how many messages per second the system could cope with :) Until you saturate the network I'd guess that it would be CPU bound - the next time I assemble the machines for another large scale test I will turn up the message rate until it falls over and see what happens.
The CPU on the mochiweb box at 16,666 msg/sec was around 25-30%, so I reckon it should be able to do 50k msg/sec on that hardware. Maybe even more with the libevent connection pool - something I should re-test.
Comment written by Martin Tyler on 11/05/2008 16:30:30
RJ: Is your current test assuming everyone only has 1 friend? :)
Liberator can certainly saturate a gigabit network before CPU becomes a problem - with some usage profiles, that is.
With 10,000 clients Liberator can send 100 messages/sec to each of them (from a 'backend' update rate of 20,000 messages/sec). The messages to the client are 58 bytes, which all adds up to about half of a gigabit network (10,000 × 100 × 58 bytes ≈ 58 MB/s ≈ 464 Mbit/s). So someone gets a big bandwidth bill. This is why message size is so important for comet applications; it all adds up very quickly.
Comment written by Thijs (Shenzhen) on 11/05/2008 16:40:40
Great article - your usage of libevent in particular is an eye-opener for me. Also a good tip to spawn a separate process to avoid blocking the gen_server. I'm going to apply this tomorrow to my own Erlang server as well.
Thanks again for this article; I've been coming to your site almost every day to see when you would publish part 3 :)
Comment written by Paulo Almeida on 11/05/2008 17:22:25
RJ: have you tried specifying small send and receive buffers using {sndbuf, ...} and {recbuf, ...} options in gen_tcp:connect?
I would be curious to see what can be done in a pure Erlang solution without resorting to C.
Regards,
Paulo
Comment written by RJ on 11/05/2008 17:52:31
Martin: the igraph command I used generated an average of 15 friends per user, for user ids 1 to 1M. It fits a sensible model though: many people have 1 friend and fewer people have lots (far more than 15 in some cases).
It sounds like Liberator is optimised for a different use case than my attempt - there's no way in a social-network-like environment you'd want 100 messages per second about what your friends are doing ;) Messages in this test were also fairly small, although in a production scenario I could imagine sending much bigger messages, possibly HTML fragments or JSON data to be rendered by the client.
Paulo: I've not tried that, but thanks for the tip. Looks like I need to do a Part 4 and try some optimisation and high-throughput tests.
Thanks to everyone who commented so far, I appreciate the feedback.
Comment written by Martin Tyler on 11/05/2008 19:08:09
RJ: OK, so I guess what I am asking is whether the 16,666 messages is the number of times an 'I am playing a song' message is created, or the number of times it is received. If you have 15 friends on average, are you producing 250,000 messages or does the 16,666 already represent that?
It's bad enough how many of my friends update their Facebook status daily... so 100 times/sec would definitely be overkill for this kind of application :) I'd be interested to see how well Liberator coped with your usage profile though, so keep posting your test figures and I might run some tests myself if I get some spare time.
Comment written by RJ on 11/05/2008 21:24:36
Martin: Ah I see - in this test, the 16,666 is the total number of "I'm playing something" msgs being created by the system. Given that all possible users were connected, each message would have been delivered to 15 people on average, so there should have been 250,000 messages a second actually delivered to clients.
Comment written by Martin Tyler on 11/05/2008 23:27:57
RJ: That's a decent number of messages then. Liberator, and some other Comet servers aimed at high update rates per user, have various tricks to improve performance, e.g. batching of messages to make better use of the network. With your kind of usage profile there is probably less you can do along those lines though.
Comment written by Alecco on 11/06/2008 03:11:48
Hi RJ.
I take back the NEVENTS bit mentioned in my comment; I somehow completely missed the epoll_recalc() bit of libevent and it seems I got confused with another limit.
But the other stuff is valid. I'll post soon about this with my findings.
This is very cool work here. I wish you had mentioned you did a libevent HTTP server in your first post... I thought you implied you went another way.
Cheers.
Alecco
Comment written by RJ on 11/06/2008 16:21:41
Alecco: cool I look forward to reading your next post.
Truth is I didn't have any plans to try libevent until I had to use it for the client. By then I figured I might as well try a server too :)
Comment written by Alecco Locco on 11/13/2008 05:26:57
RJ, you were right. Again. D'oh! :)
Here is an analysis on this issue and a different approach to get around it with libevent:
Comment written by Pedram on 11/17/2008 08:49:18
I have to agree, this is definitely the most interesting Erlang discussion so far. By the way, I've implemented the consistent hashing and think that it's very nice. I think Scalaris is trying consistent hashing for distributed key/value databases... interesting stuff.
Comment written by Carl Youngblood on 02/12/2009 00:41:45
This is a very interesting and impressive series of articles. Thanks so much for making them available. Being fairly new to Comet, I'm trying to wrap my brain around how I could make the client be a web browser accessing a regular HTML page. I'm trying to build a prototype for a financial app that runs in the browser and receives real-time updates from the server. I think comet would be a great way to go. Granted, the memory usage of each client would be much higher, but if each user is running the client on a separate computer, this is not a problem. Of course, it is more difficult to test.
Can you share any thoughts on how you would go about connecting to this server from a web client?
Comment written by Anonymous on 04/18/2009 14:02:00
httpdcnode.c doesn't seem to work. I played around with it a little bit and it seems you can't queue data to be written to a socket from the cnode_run thread when socket polling is handled by another thread. This email seems to confirm it - http://monkeymail.org/archi...
My testing platform was libevent 1.4.9 and a Darwin kernel (kqueue, kevent), so it's possible it somehow worked (but shouldn't) on Linux with epoll. It would be interesting to hear how you solved these problems in the real world (pipes used to wake up the main thread, and mutexes?).
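Something like the classic self-pipe arrangement is what I had in mind - a sketch only, untested, with made-up names like enqueue_from_cnode (not taken from httpdcnode.c); the cnode thread never touches libevent, it just queues work and pokes the event-loop thread:

```c
#include <unistd.h>
#include <pthread.h>
#include <event.h>

struct msg { struct msg *next; /* payload fields ... */ };

static int wake_pipe[2];                     /* [0] is watched by libevent */
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static struct msg *queue_head = NULL;

/* Called from the cnode thread: enqueue and poke the event loop. */
void enqueue_from_cnode(struct msg *m) {
    pthread_mutex_lock(&q_lock);
    m->next = queue_head;
    queue_head = m;
    pthread_mutex_unlock(&q_lock);
    write(wake_pipe[1], "x", 1);             /* wake the event-loop thread */
}

/* Runs in the event-loop thread, so it is safe to touch evhttp requests here. */
static void on_wakeup(int fd, short ev, void *arg) {
    char drain[64];
    read(fd, drain, sizeof(drain));          /* drain the wakeup bytes */
    pthread_mutex_lock(&q_lock);
    struct msg *m = queue_head;
    queue_head = NULL;
    pthread_mutex_unlock(&q_lock);
    for (; m != NULL; m = m->next) {
        /* look up the right evhttp_request and send the chunk from here */
    }
}

/* Call once from the event-loop thread, after event_init(). */
void setup_wakeup(void) {
    static struct event wake_ev;
    pipe(wake_pipe);
    event_set(&wake_ev, wake_pipe[0], EV_READ | EV_PERSIST, on_wakeup, NULL);
    event_add(&wake_ev, NULL);
}
```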
Comment written by Carlos on 06/07/2009 00:54:48
I love this series of articles! I've been re-reading them periodically over the last few months. They are really useful.
Since you've been talking about an Erlang memcached client in this last part of the series, I would like to share with you - and with your readers - the early implementation of a libmemcached wrapper for Erlang I've just published on Google Code. I suppose it could be useful for someone.
Comment written by roberto on 08/04/2009 08:35:39
Same here as Anonymous of April 18th.
I tried it out, but httpdcnode.c doesn't seem to work, since nothing is written to the request socket as output.
Comment written by RJ on 10/07/2009 16:03:30
I don't have access to my test rig anymore, so I won't be collecting any more data for the time being.
I'm RJ2 in #erlang on irc.freenode.org if you want to say hi or ask about how anything from these articles works.
Comment written by RJ on 12/11/2009 11:27:23
BTW regarding the C httpd code, rather than make it a cnode and use a thread, you can make it an Erlang port that communicates over stdin/stdout.
You should be able to use libevent to watch stdin for msgs to send, and thus do it all in a single thread. See this post for more details: http://blog.socklabs.com/20...
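Roughly like this (a sketch of the idea only, untested; the 4-byte length framing - i.e. opening the port with {packet,4} on the Erlang side - and port 8000 are assumptions, not taken from that post):

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <event.h>
#include <evhttp.h>

/* Fires whenever Erlang writes a message to the port's stdin. */
static void on_stdin(int fd, short ev, void *arg) {
    uint32_t len_be;
    if (read(fd, &len_be, 4) != 4)
        exit(0);                          /* Erlang closed the port */
    uint32_t len = ntohl(len_be);
    char buf[65536];
    read(fd, buf, len);                   /* a real port would loop on short reads */
    /* decode the msg and evhttp_send_reply_chunk() to the right client here */
}

int main(void) {
    struct event stdin_ev;
    event_init();
    struct evhttp *httpd = evhttp_start("0.0.0.0", 8000);
    (void) httpd;  /* register the request handlers on it here, as in the article's server */
    event_set(&stdin_ev, 0 /* stdin */, EV_READ | EV_PERSIST, on_stdin, NULL);
    event_add(&stdin_ev, NULL);
    event_dispatch();                     /* one thread handles both Erlang msgs and clients */
    return 0;
}
```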
Comment written by Mihai Rotaru on 11/17/2010 17:41:09
Hi,
You can see how Migratory Push Server achieves real-time streaming to 1 million concurrent users with 17 milliseconds end-to-end mean latency on a small server (1U Dell SC1435 2 x dual-core @2GHz and 16 GB memory):
http://migratory.ro/data/Mi...
Another interesting use-case we present in this benchmark document is the ability of the Migratory Push Server to scale to nearly 1 Gbps of real-time data publication with an end-to-end mean latency of 7 milliseconds.
Mihai
Comment written by Sakith on 02/18/2011 06:57:02
This was very helpful!
Comment written by escorte on 03/20/2012 16:23:05
I want to thank you for the effort you have made in writing this post. I am hoping for the same high-grade posts from you in the future as well. In fact your creative writing skill has inspired me to start my own blog now. Really, blogging is spreading its wings quickly. Your write-up is a fine model of it.
Comment written by Hayg on 05/24/2012 18:18:37
Hi RJ,
Just wanted to stop by and say thank you for the great post! It was really informative and an interesting read.
Thanks,
Hayg
Comment written by Kuba Ober on 12/10/2012 13:45:05
This almost begs for opening up a raw socket on the network interface and implementing TCP/IP on the Erlang side of things. It'd get rid of the 10GB kernel overhead and the need for multiple network interfaces. One thing I'm not sure of is how well Erlang's scheduler would cope with one very, very busy process that receives all those raw packets.
Comment written by ideawu on 04/12/2014 04:18:14
Hi, you MUST call evhttp_send_reply_end() on a request if you previously called evhttp_send_reply_start() on it, or libevent will NOT free the memory until you free the event_base. So, a call to evhttp_send_reply_end() must be added to the cleanup() function!
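Something in this direction, assuming the connection-close callback can still get at the request it was streaming to (how the article's cleanup() tracks that is its own bookkeeping) - a sketch of the fix being described, not the article's exact code:

```c
/* Sketch: registered with evhttp_connection_set_closecb(); 'arg' is assumed
 * to be the evhttp_request we called evhttp_send_reply_start() on. */
static void cleanup(struct evhttp_connection *evcon, void *arg) {
    struct evhttp_request *req = (struct evhttp_request *) arg;
    evhttp_send_reply_end(req);   /* lets libevent free the request's buffers */
    /* ... then remove the client from the connection table as before ... */
}
```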
Comment written by Sev on 07/25/2016 03:36:07
Wow, these three articles taught me a lot and had the highest awesome-engineering to word-count ratio I've seen in a long time.
Written on 11/04/2008 16:49:05
URL: http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3