dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans

Orleans sockets, would there be need for more performance? #307

Closed veikkoeeva closed 5 years ago

veikkoeeva commented 9 years ago

Out of curiosity I took a look at SocketManager and related code, and I was wondering whether anyone has considered either SocketAsyncEventArgs or things like RIO (if this is a good source) or RDMA?

A long time ago, during the .NET 3.5 era (before async, that is), when Windows Server 2003 was good stuff, I implemented a high-performance soft-realtime socket server and got much better performance with SocketAsyncEventArgs. I suppose one factor here is how much churn there is in senders (e.g. new connections), and the world has moved on a bit with the new async interfaces.

Nevertheless, I was thinking of some computing scenarios and things like high-performance trading (those guys strip down Linux kernels or have moved to FPGAs already) where not so long ago this was quite hot. Some problems I once upon a time dodged are nicely enumerated in Long running async and memory fragmentation (of RavenDB fame; see especially the discussion), and I noticed Cassandra is using SocketAsyncEventArgs too. As a final note for those curious: Awaiting Socket Operations.

Now that I got this far, I wonder whether anyone has use cases for maximum socket performance, or whether it would be nice to have features like these (as in closer to the metal and less memory pinning)? Inter-datacenter messaging perhaps? Some inter-silo computation transferring a lot of data?

gabikliot commented 9 years ago

@veikkoeeva, @jason-bragg and I are looking into that same area just now! We are looking to improve messaging throughput, so your post is very timely! The messaging perf is pretty good even now (for what we consider good), but of course we want more. For large messages (10K payloads and more) we can easily saturate the network bandwidth. The problem is with small messages, which are the majority in Orleans apps: short, small interactions between actors. For small messages there are overheads all over: 1) it is well known that all OSes in general are not excellent with small messages; there are per-system-call overheads, and so on; 2) inside the Orleans runtime, on the hot path, we of course have a bunch of per-message overheads: headers, putting into a queue, removing from the queue, all kinds of tracking, interaction with the TPL (a Task per request, await, ...).

We are looking now into optimizing some of those.

We already receive from sockets with an async Receive, but send is still a sync send. So we are looking now into async send and SocketAsyncEventArgs. We are not seriously thinking about RDMA yet; I thought it's not a commodity on Azure yet.

Thanks for all the links! Very useful stuff!

veikkoeeva commented 9 years ago

@gabikliot, @jason-bragg This was nice to read. :) I'd like to ask, if at all possible, that the API be made so the channel can be encrypted if so desired. I don't have a concrete customer case using Orleans for this, but it wouldn't hurt. :) A smaller wish, a tiny-weeny one: a bit of abstraction too, in case someone would like to make a go with RIO -- or even RDMA (you never know).

As for general interest on RIO:

@gabikliot You may have read it already, but just in case: BufferBloat: What's Wrong with the Internet?.

<edit: I got carried away and forgot to add this here (perhaps of general interest and so forth): RecyclableMemoryStream. Perhaps allocating the byte[] used in socket operations from there would help. It looks doable, but I haven't actually tried it myself. It looks usable in Orleans, as there are memory buffers.
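
For reference, a minimal sketch of how RecyclableMemoryStream is typically used, assuming the Microsoft.IO package; the manager instance and the "socket-send" tag are illustrative:

```csharp
using System;
using Microsoft.IO;

class RecyclableStreamExample
{
    // One manager per process; it owns the pooled buffers.
    private static readonly RecyclableMemoryStreamManager Manager =
        new RecyclableMemoryStreamManager();

    static void Main()
    {
        // GetStream hands out a MemoryStream backed by pooled blocks,
        // so serializing a message here does not allocate a fresh byte[].
        using (var stream = Manager.GetStream("socket-send"))
        {
            var payload = new byte[] { 1, 2, 3, 4 };
            stream.Write(payload, 0, payload.Length);
            // Hand the stream (or its buffer) to the socket layer here.
        }
        // Dispose returns the blocks to the pool for reuse.
    }
}
```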

gabikliot commented 9 years ago

@veikkoeeva, thanks!

1. RIO is only on Windows and not in .NET - http://stackoverflow.com/questions/18419117/rio-registered-i-o-sockets-in-net - so it's off topic for now.
2. RDMA - same deal as far as I can tell: special hardware, not in .NET yet.
3. SocketAsyncEventArgs is definitely the way to go: http://stackoverflow.com/questions/3442344/c-sharp-high-performance-server. That is what we will likely prototype soon.
4. We know about RecyclableMemoryStream. It's good stuff. We have our own BufferPool, which we wrote years before RecyclableMemoryStream existed. It would be interesting to try to move to it, but I'm not sure about priorities.
5. Overlapped I/O - https://msdn.microsoft.com/en-us/library/system.net.sockets.socket.useonlyoverlappedio(v=vs.110).aspx - do you know much about it? Is it any different from using SocketAsyncEventArgs? I wonder if/how we can use that as well.
6. http://ayende.com/blog/170243/long-running-async-and-memory-fragmentation - this link is awesome: GC, pinning, the bug fix in 4.5.2. Fun!

veikkoeeva commented 9 years ago

@gabikliot

5. Overlapped I/O - https://msdn.microsoft.com/en-us/library/system.net.sockets.socket.useonlyoverlappedio(v=vs.110).aspx - do you know much about it? Is it any different from using SocketAsyncEventArgs? I wonder if/how we can use that as well.

I believe that is what the Async methods are doing underneath, as are classes like TcpListener. This is also how SocketAsyncEventArgs operates:

In the new System.Net.Sockets.Socket class enhancements, asynchronous socket operations are described by reusable SocketAsyncEventArgs objects allocated and maintained by the application. High-performance socket applications know best the amount of overlapped socket operations that must be sustained. The application can create as many of the SocketAsyncEventArgs objects that it needs. For example, if a server application needs to have 15 socket accept operations outstanding at all times to support incoming client connection rates, it can allocate 15 reusable SocketAsyncEventArgs objects for that purpose.

There's a code example that much resembles what I did sometime around 2005 (if my memory serves me well enough). Back then there were no concurrent collections, so I wrote a queue with spinlocks on both ends to do what that stack there is doing - a queue so that I could avoid a "hot head" when both reserving and releasing buffer objects. It feels like RecyclableMemoryStream could be used nowadays, but this is just a hunch.
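
A minimal sketch of that kind of pool with today's collections, using a ConcurrentQueue in place of the hand-rolled spinlocked queue; all names are illustrative:

```csharp
using System.Collections.Concurrent;
using System.Net.Sockets;

// A pool of reusable SocketAsyncEventArgs, as the quoted docs describe.
sealed class SocketArgsPool
{
    private readonly ConcurrentQueue<SocketAsyncEventArgs> _pool =
        new ConcurrentQueue<SocketAsyncEventArgs>();
    private readonly int _bufferSize;

    public SocketArgsPool(int count, int bufferSize)
    {
        _bufferSize = bufferSize;
        for (int i = 0; i < count; i++)
            _pool.Enqueue(Create());
    }

    private SocketAsyncEventArgs Create()
    {
        var args = new SocketAsyncEventArgs();
        args.SetBuffer(new byte[_bufferSize], 0, _bufferSize);
        return args;
    }

    // A queue (rather than a stack) keeps rents and returns on opposite
    // ends, avoiding the "hot head" contention described above.
    public SocketAsyncEventArgs Rent() =>
        _pool.TryDequeue(out var args) ? args : Create();

    public void Return(SocketAsyncEventArgs args) => _pool.Enqueue(args);
}
```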

About the other libraries, you are absolutely correct. The SSL option would be nice, and without looking at it much, it doesn't look like it would complicate the story too much. In B2B, end-to-end encryption is a good selling point. :)

jason-bragg commented 9 years ago

Anyone interested in this thread, please see PR: "Modified network receive to perform bulk receive. #475"

This is the first major network optimization we've introduced since going open source. It does not optimize the actual network calls (by using SocketAsyncEventArgs, for instance); instead it relies on simply reducing the number of network receive calls made, under the philosophy that you can't get any faster than work not done.

Feedback is very welcome.

jbragg

veikkoeeva commented 9 years ago

Guys, this is very interesting: https://github.com/benaadams/benchmarks/tree/managed-rio-experimental/experimental/ManagedRIOHttpServer. See also at http://www.ageofascent.com/azure-cloud-high-speed-networking/.

gabikliot commented 9 years ago

Great info @veikkoeeva !

Aaronontheweb commented 9 years ago

One option worth considering is DotNetty: https://github.com/Azure/DotNetty

It's a .NET-based port of Java's Netty framework. It's still early, but it's being actively developed.

pherbel commented 9 years ago

Helios could also be an option. It is used by Akka.NET: https://github.com/helios-io

grahamehorner commented 9 years ago

Would there be any mileage in using libuv for Orleans, given the move towards cross-platform compatibility, in a similar way to ASP.NET vNext's Kestrel?

gabikliot commented 9 years ago

Looks like libuv is C. How would you use it in C#?

attilah commented 9 years ago

@gabikliot I think Kestrel (the ASP.NET vNext host) is based on libuv, so check the ASP.NET vNext source code or ask the team :D they've solved it already.

attilah commented 9 years ago

@veikkoeeva Regarding Cassandra and network optimization, check this out: http://www.zdnet.com/article/kvm-creators-open-source-fast-cassandra-drop-in-replacement-scylla/

I think they've just addressed this networking pain point in their solution.

veikkoeeva commented 8 years ago

In the following I present observations which I hope will increase stability and allow us to squeeze more performance out of the socket implementation. The observations are based on how the .NET memory allocator, garbage collection and sockets function. A quick primer, good for working knowledge of the .NET memory allocator and garbage collection, is Ben Watson's presentation Lessons in Extreme .NET Performance (starting at about 5:30, GC specifically at 9:00). A similar, written exposition on memory allocation and GC is Tess Fernandez's How does the GC work and what are the sizes of the different generations?.

The main points are that memory is allocated in managed heaps. Initially a heap contains two segments: one for both gen0 and gen1, called the ephemeral segment, and the other for gen2. The combined size of gen0 and gen1 can never exceed the size of this one segment, but gen2 can grow without bound. When the ephemeral segment fills up, a new segment is allocated and becomes the new ephemeral segment. Segment sizes vary; the currently documented values can be seen in Fundamentals of Garbage Collection (specifically the section Generations). An allocation is a pointer increment inside a segment, so it is likely very fast.

The main highlight of Ben's presentation is that one wants objects either to be collected in gen0 or to live a long time in gen2; i.e., if it is not certain objects go away in gen0, pool them (the reasons are explored in the presentation). As a corollary, pinning interferes with the GC: pinned objects cannot be moved, so the ephemeral segments in gen0 and gen1 cannot be compacted efficiently. The consequence is both heap fragmentation and increased memory consumption.

How does this affect Orleans?
For memory allocation

Looking at the source code, a new buffer is created and pinned twice for every grain read at IncomingMessageAcceptor. The first is a read of sizeof(int) bytes to interpret the size of the grain ID, which follows as the next read; the second allocation is for the amount of data indicated by the first. It is reasonable to expect that under some undefined but high-enough load, there will be pinned buffers distributed across gen0 and possibly also gen1 and gen2. It looks reasonable to think the distribution interval is roughly relative to the size of parameters, as well as to other silo code that allocates in the ephemeral range (and might do pinning). The same appears to happen for sending at SocketManager.

Sending of the actual messages uses BufferPool, which by default initializes 10,000 4 KiB segments. Depending on how load is generated, these segments will eventually be compacted (if needed; they are allocated one after another) and promoted to gen2. In the meantime they can be pinned too. Depending on the messages sent and how the application code behaves, it looks plausible that pinning can cause fragmentation and introduce instability. This behavior likely becomes more visible when messages are larger, so that a larger contiguous space needs to be found when serializing the message from the segments or by other application code. The frequency of messages plays a role too, since both sending and receiving pin buffers. One observation that warrants a note: Buffer.BlockCopy pins the memory objects too, since it calls into the underlying native CRT library.
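
To make the two-read pattern concrete, here is a sketch of a length-prefixed receive of the kind described above. It is illustrative rather than the actual Orleans code, and it uses a blocking Receive for brevity; the real concern applies to the asynchronous variants, where the pin on each fresh buffer is held across the whole overlapped operation.

```csharp
using System;
using System.Net.Sockets;

static class LengthPrefixedReader
{
    // Each call allocates two fresh arrays; the socket pins each one for
    // the duration of the receive. Under load, young pinned buffers like
    // these are what fragment the ephemeral segments.
    public static byte[] ReadMessage(Socket socket)
    {
        var lengthBytes = new byte[sizeof(int)];   // allocation (and pin) #1
        ReceiveExactly(socket, lengthBytes);
        int length = BitConverter.ToInt32(lengthBytes, 0);

        var payload = new byte[length];            // allocation (and pin) #2
        ReceiveExactly(socket, payload);
        return payload;
    }

    private static void ReceiveExactly(Socket socket, byte[] buffer)
    {
        int read = 0;
        while (read < buffer.Length)
        {
            int n = socket.Receive(buffer, read, buffer.Length - read, SocketFlags.None);
            if (n == 0) throw new SocketException((int)SocketError.ConnectionReset);
            read += n;
        }
    }
}
```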

For CPU

It looks reasonable to assume the memory operations show up mainly as CPU used by the GC. There is another point: the socket code uses the Begin* variety of functions, which allocate an IAsyncResult, an OverlappedAsyncResult and perhaps a WaitHandle too (see here, tangentially). Some operations use the synchronous variety of these functions, which might be more efficient in the current arrangement. I'm not sure of the effect of this, since I'm not sure how heavily these structures are used.

Mitigations

Memory

The socket code could reserve, as a pool, a large contiguous array from the LOH when the application starts, and divide it into segments (see the sketch after the list below). This removes pressure from the ephemeral heap and also removes cycles used in GC. It would also remove the pinning effects, since the buffer is already one contiguous array in the LOH that the GC doesn't need to move. It would stand to reason that should there be an OutOfMemoryException, it would never be because of the Orleans socket code. Also, the heap would remain stable with regard to these operations. It would seem natural to introduce separate buffers for differently sized objects; for instance, three separate pools:

  1. For the initial message size read
  2. The grain ID read
  3. The payload read

It might be reasonable to create more than one pool for payload reading. This would be cumbersome, so taking a dependency on RecyclableMemoryStream might make sense.
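
A minimal sketch of such a pool, assuming nothing about the existing BufferPool: one allocation large enough to land on the LOH, carved into ArraySegment<byte> slices. Three instances, sized per the list above, would give the suggested separate pools.

```csharp
using System;
using System.Collections.Concurrent;

// One array big enough (>85,000 bytes) to be allocated on the Large Object
// Heap, carved into fixed-size segments. Because the backing array never
// moves, pinning a segment for a socket operation costs the GC nothing in
// the ephemeral generations.
sealed class LohSegmentPool
{
    private readonly byte[] _slab;
    private readonly ConcurrentQueue<ArraySegment<byte>> _segments =
        new ConcurrentQueue<ArraySegment<byte>>();

    public LohSegmentPool(int segmentSize, int segmentCount)
    {
        _slab = new byte[segmentSize * segmentCount]; // single LOH allocation
        for (int i = 0; i < segmentCount; i++)
            _segments.Enqueue(new ArraySegment<byte>(_slab, i * segmentSize, segmentSize));
    }

    public ArraySegment<byte> Rent()
    {
        if (_segments.TryDequeue(out var segment)) return segment;
        throw new InvalidOperationException("Pool exhausted; grow or apply backpressure here.");
    }

    public void Return(ArraySegment<byte> segment) => _segments.Enqueue(segment);
}
```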

CPU

One could switch the code to the *Async variety of functions. The benefit is that they allocate fewer auxiliary objects and reuse buffers across operations. This might also make the code more concise. More conciseness could be gained along the lines Stephen Toub exemplifies in Awaiting Socket Operations.
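
For illustration, here is a condensed sketch of the awaitable wrapper from the referenced Stephen Toub post; the member names are adapted, but the pattern is his:

```csharp
using System;
using System.Net.Sockets;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// A reusable awaitable wrapping one SocketAsyncEventArgs, so awaiting a
// receive allocates no IAsyncResult/WaitHandle per call.
public sealed class SocketAwaitable : INotifyCompletion
{
    private static readonly Action Sentinel = () => { };
    internal readonly SocketAsyncEventArgs EventArgs;
    internal bool WasCompleted;
    private Action _continuation;

    public SocketAwaitable(SocketAsyncEventArgs eventArgs)
    {
        EventArgs = eventArgs ?? throw new ArgumentNullException(nameof(eventArgs));
        eventArgs.Completed += delegate
        {
            // Run the continuation if the await is already in place;
            // otherwise mark completion with the sentinel.
            var prev = _continuation
                ?? Interlocked.CompareExchange(ref _continuation, Sentinel, null);
            prev?.Invoke();
        };
    }

    internal void Reset() { WasCompleted = false; _continuation = null; }

    public SocketAwaitable GetAwaiter() => this;
    public bool IsCompleted => WasCompleted;

    public void OnCompleted(Action continuation)
    {
        if (_continuation == Sentinel ||
            Interlocked.CompareExchange(ref _continuation, continuation, null) == Sentinel)
        {
            Task.Run(continuation);
        }
    }

    public void GetResult()
    {
        if (EventArgs.SocketError != SocketError.Success)
            throw new SocketException((int)EventArgs.SocketError);
    }
}

public static class SocketAwaitableExtensions
{
    // Usage: await socket.ReceiveAsync(awaitable); then read
    // awaitable.EventArgs.BytesTransferred for the byte count.
    public static SocketAwaitable ReceiveAsync(this Socket socket, SocketAwaitable awaitable)
    {
        awaitable.Reset();
        if (!socket.ReceiveAsync(awaitable.EventArgs))
            awaitable.WasCompleted = true; // completed synchronously
        return awaitable;
    }
}
```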

CPU and memory

What if one wants to pass a large parameter, say, a byte array? It looks like memory is currently allocated so that the parameter gets wholly reconstructed before it is handed to application code. Would it make sense to introduce a Stream type the application could start decoding immediately, which would free buffer space and perhaps an IOCP thread in the socket code?

Follow-ups

Personally, I wonder if we should set up guidelines for performance-sensitive parts, as per Ben's presentation Lessons in Extreme .NET Performance. In the presentation he also mentions that a single line of LINQ can take up to 400 ms to JIT and produce far more IL than some other constructs. This is an area that might bear some relevance both on CPU performance and on memory pressure (e.g. there are LINQ optimizers, also here), especially when the size of the processed collection is known.

I also wonder, if it makes sense to take a dependency on RecyclableMemoryStream, whether there are other places where it could be used.

I also wonder if we should wait a bit with the outsourcing idea and, if these ideas are good and implementable, beef up the current implementation. During that time a solution might present itself.

Some managed heap fragmentation pointers and TCP server pointers I managed to find:

jason-bragg commented 8 years ago

The referenced read logic in IncomingMessageAcceptor is only used during the initial handshake for new connections; it is not the primary read loop. The primary receive logic between silos is in ReceiveCallbackContext.ProcessReceivedBuffer. The primary receive logic between silo and client is in GatewayClientReceiver.RunNonBatch. Both perform bulk reads into buffers of blocks from the block pool and then process the messages from the buffer. This logic is in IncomingMessageBuffer.TryDecodeMessage.
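
For readers following along, a hedged sketch of the shape of that receive path; the real code works over pooled blocks without the allocations below, and only the TryDecodeMessage name is borrowed from the method referenced above:

```csharp
using System;
using System.Collections.Generic;

// Bulk-receive pattern: a receive fills the buffer with as many bytes as
// are available, and a decode loop then slices complete length-prefixed
// messages out of it, so one network call can yield many messages.
sealed class MessageDecoder
{
    private readonly List<byte> _buffer = new List<byte>();

    public void Append(byte[] received, int count)
    {
        for (int i = 0; i < count; i++) _buffer.Add(received[i]);
    }

    // Returns true and removes one message when a full length-prefixed
    // message is buffered; returns false when more bytes are needed.
    public bool TryDecodeMessage(out byte[] message)
    {
        message = null;
        if (_buffer.Count < sizeof(int)) return false;

        int length = BitConverter.ToInt32(_buffer.GetRange(0, sizeof(int)).ToArray(), 0);
        if (_buffer.Count < sizeof(int) + length) return false;

        message = _buffer.GetRange(sizeof(int), length).ToArray();
        _buffer.RemoveRange(0, sizeof(int) + length);
        return true;
    }
}
```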

veikkoeeva commented 8 years ago

@jason-bragg All good points, and I stand corrected, especially if that removes those "two reads", which I thought could be problematic. I'm not sure who knows the code well enough to draw the conclusions, see whether there is beneficial work to be done here, and record it as issues; you are the one I know of, maybe @gabikliot. I don't know much about that particular piece of code (as you already corrected), but it seemed to be in order to write up in more detail what I had in mind during the quick-pass debugging on Gitter the other day.

jason-bragg commented 8 years ago

I don't know much about that particular piece of code

I provided the networking receive details to aid in your investigation more than to correct the record.

I'm not sure who knows the code well enough to draw the conclusions

@gabikliot and I are probably the most familiar with the networking layer, but for my part, I don't know it well enough to completely rule out pinning as an issue. I don't suspect that it is, but can't rule it out.

It seemed to be in order to write up in more detail what I had in mind

I thank you for doing so. If you are of the opinion that pinning may be an issue and have the time to investigate, please don't let the fact that you got some of the details wrong dissuade you. It's easy to misunderstand this code, I made many such mistakes when first digging into this.

Scooletz commented 8 years ago

Hi. At the very beginning I need to admit that I haven't read the whole project codebase, so I may not know some things.


I've read the thread, and as far as I can see nobody has mentioned EventStore's implementation of a TCP transport. A few properties of it:

  1. It uses a pool of preallocated SocketAsyncEventArgs. Once an event is free, it's pushed back to the pool.
  2. It uses a managed buffer pool, a similar take to what you can see in RecyclableMemoryStream. The buffer pool is a single big array of bytes chunked into array segments. This way, the only object that is pinned is the big array, which already lies on the LOH.
  3. Fortunately, the SocketAsyncEventArgs class supports passing a list of ArraySegment<byte> using SocketAsyncEventArgs.BufferList. This fits nicely with a segment pool consisting of ArraySegment<byte>.
  4. There are two framing approaches, but only the one using an int length prefix is actively used.

The best part is that this approach really lowers allocations and GC friction, as it addresses one of the most important things: the pinning of buffers by socket methods. A sketch of the idea follows.
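
A sketch of how such a pooled segment could be attached to a receive. It uses a single buffer via SetBuffer (which, as it turns out later in this thread, is what EventStore actually does) rather than BufferList; the pool that produced the segment is assumed to exist elsewhere:

```csharp
using System;
using System.Net.Sockets;

static class PooledReceive
{
    // Only the big backing array is ever pinned, and it already lives on
    // the LOH, so the ephemeral generations stay compactable.
    public static void StartReceive(Socket socket, ArraySegment<byte> pooledSegment,
                                    EventHandler<SocketAsyncEventArgs> onCompleted)
    {
        var args = new SocketAsyncEventArgs();
        args.SetBuffer(pooledSegment.Array, pooledSegment.Offset, pooledSegment.Count);
        args.Completed += onCompleted;

        // ReceiveAsync returns false when the operation completed
        // synchronously; the Completed event does not fire in that case.
        if (!socket.ReceiveAsync(args))
            onCompleted(socket, args);
    }
}
```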


Currently, I'm considering porting the Aeron project to .NET; it provides extremely performant NACK-based publishing over UDP. I don't want to repeat the whole documentation, but the main point is that Aeron uses UDP to transmit batches of messages in a network-friendly way. This would be a part of my RampUp, but I can't promise any date yet.

jason-bragg commented 8 years ago

I've not looked at EventStore's TCP transport, but it sounds interesting. Are there any load-testing results for it? The reason I ask is:

Fortunately, the SocketAsyncEventArgs class supports passing a list of ArraySegment<byte> using SocketAsyncEventArgs.BufferList. This fits nicely with a segment pool consisting of ArraySegment<byte>.

We explored this exact pattern in our networking layer and found a bug in the SocketAsyncEventArgs.BufferList implementation. We reported it to the technology group and they confirmed the issue, but I'm not aware of when a fix will be available (it may already be fixed, as this was about a year ago).

Scooletz commented 8 years ago

@jason-bragg Unfortunately, I don't know of any load testing that has been performed. Could you link the issue you mentioned? Which .NET version was it?

jason-bragg commented 8 years ago

@Scooletz Unfortunately, I seem to have a gap in my mail archive which includes this time period. :/

From what I recall, the .NET version was 4.5. We worked directly with the dev team, so I don't know of any public links describing the issue. The problem involved setting the BufferList with a new list immediately after a read, which would sometimes not update the buffers.
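
For concreteness, a sketch of the call pattern as described; this is an illustration of the shape of the code, not the actual repro (the rentSegment delegate is hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Net.Sockets;

static class BufferListPattern
{
    // Immediately after a receive completes, a *new* list is assigned to
    // BufferList before the next ReceiveAsync is issued. Per the report,
    // the fresh buffers were sometimes not picked up by the next read.
    public static void OnReceiveCompleted(Socket socket, SocketAsyncEventArgs args,
                                          Func<ArraySegment<byte>> rentSegment)
    {
        // ... process args.BytesTransferred bytes from args.BufferList ...

        args.BufferList = new List<ArraySegment<byte>>
        {
            rentSegment(),
            rentSegment(),
        };

        socket.ReceiveAsync(args); // next read; new list sometimes ignored
    }
}
```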

We could have worked around the issue, but chose to revisit this in the next round of network performance work, assuming the bug would be addressed by then. As we've been mostly focused on features, we've not scheduled another round of performance work, so there has been no follow up to see if this is still an issue.

This issue only occurred with any regularity during very rapid call patterns, which, I suspect, is why most have not encountered it. Also, most usages of SocketAsyncEventArgs, in examples and other libraries I investigated, didn't use BufferList, so I was left with the impression that the bug went undetected because the pattern is uncommon. Hence, EventStore's TCP transport's use of the pattern caught my interest.

I'll check with others involved and see if we can pick the email chain back up.

Scooletz commented 8 years ago

@jason-bragg I've checked my assumptions again. I'm sorry, but I made a mistake: the EventStore TCP transport does not use the list. It uses a single buffer (here). The memory-management part, using pools for args & byte segments, still holds true.

sergeybykov commented 8 years ago

@jason-bragg @Scooletz I found the thread about the bug in the Socket class; it includes a zipped repro. Let me know if you need any of it.

veikkoeeva commented 8 years ago

@Scooletz Hey! Cool that you could make it here to take a look! @jason-bragg, @sergeybykov, @jdom and @ReubenBond have thus far been the main go-to people here. @galvesribeiro, @cmello and @attilah are likely at least interested too.

I'm speaking of my own observations here... I think there are improvements to be had, like the ones you stated. The larger issue looming here is whether the networking code should be outsourced. The outsourced library should be cross-platform, and TLS support would likely be preferred, at least optionally.

To me personally it looks like RIO is the way to go for extreme performance on Windows, but I don't have actual figures. It just looks like the most performant and also the most stable option in a virtualized environment (though this is basically unconfirmed rumor I've heard). I don't personally know of any libraries that are cross-platform and also utilize RIO on Windows. There is occasional discussion at https://gitter.im/dotnet/orleans on various performance or networking improvements. Like today. :)

Scooletz commented 8 years ago

Cheers @veikkoeeva :) The biggest performance gain with Windows RIO is no system calls at all. All the methods related to RIO are user-mode and extremely fast. I am not aware of any such library. The good part is that RIO does not require any specific protocol; hence, you could have a different plugin for Windows hosts. I need some more time to go through the current state of networking in Orleans, but as I'm looking forward to using RIO in my personal project, I'll go through the Orleans code as well. I think there's an opportunity to kill two birds with one stone.

sergeybykov commented 8 years ago

I mentioned in https://github.com/dotnet/orleans/issues/1372 that we are going to try to move to DotNetty. If that happens, the network related performance discussions should naturally move to DotNetty.

grahamehorner commented 8 years ago

While I understand why you're looking at DotNetty, I'd prefer it if the team looked at libuv and the work already done by the ASP.NET Core team with regard to Kestrel and performance. It also makes sense to use/extend the libuv wrapper/library created to allow .NET Core to send/receive UDP datagrams in a platform-agnostic manner, rather than push network communication onto yet another third-party library.

galvesribeiro commented 8 years ago

@grahamehorner like @sergeybykov mentioned on #1372, we are evaluating DotNetty, not choosing to use it right now. I agree that it would be good to have a single socket library shared between Orleans and .NET Core; however, Kestrel only uses libuv to support HTTP, and that's all. They are not going to implement the other features of libuv. In other words, the work on ASP.NET Core related to libuv is strictly and heavily optimized for the HTTP stack and is not a general-purpose socket library...

If libuv is the target, I believe that we can have it as well. While talking to @nayato, he mentioned that he wants to abstract DotNetty's low-level socket backend so he can add libuv to it.

So let's see what @jason-bragg comes up with in his tests. :)

veikkoeeva commented 8 years ago

@galvesribeiro (and @sergeybykov) I second @grahamehorner with a slight concern here. While I'm happy with anything that's an improvement, and I don't really have skin in the game as in being able to work on this in the coming months, this does seem like a classic case of "we evaluate by doing the work of porting", and once that has happened, it is an accomplished fact. Which means there is no incentive to explore other libraries, echoed by what @sergeybykov just commented; likely even an incentive not to change it. I do recognize the value of freeing up the core team's time here, so to speak, and perhaps producing something of greater benefit to the .NET community.

Though to move the discussion forward a bit, I have a specific question: would it be possible to abstract the networking layer so that one can explore other libraries? I believe this involves managing buffers explicitly. I also wonder, if I have questions about DotNetty, such as how its memory allocation works, whether I should ask them here or on DotNetty (or can someone point to the relevant place? There look to be plenty of classes).

nayato commented 8 years ago

@grahamehorner I think you're mixing things up. The libuv wrappers in Kestrel are pretty much an analog of the Socket API in .NET - about the same level of abstraction. DotNetty aims to provide an easier model for writing a networking communication layer - effectively bridging raw TCP provided by the Socket API (or libuv, for that matter) and application logic. It is comprised of a set of components, and each of them can be replaced. DotNetty itself is a great testament to the API it inherits from Netty, as we were able to replace Java's NIO with .NET's Socket with relative ease. To summarize, DotNetty is in no way a competitor to libuv. It is one level above - where you get the bytes from the network and start processing them. It allows you to put together TLS, a simple framing protocol or a more sophisticated transport like MQTT or CoAP, auto-closing idle connections on timeout, etc. All with the promise of minimal processing overhead and efficient memory management. I'm not sure yet (and neither is @sergeybykov, I believe) whether there is good technical alignment between Orleans' networking/execution model and DotNetty's, and this is what this evaluation is about.
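
For a flavor of the model described here, a sketch along the lines of DotNetty's published echo-server example; the class names are from the DotNetty 0.x API as I recall it, so treat the details as approximate:

```csharp
using System.Threading.Tasks;
using DotNetty.Codecs;
using DotNetty.Transport.Bootstrapping;
using DotNetty.Transport.Channels;
using DotNetty.Transport.Channels.Sockets;

class EchoServer
{
    // Each connection gets a pipeline: length-prefix framing, then the
    // application handler. TLS or idle-timeout handlers would slot into
    // the same pipeline.
    public static async Task RunAsync(int port)
    {
        var bossGroup = new MultithreadEventLoopGroup(1);
        var workerGroup = new MultithreadEventLoopGroup();
        try
        {
            var bootstrap = new ServerBootstrap()
                .Group(bossGroup, workerGroup)
                .Channel<TcpServerSocketChannel>()
                .ChildHandler(new ActionChannelInitializer<ISocketChannel>(channel =>
                {
                    IChannelPipeline pipeline = channel.Pipeline;
                    pipeline.AddLast(new LengthFieldPrepender(4));
                    pipeline.AddLast(new LengthFieldBasedFrameDecoder(ushort.MaxValue, 0, 4, 0, 4));
                    pipeline.AddLast(new EchoHandler());
                }));

            IChannel bound = await bootstrap.BindAsync(port);
            // ... run until shutdown, then: await bound.CloseAsync();
        }
        finally
        {
            await Task.WhenAll(bossGroup.ShutdownGracefullyAsync(),
                               workerGroup.ShutdownGracefullyAsync());
        }
    }
}

// Echoes each decoded frame back to the sender.
class EchoHandler : ChannelHandlerAdapter
{
    public override void ChannelRead(IChannelHandlerContext context, object message) =>
        context.WriteAsync(message);

    public override void ChannelReadComplete(IChannelHandlerContext context) =>
        context.Flush();
}
```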

nayato commented 8 years ago

@veikkoeeva, a good place to ask about DotNetty is Gitter. I'd encourage you to first read this, this, and this.

gabikliot commented 8 years ago

@sergeybykov, I wonder, was there any business need, for one of the big internal users, to invest in such an investigation? Is it the extra features (TLS, ...) or the promise of better perf? Just my curiosity.

sergeybykov commented 8 years ago

@grahamehorner @veikkoeeva @gabikliot The concerns make perfect sense but I think they are unwarranted here. I see three reasons this work is worth the effort.

  1. Even if nothing else changed but we could replace the code in https://github.com/dotnet/orleans/tree/master/src/Orleans/Messaging and https://github.com/dotnet/orleans/tree/master/src/OrleansRuntime/Messaging with an external library, that would already be a big win. Orleans provides value at a much higher layer of the stack, and the low-level messaging code was written only out of necessity at the time. Getting rid of it would help focus on the core Orleans functionality.
  2. We hope to achieve better performance with DotNetty.
  3. DotNetty is better layered/structured, which should allow us to easily plug in features like TLS and encryption or alternative implementation of the network layer, such as libUV or maybe RIO.

If we successfully do 1 and see no drop in perf, from that point on even replacing DotNetty with a different library should be much easier than it is today. So I don't see any downside in doing the work, only upside.

jason-bragg commented 8 years ago

Two topics: Why? and What?

Why? Why we should move to a third-party networking library was covered very well by @sergeybykov. The core value-add of Orleans is the virtual actor model. Networking is, for the most part, a solved problem, so maintaining our own implementation is just extra overhead. It's comparable to maintaining custom XML or JSON serializers.

What? What networking library we should move to, in my mind, comes down to requirements. Below is what I'm looking for, though other opinions are welcome:

There are likely many networking libraries that fit our needs, but we only need one. I want to be clear that this prototype work does not mean we will use DotNetty, only that I consider DotNetty viable enough tech to justify the time investment of prototyping; if the prototyping goes well, though, it is likely we will use it. This does not prevent others from performing similar prototypes to compete with the DotNetty solution.

veikkoeeva commented 8 years ago

@nayato Cheers for the curated links, I'll take a look in the coming weeks.

To be clear here, I don't oppose moving to DotNetty even without examining other options if the spike proves to be an improvement. @sergeybykov and @jason-bragg reiterated some of the larger benefits quite well. I believe especially in the idea of working on something for the larger benefit of the .NET community.

My concern could be rephrased as: what are the important aspects when choosing a networking library for Orleans? If another proposal does not clear the bar to be switched into the core (after all, it requires review work and introduces uncertainty), how laborious would it be to switch to one's own implementation if there were benefits? Most likely this can be solved only with time and the specific needs at hand, but maybe some thought could go into how it could happen. One aspect here is that maybe we want some sort of idea of how solutions could become part of Orleans, even if vague. Like, for instance, what protocols should be supported?

Related to this, I'm also afraid I personally create fatigue around other ideas while inquiring about reasoning, and contribute to a climate of squabbling over minutiae, while that is not my intention. Specifically related to DotNetty, I don't know how it functions, but I know I have worked in fields where Orleans would have been useful, with the caveat that one would really need to know how much memory would be used (at most), pinning should not happen (unless in reserved blocks, see earlier comments), and hopefully no garbage is generated. Manufacturing execution systems and nation-level infrastructure with some rather stringent requirements. Not requirements as strict as in How Software Gets Bloated: From Telephony to Bitcoin, but it gives an eloquent perspective.

I disagree to some extent with the notion that networking is a solved problem. The video Aeron: The Next Generation in High-performance Messaging clearly points this out (I remember it aimed some punches at Jetty too, and for what I know, some claim other libraries are better), and an effort to somehow escape the current socket model has been brewing for a while. Which may, or may not, matter here.

jason-bragg commented 8 years ago

Anyone interested in an early peek at some of the DotNetty prototyping, please check out https://github.com/jason-bragg/orleans/commit/33d11da7d11373002c391e829936f235263e55fa#diff-b453c26b43105f90c2d0dc6bbc319419R19

grahamehorner commented 8 years ago

@jason-bragg I'm definitely interested

galvesribeiro commented 8 years ago

Yay! Will have a look over the weekend! Great work! 💃

cmello commented 8 years ago

Thank you very much @jason-bragg, I'll compare against the benchmarks I did for issue #1639 soon!

nayato commented 8 years ago

@cmello, @jason-bragg to give you some insight: with the recent changes in DotNetty (0.3.0 coming out today) we're seeing ~170K op/sec with the following setup:

veikkoeeva commented 8 years ago

@Scooletz FYI: Aeron.NET.

Scooletz commented 8 years ago

Just wanted to paste it @veikkoeeva :-) That's the way to go IMO.

davidfowl commented 8 years ago

Jumping on this thread because I asked some questions on the Twitters 😄. I've started a project called Channels (https://github.com/davidfowl/Channels) that's based on the work we did in Kestrel to make it go fast. The idea is to start with a simple primitive and build the entire networking stack up around it. It's going to cover the range of scenarios somewhere between Netty and the socket API. It's still in its infancy, but the idea is sound and others have already written some pretty cool things on top:

On top of that, the idea is that once you're using channels the transport code should be swappable (just like Stream, but more efficient): https://github.com/davidfowl/Channels/tree/master/src.

TL;DR: all the things you have to do to use .NET sockets properly and fast should be handled for you, and you can parse the frames of your protocol without worrying about the other stuff (very similar to DotNetty really, but lower level).

I don't have any performance data yet, still early days, but trust me, it'll go fast 😄

veikkoeeva commented 8 years ago

@davidfowl RIO sounds good. :) While at it, have you considered the things happening in user-space networking (see links, like the Intel one)? As you're planning something pluggable, it might be useful to look at these too, at least tangentially, as it looks like the traditional models are changing out of necessity. Not mentioned in the other issues, this might give some ideas too: Co-Design Architecture: Emergence of New Co-Processors.

Then maybe the SDN changes in Server 2016. It would be great to have some sort of API for these next to sockets, so one can tune the actual hardware too, even the network, for a more holistic system (see also some of the links in the aforementioned issue). One interesting use case with regard to Orleans would be to use it as an overlay on Server 2016 (that example uses ZooKeeper on non-Windows).

Maybe what I'm trying to say is: please consider these issues so that it would be easier to tap into these capabilities, or build them, in the future. :)

sergeybykov commented 8 years ago

@davidfowl Your effort in this space is very much appreciated. @jason-bragg investigated the option of replacing most of our messaging stack with DotNetty. He is right now busy with something else, but he has a private branch with his prototype that might be relevant, and he can shed more light on messaging and buffer pooling parts.

gabikliot commented 8 years ago

I would be interested to know what the conclusions of the DotNetty investigation were.

cmello commented 8 years ago

@gabikliot Sorry, I switched to a new job and at this time all my effort goes into ramping up. I won't be using .NET for a while. Best regards!

gabikliot commented 8 years ago

I thought @jason-bragg was looking into it. I was responding to Sergey's comment.