dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

API Proposal - NetworkPipe #26668

Closed Drawaes closed 4 years ago

Drawaes commented 6 years ago

Rationale

Pipelines has now made it into the BCL; however, in almost all use cases (networking, files, etc.), in order to actually use these primitives you either need to roll your own endpoint or "hook" the pipeline up to a stream, which somewhat breaks the model of shared pooled buffers being pushed up from the source.

We can see a number of proprietary and OSS implementations of a "network pipe", as well as internal implementations in AspNetCore. @mgravell has an "unofficial" NuGet package that serves as a temporary implementation. However, I believe this is a "core" library function: it should be a minimal amount of highly performance-critical code, and it needs many eyes on it.

API Shape

The following API is proposed both to align with the "natural" way that pipelines work and look, and to stay very close to the shape (for the constructors, anyway) that NetworkStream uses.

namespace System.Net.Sockets
{
    public class NetworkPipe : IDuplexPipe, IDisposable
    {
        public NetworkPipe(Socket socket, PipeOptions pipeOptions)
            : this(socket, pipeOptions, FileAccess.ReadWrite, true) { }

        public NetworkPipe(Socket socket, PipeOptions pipeOptions, bool ownsSocket)
            : this(socket, pipeOptions, FileAccess.ReadWrite, ownsSocket) { }

        public NetworkPipe(Socket socket, PipeOptions pipeOptions, FileAccess access, bool ownsSocket) { }

        public PipeReader Input => throw new NotImplementedException();
        public PipeWriter Output => throw new NotImplementedException();
        public void Dispose() => throw new NotImplementedException();
    }
}
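
For reference, either end of the proposed type would be consumed with the standard pipelines read loop. A minimal sketch, run here against an in-memory Pipe since NetworkPipe itself is only a proposal (DrainAsync is just an illustrative name):

```csharp
using System.Buffers;
using System.IO.Pipelines;
using System.Threading.Tasks;

static class PipeReadLoop
{
    // Drain a PipeReader, counting the bytes seen; the same loop would
    // work against NetworkPipe.Input if the proposed type existed.
    public static async Task<long> DrainAsync(PipeReader reader)
    {
        long total = 0;
        while (true)
        {
            ReadResult result = await reader.ReadAsync();
            ReadOnlySequence<byte> buffer = result.Buffer;
            total += buffer.Length;

            // Mark everything we saw as consumed so the pool can recycle it.
            reader.AdvanceTo(buffer.End);

            if (result.IsCompleted)
                break;
        }
        reader.Complete();
        return total;
    }
}
```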

Open questions

  1. I am not precious about the API shape, so have at it.
  2. It might be a good idea to allow recycling here? Or maybe it's not worth it on a per-connection basis?
  3. Are there more options that would be needed?

Obviously other classes can be built up from here (listener, connection, etc.), but this class underpins both server- and client-side implementations.

/cc @mgravell @benaadams @NickCraver @davidfowl @stephentoub

stephentoub commented 6 years ago

If we haven't already, we should instead expose a Stream-based "pipe" and then use the existing stream types, e.g. NetworkStream. Going the proposed route is a slippery slope and is the primary concern I have with pipelines in general, that it'll lead to significant duplication and bifurcation.

mgravell commented 6 years ago

Going the proposed route is a slippery slope and is the primary concern I have with pipelines in general, that it'll lead to significant duplication and bifurcation.

I hate to say it, but that's just another way of saying "we shouldn't have done pipelines". If you throw a Stream/Pipe converter into the mix, you basically lose just about every single benefit of pipelines. If that is indicative of the actual position, frankly you might as well nuke pipelines from corefx and nuget, wave a flag and say "oops", and move that code back into the inner guts of Kestrel.

On how/why a pipe/stream connector is suboptimal, I could start enumerating the reasons if you like (I have 6000+ words of blog post on pipelines I was going to publish this week, but I reckon I've got about another 2000 still to write). It becomes simply "kinda like BufferedStream but not quite, and with an API you're not familiar with and which is awkward to use with existing code surfaces". But pipelines are so much more than that.

Side note: the "unofficial" package Tim mentions also has Pipe/Stream converters in both directions. Shameless links: nuget, github

benaadams commented 6 years ago

Could move NetworkStream onto NetworkPipe using Pipe -> Stream Adapter?

Then have code reuse + advantages of Pipelines when used directly?

mgravell commented 6 years ago

Note: I'm not against pipe/stream converters per se; they're hugely useful when a native pipeline implementation isn't available. I use one to wrap SslStream currently (wonders: should I add "API Proposal: TLSPipe"?), although I'm also keeping an eye on Tim's "Leto" there. But the point is: a converter is a necessary compromise due to the lack of a native implementation, not a smiley happy "this is fine" thing.

davidfowl commented 6 years ago

There might be another path where the converter could be as low overhead as possible. Some of the difficulty in replicating Stream behavior has to do with the fact that it may be the only high-level way to access functionality in a cross-platform way. Nobody wants to maintain 2 code paths for doing async file IO (for example). Part of this proposal was exploring some of that: https://github.com/dotnet/corefx/issues/27246.

That said, networking is one of the core scenarios for pipes and NetworkStream doesn't do that much so there isn't that much harm here IMO. There's a slippery slope argument somewhere in here but Networking doesn't need to start that trend. FileStream on the other hand, I would stay away from 😄 .

Drawaes commented 6 years ago

I am with Marc on this one; the cat's out of the bag. Either pipelines is a first-class BCL citizen or it should have stayed in aspnetcore. Networking is the primary use case for it, and today you have to drop back to streams to actually do anything with it (aside from writing your own socket interop, which, as we all know, is not easy to get right). By putting it in the sockets assembly we can lift a lot of code here.

If we look at it conceptually, we could make the same argument about Task vs APM and Span/Memory vs Array. They have all become wrappers for each other in various ways; they all caused code duplication/refactoring and duplication of concepts. Why were they all done? Efficiency and a simpler programming model... which is pretty much the goal for pipelines and networking.

That is why, in the first instance, maybe the implementation for NetworkPipe just wraps NetworkStream to get it up and running; then we can look at commonality and whether the roles need to be reversed. Either way, I think this type is critical.

stephentoub commented 6 years ago

but that's just another way of saying "we shouldn't have done pipelines [...] If that is indicative of the actual position, frankly you might as well nuke pipelines from corefx and nuget, wave a flag and say "oops", and move that code back into the inner guts of Kestrel."

Yes, that has been and continues to be my (personal) opinion. We did ship it in corefx, of course, but that doesn't mean we should go and add a bunch of additional types just because it's there. The core primitives are there; if someone wants to build their own implementations of those primitives outside of corefx, have at it.

If you throw a Stream/Pipe converter into the mix, you basically lose just about every single benefit of pipelines.

Really? NetworkStream.Read/WriteAsync are simply passthroughs to Receive/SendAsync on Socket, no additional buffering, using the ValueTask-based and Memory-based overloads that are zero-allocation, that avoid pinning if the Memory is already pinned, etc. You're saying that the margin of perf pipelines provides is so thin that not doing so directly on top of Socket means pipelines aren't worthwhile? I'm interested in seeing that benchmark.
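
The passthrough behavior described here is easy to see with the Memory-based, ValueTask-returning overloads that shipped in 2.1; a small loopback sketch (EchoOnce is an illustrative helper, not a real API):

```csharp
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

static class MemoryOverloads
{
    // Send a payload over loopback and read it back on the other side,
    // using NetworkStream.WriteAsync(ReadOnlyMemory<byte>) and
    // ReadAsync(Memory<byte>), which forward to Socket.SendAsync/ReceiveAsync
    // with no intermediate buffering.
    public static async Task<byte[]> EchoOnce(byte[] payload)
    {
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        using var client = new TcpClient();
        await client.ConnectAsync(IPAddress.Loopback, port);
        using TcpClient server = await listener.AcceptTcpClientAsync();
        listener.Stop();

        // ValueTask-based overloads: zero allocation on the common path.
        await client.GetStream().WriteAsync(payload.AsMemory());

        var received = new byte[payload.Length];
        int read = 0;
        while (read < received.Length)
            read += await server.GetStream().ReadAsync(received.AsMemory(read));
        return received;
    }
}
```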

geoffkizer commented 6 years ago

At the very least, it seems like we should start with a Stream-based Pipeline implementation, as @stephentoub suggests. There are a ton of existing Stream implementations, so having this enables lots of scenarios.

It sounds like there are multiple implementations of this already, right? @mgravell has one, I seem to recall @drawaes mentioning one previously, ASP.NET has something like this internally for SslStream, etc. So let's build the official one and ship it.

Until we have that thing, it seems like the NetworkPipe vs Pipe over NetworkStream discussion is pointlessly hypothetical.

Drawaes commented 6 years ago

I agree the other is needed... But I don't think that precludes this and here is why.

Forget perf entirely for a minute (given the crowd, it's a tough ask).

I have recently been involved in a number of large ongoing projects to move to .NET Core, from both full .NET Framework and other languages that will remain nameless. The question: "We are writing our custom protocol X; how should we do it?"

I say "use pipelines" (it just fits: messages split across buffers, pooling, sequences, advance, etc.), and then I am asked "how do we start". When I say "use this wrapper around a network stream", this literally causes visible eye rolls, and then they go off, use streams, do custom pooling and custom split/multi-buffer handling, and forget about it.

So if you want the API to be used, kill off all that nasty logic and code, and have people do things fast and the right way, you need this class at a minimum. If it just wraps a NetworkStream internally and does stream-to-pipe right now, I don't mind; any perf can be worked out in the wash later.

But as it is today, I still say this API is important from a usability standpoint, not a perf or any other standpoint...

(in my opinion) ;)

geoffkizer commented 6 years ago

Forget perf entirely for a minute (given the crowd its a tough ask)

Not sure I understand, can you explain?

stephentoub commented 6 years ago

I have said "use this wrapper around a network stream" this literally causes visible eye rolls and then they go off use streams do custom pooling and custom split/multi buffer and forget about it.

So the agony of typing new StreamPipe(new NetworkStream(socket)) instead of new NetworkPipe(socket) is so high that a developer instead chooses to "do custom pooling and custom split/multi buffer and forget about it"? If so, there are much bigger problems that this proposal is nowhere close to addressing.
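
For readers following along, the wrapper spelling under debate can be sketched as a small IDuplexPipe over any Stream. The StreamDuplexPipe name here is hypothetical; the PipeReader.Create/PipeWriter.Create factories it uses are what System.IO.Pipelines eventually shipped for this purpose (they did not exist at the time of this thread):

```csharp
using System.IO;
using System.IO.Pipelines;

// A tiny IDuplexPipe over any Stream, in the spirit of the debated
// "StreamPipe": new StreamDuplexPipe(new NetworkStream(socket)) gives a
// pipelines surface without a dedicated NetworkPipe type.
class StreamDuplexPipe : IDuplexPipe
{
    public StreamDuplexPipe(Stream stream)
    {
        Input = PipeReader.Create(stream);
        Output = PipeWriter.Create(stream);
    }

    public PipeReader Input { get; }
    public PipeWriter Output { get; }
}
```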

Drawaes commented 6 years ago

Yes, because they don't see it as a complete API... They understand how to do the other thing (admittedly not optimally, but still, it's understood), and they don't want to learn or understand something new if it drops back to a stream anyway.

Don't get me wrong, I don't have a huge problem with wrapping if the perf is equivalent... But app devs aren't framework or library writers, and comfort trumps a lot.

mgravell commented 6 years ago

If you want to start with a stream wrapper: fine. It'll make a great start for comparing competitive tests between the two, part of justifying whether it needs to exist as a specific endpoint. I can overlook a few missing things like the ability to correctly propagate the shutdown of the two directions independently, that's nice to have but whatever. And a few other subtle semantics that are awkward to express.

But there are zero endpoints for pipelines today, except for Kestrel or community efforts. Which makes it non-existent.

stephentoub commented 6 years ago

If you want to start with a stream wrapper: fine.

Yes.

They don't want to learn or understand something new if a stream is dropped back to.

That makes zero sense to me. They're willing to learn an entirely new API and way of doing things, but only if they're required to drop everything else in .NET they've learned about thus far? I don't buy it.

geoffkizer commented 6 years ago

But there are zero endpoints for pipelines today, except for Kestrel or community efforts. Which makes it non-existent.

I agree. I think it was a mistake to ship pipelines without useful endpoints. I think having a stream wrapper at least gets us to a minimally viable place.

I think we need to:

(1) Implement the stream wrapper.
(2) Ensure that this has reasonable usability and minimal perf overhead vs using Stream directly (particularly for NetworkStream).
(3) Consider how we can improve functionality or perf for specific scenarios, e.g. NetworkPipe or SslPipe or whatever.

Drawaes commented 6 years ago

Alright let's do that I will write a wrapper proposal in a few minutes

Drawaes commented 6 years ago

Okay I built a proposal for a bridge (literally and figuratively) over at dotnet/corefx#30791

stephentoub commented 6 years ago

Thanks, @Drawaes.

ghost commented 6 years ago

Late feedback for @stephentoub and maybe also @davidfowl. I'm totally on the same side as @Drawaes & @mgravell, and I would like to emphasize one thing:

Casual .NET developers (so, I think, 99% of people) don't spend their own time looking at GitHub source code or issues, or hunting for blog posts/videos to learn about something. They generally have little time for this because their job is to deliver code.

I see the .NET Core 2.1 improvements as really big, but on the other hand I fear they are not accessible to casual developers.

Taking Pipeline for instance:

For the user to land in the "pit of success", I strongly believe that Marc's blog posts are mandatory reading, and some real-world implementations are also needed.

And that's where I want this to lead:

Microsoft used to have extremely well-documented content and very mature APIs for the .NET Framework, which put people in good working condition; for .NET Core 2.1, I feel it's not the same. The reasons are obvious to you all, but again, the average developer will want to switch to .NET Core, and I believe Microsoft should still keep the same standard on the matters I just talked about.

But in this case, it's community people that did the rest of the job. On one side: it's great! On the other, I can't stress enough that it makes things less accessible to regular users.

Marc's blog posts should be adapted and become part of the official documentation, and I wish MSFT people could do that with Marc's consent. The official documentation, if sparse, should at least redirect to key community assets (docs, NuGet packages, etc.).

Finally, for Stephen: your remarks are, from my point of view, unrealistic for 99% of the people working with .NET. The 1% elite is always on the page, actively reading everything, participating, contributing, and can do what you say (though this GitHub issue suggests to me that even some of that 1% are not really satisfied with the current situation), but for the rest, it's an investment that few will be willing to make, imho.

TL;DR: The learning curve and productivity are no longer what they used to be, and I hope MSFT people are aware of this.

mgravell commented 6 years ago

@nockawa just to note - I'm slightly less concerned about that; and I know that some MS guidance documents are on the way. In some ways I wouldn't want documentation to be the barrier that stops things shipping for a particular milestone, but it would be good if there was a commitment to quality consistent documentation. Even knowing where to look gets tricky - you can't find System.IO.Pipelines on MSDN, for example.

But in general I'm OK with the documentation on new APIs lagging a little, as long as it gets done before too long.

davidfowl commented 6 years ago

I'm on the hook for pipelines documentation. It's part of what I'm actively doing right now (we also have some other things to document that we haven't as part of 2.1).

@mgravell did us a huge solid by writing those posts and raising awareness. I have one coming hopefully this week or next week as well.

To the other points about not having any default implementations: I agree it's a huge hurdle when getting started, and the lack of examples and docs magnifies things. We didn't really expect a large number of people to pick up these low-level APIs, so we didn't prioritize all of the right things; frankly, some of the APIs are not as easy to use yet, so there is some hesitation about adding extensive documentation on how to use the types.

That said, it's coming, I'm typing right now 😄

Drawaes commented 6 years ago

If you want a hand on filling out documentation (rather than just a blog post) I am happy to help.

ghost commented 6 years ago

I know that release cycles are not the same anymore, and I agree it's great to have the bits as soon as possible; I don't mean to assign blame either.

But (there's always a but) .NET Core documentation for ASP.NET Core and other major components is not on the same level as what Microsoft did with the .NET Framework. Pipelines is just one example; Span/Memory<T> got a lot of hype and articles/posts, so it was easier to jump on board, but it was also a bit of a jungle out there. If it gets sorted out in time, cool; that's what matters.

I just hope it will, and so far my feeling is that more guidance and entry points would give these APIs access to a much broader audience.

Just stating in the overview documentation that there is so far no official Pipe endpoint implementation would clear something up very quickly for many people. Describing the intent of the API would also help; I honestly thought I could rely on it out of the box, but nothing told me it was not that trivial.

stephentoub commented 6 years ago

Finally, for Stephen, your remarks are, from my point of view, unrealistic to 99% of the people working with .net.

IMO, "99% of the people working with .NET" need not be using System.IO.Pipelines. It's an advanced namespace / library, for a specific set of scenarios.

And even if that weren't the case, I believe that introducing additional types doesn't make it easier to adopt. From my experience, it would be easier to adopt with a gradual bridge from the .NET world that's flourished for the last almost two decades, a world based on Stream. If you're a .NET developer and you do stuff with networking, you know about NetworkStream, so why force someone who also wants to start using System.IO.Pipelines to learn about yet another Network class, rather than just continuing to use the NetworkStream they know about? Why confuse someone working in a world of streams with another Network type that they may now find in docs and internet searches and be unsure/confused about which subsystem they should be using? Why add additional code to the platform that needs to be developed/tested/maintained/documented/etc.? Unless there's a significant perf advantage to doing so (and I'm quite skeptical there is), to me adding such a NetworkPipe type (and the bevy of other xyz-stream-types-reintroduced-as-pipe-types that would inevitably follow) actually decreases the quality/learnability/maintainability/etc. of the platform, rather than increasing it.

ghost commented 6 years ago

@stephentoub it's a fair point, thanks for clarification.

I was looking to use pipelines while programming a NoSQL database engine storing POCOs in my spare time; I got interested in pipelines mainly to reduce memory copies for file system and network IO.

I wanted my project to be pretty performance-driven, so I would categorize myself in the 50-90% of people 😄, and I was hoping to find ready-to-use classes for pipelines with files and TCP networking.

Drawaes commented 6 years ago

I think the issue is moot until we get the bridging class and see what uptake (if any) there is. The bridge will be good enough for now and then further data driven (or at least experience driven) decisions can be made after that.

davidfowl commented 6 years ago

IMO, 99% of the people working with .NET should not be using System.IO.Pipelines. It's a very advanced namespace / library, for a very specific set of scenarios.

FWIW I wholeheartedly disagree with this and it has nothing to do with performance, it has to do with correctness. I think anyone who has written this type of code before (assuming you have used Socket or NetworkStream) will benefit from using something like pipelines. That's already evident if you read @mgravell's blog post.

@nockawa

I was looking to use Pipelines while programming a NoSQL database engine storing POCO on my spare time and I got interested by pipeline to reduce the memory copy of file system and network mainly.

As I wanted my project to be pretty performances driven. So I would categorize myself in the 50-90% of the people 😄 and I was hoping to find ready to use classes for Pipeline with file and TCP network.

I'm working on another blog post that explains why this type exists, and it should be obvious why you'd want to use it for something like this. Hopefully it'll be self-explanatory once you read the post 😉.

ghost commented 6 years ago

I, personally, am all for using pipelines; that was clear before, but it was the "last feature to learn that I couldn't afford to" in the 2.1 and C# 7.2 releases, after dealing with Span, Memory, SignalR, ref struct, etc.

Reading Marc's blog post and the current activity around this topic will have at least for me the benefit to clearly understand everything I need and I believe I'll have to use it more than once in the future.

I work in a sector where we need to be very cautious about performance, hence memory-oriented stuff.

svick commented 6 years ago

@davidfowl

FWIW I wholeheartedly disagree with this and it has nothing to do with performance, it has to do with correctness. I think anyone who has written this type of code before (assuming you have used Socket or NetworkStream) will benefit from using something like pipelines. That's already evident if you read @mgravell's blog post.

If you want people who don't care about performance to use pipelines, then I think the API could use some polishing when it comes to convenience. For example:

  - When I get ReadOnlySequence, why do I have to care that it's composed of multiple buffers? What I'd like to see is a sequence of bytes, not a sequence of buffers of bytes. (Maybe ToArray() is good enough for that? Though I think I would prefer something that didn't allocate.)
  - When I know that messages I'm reading have a specific length or use a specific delimiter, I want to use a simple method call to read the next message, instead of having to loop and call the examined overload of AdvanceTo().

Note that all I know about pipelines I have learned from @mgravell's blog posts and from skimming the API on fuget.org.

ghost commented 6 years ago

@svick

When I get ReadOnlySequence, why do I have to care that it's composed of multiple buffers?

It's not a matter of caring or not. As Marc explained in his blog posts, the best possible performance is the main driver for pipelines; to achieve this, the system avoids data copies, whatever the consequences.

The internal implementation of pipelines will always rely on a circular, paged buffer, in order to avoid copying into a linear buffer, which would be easier for you to deal with but is certainly not the desired design.

Maybe you already know all of this, but pipelines induce some "constraints" that make the class a bit harder to deal with; it's for the sake of performance.

You apparently wish for a higher-level API where you deal with message buffers instead of sequences of byte arrays. I believe that's not the purpose of pipelines, but you can build your own on top of it.

You would have something easier to use, less "messy", and more on par with the latest standards/patterns than the Stream-based APIs, IMHO at the cost of some perf loss.

geoffkizer commented 6 years ago

When I get ReadOnlySequence, why do I have to care that it's composed of multiple buffers?

It's not a matter of caring or not. As Marc explained in his blog posts, the best possible performance is the main driver for pipelines; to achieve this, the system avoids data copies, whatever the consequences.

I think it's important to remember that there are tradeoffs here.

As you point out, pipelines avoid data copies by managing a sliding list of buffers. This comes at the cost of having to use ReadOnlySequence instead of just a linear Memory, which means you have to write your code to deal with discontiguous buffers. Plus, there's some (small) overhead for things like Slice or random-access on a ReadOnlySequence vs a Memory.

For many scenarios, this is (hopefully) a fine tradeoff to make. In particular, we have a lot of experience using it in ASPNET for processing HTTP requests.

For other scenarios, other tradeoffs may make more sense. In particular, if I was receiving a bunch of smallish, fixed-size requests, I would probably just read directly from the Stream or Socket and use a single buffer and copy partially-received requests down to the buffer start. Avoiding copies in this case probably isn't worth the added complexity and cost.
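
The single-buffer alternative described here can be sketched as follows (FixedRequestReader and its parameters are illustrative, not an existing API):

```csharp
using System;
using System.IO;

static class FixedRequestReader
{
    // Read fixed-size requests from a stream into one buffer, handing off
    // complete requests and sliding any partially-received tail back down
    // to the buffer start, as in geoffkizer's description.
    public static int ProcessAll(Stream stream, int requestSize, Action<byte[]> onRequest)
    {
        var buffer = new byte[requestSize * 4]; // room for a few requests
        int buffered = 0, handled = 0;

        int read;
        while ((read = stream.Read(buffer, buffered, buffer.Length - buffered)) > 0)
        {
            buffered += read;
            int offset = 0;
            while (buffered - offset >= requestSize)
            {
                var request = new byte[requestSize];
                Buffer.BlockCopy(buffer, offset, request, 0, requestSize);
                onRequest(request);
                handled++;
                offset += requestSize;
            }
            // Copy the partial request down to the start of the buffer.
            Buffer.BlockCopy(buffer, offset, buffer, 0, buffered - offset);
            buffered -= offset;
        }
        return handled;
    }
}
```

The copy-down is cheap when requests are small, which is exactly the tradeoff being argued: accept a small copy to keep a single linear buffer.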

Pipelines are intended to make it easier to process bytestreams correctly and performantly. But as always, it's important to understand the tradeoffs. It's always possible to build a custom solution that's better for your specific scenario, if you're willing to accept the cost and headaches.

(Obligatory reminder here that premature optimization is the root of all evil, and you can only really understand the performance of your system by measuring it in practice :))

Drawaes commented 6 years ago

Other than UDP, which pipelines doesn't support anyway... is there a protocol where all the buffers are the same size? I have yet to meet one (okay, maybe lower framing like TLS, but that is a framing layer). I think the majority case is variable message length, as it's rare for application payloads to always be equal.

Also, if you think most packets will be, say, "2k", then size your pool buffers that big, or with a small amount of extra room. Then fast-path deserialization if the sequence is a single segment. Then you get the best of both worlds: all the data in one buffer unless it's big, in which case it is auto-split for you.
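
The single-segment fast path suggested here looks roughly like this (SumBytes stands in for whatever deserialization you actually do):

```csharp
using System;
using System.Buffers;

static class SequenceParser
{
    // Fast-path the common single-segment case: if the whole message fit
    // in one pooled buffer, parse its span directly with no copy; only
    // linearize when the data was split across segments.
    public static int SumBytes(in ReadOnlySequence<byte> sequence)
    {
        if (sequence.IsSingleSegment)
            return Sum(sequence.First.Span); // best case: zero copies

        // Slow path: the message straddled buffers, so copy it out.
        byte[] tmp = sequence.ToArray();
        return Sum(tmp);
    }

    static int Sum(ReadOnlySpan<byte> span)
    {
        int total = 0;
        foreach (byte b in span) total += b;
        return total;
    }
}
```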

davidfowl commented 6 years ago

That’s right, there are tradeoffs. The other thing to remember is that pipelines is both an abstraction and a default implementation. The abstraction leads you into a pattern that reduces errors. The default implementation has made some tradeoffs on your behalf, based on our experience with processing streaming data. There are also a bunch of options in the default implementation that give you control over some of these tradeoffs.

That said, I fully expect to see more implementations appear in the fullness of time because the pattern itself is so useful.

davidfowl commented 6 years ago

When I get ReadOnlySequence, why do I have to care that it's composed of multiple buffers? What I'd like to see is a sequence of bytes, not a sequence of buffers of bytes. (Maybe ToArray() is good enough for that? Though I think I would prefer something that didn't allocate.)

It would be nice to do that without allocating, but I don’t know of a way to do so. It’s possible there could be a resizing allocation strategy that copies instead of chaining chunks, but I wouldn’t make it the default. The goal of this API is to be a pit of success for both performance and correctness. I think people will use it when they realize how error-prone it is to do any of this manually today. If you don’t care about performance, you can definitely call ToArray(). There’s also a middle ground where we could allocate if there are multiple buffers, or return the existing array if not. The tricky part is that the performance then becomes hard to predict, but it could be done for most cases.

That said, there are some emerging usage patterns that I’ve seen that will make using the ecosystem around these types easier, but we’re not there yet.

When I know that messages I'm reading have a specific length or use a specific delimiter, I want to use a simple method call to read the next message, instead of having to loop and call the examined overload of AdvanceTo()

There’s an API proposal for an overload of ReadAsync that takes an int for the minimum bytes required. I think that fits the bill here. You could read the length prefix, then pass that value to ReadAsync and only get called back when at least that many bytes have been buffered.
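
The loop-and-AdvanceTo pattern being discussed, applied to a length-prefixed protocol, might look like this (LengthPrefixedReader is illustrative; the 4-byte little-endian prefix is an assumption):

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.IO.Pipelines;
using System.Threading.Tasks;

static class LengthPrefixedReader
{
    // Today's pattern: loop, and use the consumed/examined overload of
    // AdvanceTo so the pipe knows we inspected the whole buffer but only
    // consumed complete messages.
    public static async Task<byte[]> ReadMessageAsync(PipeReader reader)
    {
        while (true)
        {
            ReadResult result = await reader.ReadAsync();
            ReadOnlySequence<byte> buffer = result.Buffer;

            if (TryReadMessage(ref buffer, out byte[] message))
            {
                reader.AdvanceTo(buffer.Start); // consumed through the message
                return message;
            }

            // Not enough data yet: consumed nothing, examined everything,
            // so the next ReadAsync waits for more bytes.
            reader.AdvanceTo(buffer.Start, buffer.End);

            if (result.IsCompleted)
                throw new InvalidOperationException("Incomplete message.");
        }
    }

    static bool TryReadMessage(ref ReadOnlySequence<byte> buffer, out byte[] message)
    {
        message = null;
        if (buffer.Length < 4) return false;

        Span<byte> prefix = stackalloc byte[4];
        buffer.Slice(0, 4).CopyTo(prefix);
        int length = BinaryPrimitives.ReadInt32LittleEndian(prefix);

        if (buffer.Length < 4 + length) return false;

        message = buffer.Slice(4, length).ToArray();
        buffer = buffer.Slice(4 + length); // advance past the message
        return true;
    }
}
```

With the proposed minimum-bytes overload, the outer loop above collapses: read the prefix, then `ReadAsync(4 + length)` and parse once.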

AdvanceTo is currently too core of an API to avoid calling. Lots of the performance benefits come from being able to explicitly signal when you are done with a segment of memory.

Love the feedback though. As this is a new API with some foreign patterns, I expect lots of questions like this and I also expect to learn a lot from how people end up using it!

benaadams commented 6 years ago

Obligatory reminder here that premature optimization is the root of all evil

Don't go there... It's a very out-of-context quote:

The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by pennywise-and-pound-foolish programmers, who can't debug or maintain their "optimized" programs. [ ... ] We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

— Donald Knuth, in a 1974 article called "Structured Programming with go to Statements", arguing that using go to was good for performance in certain circumstances and shouldn't be avoided:

In other words, we shouldn't merely remove go to statements because it's the fashionable thing to do; the presence or absence of go to statements is not really the issue. The underlying structure of the program is what counts, and we want only to avoid usages which somehow clutter up the program.