Discussion: advantage to using sync over async for I/O?

roji commented 8 years ago

Continuation of discussion from dotnet/corefx#4868

Without AuthenticateAsClient(), Open() must resort to calling an async method internally, which means that it's no longer purely synchronous... The code above is as synchronous as the old implementation. When you call synchronous I/O on something like NetworkStream, internally, the thread will be waiting on an event for the operation to finish. Even more broadly: in reality there is no such thing as synchronous I/O in Windows.

Thanks for this detail. I was aware that at least in some cases sync methods actually perform async operations under the hood.

However, coming from the Unix universe, it doesn't seem to hold that there's "no such thing as synchronous I/O" at least at the kernel level: I would expect .NET I/O operations to map to simple sync I/O system calls such as write and read. Also, the implementation details underlying async operations across platforms may make sync I/O more attractive at least in some cases (i.e. is async I/O implemented efficiently on Linux with mono, or with coreclr for that matter?). I admit I'm just asking questions, I don't have enough solid knowledge here.

Regardless of these questions, and I said back in dotnet/corefx#4868, there's something odd about removing such a specific sync API (SslStream.AuthenticateAsClient) for such a general "sync I/O is bad" argument, without touching any other sync API. In other words, I'd expect either all sync APIs to stay for backwards compatibility or for a more sweeping removal of sync I/O APIs.

roji commented 8 years ago

One more important note. You're effectively telling me to implement .NET's Open(), a sync API, by internally calling an async method and waiting on it. This is a scenario discussed (and warned against) by Stephen Toub: http://blogs.msdn.com/b/pfxteam/archive/2012/04/13/10293638.aspx

roji commented 8 years ago

One additional reaction to something you said before:

One disadvantage of async I/O is the pressure put on the thread pool...

Pressure on a thread-pool shows poor design in the Async system. Ideally you should always try to keep the number of threads equal to the number of CPUs. Everything above that is going to add overheads of thread switching and memory consumption. This can be achieved by using TPL Async. The concurrency level can be controlled via the TaskScheduler class.

That certainly seems true in a single application that's totally within my control, but what happens in a large complex concurrent application using several 3rd-party libraries which themselves may or may not rely on the TP? It can easily become complicated to to manage this and understand the exact TP needs across everything...

At the end of the day, async APIs introduce a dependency on a global shared resource (the TP) which sync APIs don't have...

Any comments on all this?

clrjunkie commented 8 years ago

That certainly seems true in a single application that's totally within my control, but what happens in a large complex concurrent application using several 3rd-party libraries which themselves may or may not rely on the TP?

You raise a valid concern,

Note, that in .NET the TP (e.g ThreadPool Class) actually abstracts two "Thread Pool" implementations: A Managed Work Queue as well as various unrelated methods to bind IOCP callbacks with the Win32 Native ThreadPool

The Socket API will invoke the callbacks using the Native ThreadPool worker thread.

If you can guarantee (which I doubt) that users of your library won't abuse your callback context with cpu bound work than an alternative approach would be to have the Socket library provide the ability to bind with a dedicated user-defined (or framework created) Native Win32 ThreadPool.

This is by the way how it's tackled in Native Win32 application development but again, the assumption here is that the usage of the dedicated ThreadPool is not abused by its caller.

I agree that having a Sync API has it's benefits in certain situations - we don't live in a perfect world.

Furthermore, putting aside async theory, Windows scheduler is super optimized and I recall reading some time ago a benchamrk for ASP.NET that showed Sync request handling actually outperformed async/await requests handling, I didn't have a chance to investigate this therefore you should judge this yourself, try searching for it online.

clrjunkie commented 8 years ago

Actually, the benchmark showed sync outperforming async in a client side test which is more relevant to this issue.

Here is the link:

http://www.ducons.com/blog/tests-and-thoughts-on-asynchronous-io-vs-multithreading

roji commented 8 years ago

@clrjunkie thanks for the added info and the benchmark!

@CIPop @stephentoub any comments here?

roji commented 8 years ago

Just wondering if there are any reactions to the above discussion...

benaadams commented 8 years ago

@clrjunkie that test is quite a bad implementation of both sync and async (async void with no coordination) and they aren't comparing the same or best approaches for both. Also 57 second to execute 1M requests is fairly slow; since the aspnet5 Kestrel Server using purely async/await methods will both happily execute over 1M requests in under a second for small requests:

And crush a 10GbE network for large requests on (Azure G4 Windows VM -> Linux VM)

All using coreclr x64 and pure managed async/await and asynchronous I/O

benaadams commented 8 years ago

At the end of the day, async APIs introduce a dependency on a global shared resource (the TP) which sync APIs don't have...

@roji the Task returning "async" method doesn't have to actually be async; it can just return a cached precompleted Task (like Task.CompletedTask) or ValueTask for returned values if it is mostly sync; or a cached result Task if it mostly returns the same value.

Some good tips here https://channel9.msdn.com/Series/Three-Essential-Tips-for-Async/Async-libraries-APIs-should-be-chunky

Sync methods are good for things that really are sync; but sync blocking apis aren't good.

clrjunkie commented 8 years ago

@benaadams

that test is quite a bad implementation of both sync and async (async void with no coordination) and they aren't comparing the same or best approaches for both.

What do you mean by “quite bad”? Have you tried to apply the best approaches to the test code to see if they make any change to the final outcome?

Also 57 second to execute 1M requests is fairly slow; since the aspnet5 Kestrel Server using purely async/await methods will both happily execute over 1M requests in under a second for small requests:

I’m well aware of the TechEmpower Benchmark (e.g “Stress Test”) and Wrk (more than year) and suffice to say that aside from two points I’m not a big fan. I share the opinion of others that such benchmarks do not represent real world scenarios (see https://webtide.com/jetty-in-techempower-benchmarks/) Furthermore, there are many obvious points of concern for a production webserver that are not measured in this benchmark that have direct impact on performance but I currently don’t have the time to discuss them (although I think a serious discussion should take place). Saying that, there are two things I do find interesting about the stress tests 1) It tests if whether your code crashes during high load 2) it shows you what impact doing ‘new’ all over the place has and I consider the whole RPS thing as natural side effect, but looking at the bigger picture to me it’s not interesting. Quite frankly, I’m a bit disappointed that MS decided to participate in this race as I think it’s a marketing honey pot. It’s funny that when you look into the results that start remotely to resemble a real system test with a DB, then the difference between the top runner and the "old" ASP.NET is negligible.

tech-bench

Furthermore, I think @roji main concern is about a non-pipelining request/response exchange pattern so I don’t think the wrk pipelining test result that pass 50 - 100 bytes per request back and forth is applicable here, or to the majority of uses cases for that matter.

Food for thought:

Can you prove that for each request generated by wrk you actually get a full valid response containing all the bits the server sent?

p.s

To avoid any doubt I don't suggest that your optimization efforts are not important. I simply don’t agree with, well with the benchmark as well as the importance put on an RPS that changes instantly once you add a small requirement or slightly change the test environment. I simply don't think this is serious..

benaadams commented 8 years ago

@clrjunkie

What do you mean by “quite bad”?

Its very much an apples and oranges comparison. The throttling in the async was achieved by limiting the connections with ServicePoint which is not simple (see FindConnection and SubmitRequest when over the limits) so queuing on threadpool vs queuing on ServicePoint is what's mostly being tested here; the threadpool being far more efficient and the threadpool is what async uses for its general use.

I’m well aware of the TechEmpower Benchmark (e.g “Stress Test”) and Wrk (more than year) and suffice to say that aside from two points I’m not a big fan.

Also my point had nothing particularly to do with the TechEmpower Benchmarks. It was that non-blocking async I/O is not slow in .NET and you aren't loosing anything in perf by comparison to the time taken by the actual I/O (similar to your raw response vs response w/ db point).

It is also hugely more scalable than (sync) blocking I/O; in fact that is one of the issue with the benchmark you linked to. It is 1M requests and the sync blocking is limited by how many requests it can process at one time as its throttled by the threadpool. One the other hand the async doesn't have that limit so it can send all the requests fairly concurrently (depending on RTT); which will probably additionally make the server unhappy; and then memory issues observed wil arise because so many responses arrive back at the same time.

Furthermore, I think @roji main concern is about a non-pipelining request/response exchange pattern so I don’t think the wrk pipelining test result that pass 50 - 100 bytes per request back and forth is applicable here, or to the majority of uses cases for that matter.

The 12.6 GBps throughput screenshotted was from sending 300kB per request, non-pipelined, to 1000 connections which is half the page weight of for example the Angularjs or 60% of the npm homepage to pick two. So fits more with a "real world" scenario. Granted if you talk to a db that will be your bottleneck, but webserver generally does a lot more than that; whether its through memory caching, serving html+img+js; streaming videos, downloading files etc

Have you tried to apply the best approaches to the test code to see if they make any change to the final outcome?

Since I raised the criticism, I'll see if I can present a better approach - will get back to you.

Caveat If there is no async, or blocking component (e.g. only cpu bound work) then sync is better than async

Related async vs sync testing to ensure the server holds up some, for people that do use sync I/O https://github.com/aspnet/KestrelHttpServer/pull/589#issuecomment-172596550

clrjunkie commented 8 years ago

@benaadams

Its very much an apples and oranges comparison. The throttling in the async was achieved by limiting the connections with ServicePoint which is not simple (see FindConnection and SubmitRequest when over the limits) so queuing on threadpool vs queuing on ServicePoint is what's mostly being tested here; the threadpool being far more efficient and the threadpool is what async uses for its general use.

Maybe.. since I didn’t write the benchmark I’m reluctant to accept various explanations. However, since the author compared two approaches using out of box framework facilities one would reasonably expect a noticeable improvement in the client side test and not “because”; thus my initial remark “putting aside async theory”. The whole point of the link was to bring attention to the fact that an async api by itself says nothing about performance if the underlying implementation doesn’t deliver upon it.

Also my point had nothing particularly to do with the TechEmpower Benchmarks.

Ye’h but it kinda slipped in… The 1M rps you mentioned was linked to Damian post and as far as I understand he uses or is inspired by the TechEmpower scenarios. Know any different for this particular test? https://github.com/aspnet/benchmarks

It was that non-blocking async I/O is not slow in .NET and you aren't loosing anything in perf by comparison to the time taken by the actual I/O (similar to your raw response vs response w/ db point).

I didn’t say nor did I suggest that you are losing anything in perf by comparison to the time taken by the actual I/O. The players here are: .NET Async support, which depends on compiler generated code, threadpool implementation, o/s signaling implementation Vs. Sync support, which includes use of the Thread library and the O/S scheduler – a facility that is heavily optimized for fast context switching between many threads.

It is also hugely more scalable than (sync) blocking I/O; in fact that is one of the issue with the benchmark you linked to. It is 1M requests and the sync blocking is limited by how many requests it can process at one time as its throttled by the threadpool. One the other hand the async doesn't have that limit so it can send all the requests fairly concurrently (depending on RTT); which will probably additionally make the server unhappy; and then memory issues observed wil arise because so many responses arrive back at the same time.

See first response above.

The 12.6 GBps throughput screenshotted was from sending 300kB per request, non-pipelined, to 1000 connections which is half the page weight of for example the Angularjs or 60% of the npm homepage to pick two. So fits more with a "real world" scenario.

Nothing is “real world” with a burst test on pre setup connections, unless you consider DDOS the common scenario.

The 12.6 GBps figure doesn’t say much as it aggregates protocol level headers etc.. All I can conclude from this figure by doing a ridiculously naïve calculation is that the server processed 44040/rps of such requests which is 44/rps per connection (1KB = 1024B). What does this have to do with a Sync? Did you see if a server backed by a 1000 thread threadpool shows substantially lower earth shattering results?

Loading Angularjs front page generates 53 distinct requests on EDGE empty cache, even it reuses Keep-Alive connection I don’t see what’s the takeaway or how you correlate it to your test.

Granted if you talk to a db that will be your bottleneck, but webserver generally does a lot more than that; whether its through memory caching, serving html+img+js; streaming videos, downloading files etc

But it’s generally on the path… unless you’re after “geocities” style websites (gosh I miss those :) Caching is externalized in any serious high traffic site. I would probably stick with IIS (or NGINX) for kernel level caching and file transfer without passing through user-mode or more probably a CDN that employs dedicated solutions for streaming videos / heavy file downloads.

Anyway I think this whole performance discussion is completely deviating from the main concern of this issue which is single threadpool pressure.

benaadams commented 8 years ago

Did you see if a server backed by a 1000 thread threadpool shows substantially lower earth shattering results?

Can't currently manually increase the threadpool in coreclr https://github.com/dotnet/coreclr/issues/1606 & https://github.com/dotnet/cli/issues/889 and if you try 1000 connections the server out of the box it crawls. Obviously there aren't 1000 thread pool threads to start with; would have to jam the server for 16 min to create them.

clrjunkie commented 8 years ago

The thread pool would obviously lazy create them, when needed. The point was to guarantee a theoretical upper bound for processing. thread != connection, final outcome matters.

benaadams commented 8 years ago

Yeah, and it dies; because the I/O is inherently async and the sync-blocked threadpool threads use up all the threadpool threads in a blocked waiting state so the async I/O has nothing to run on to notify them to unblock.

benaadams commented 8 years ago

The full framework uses two threadpools to work round the issue one for i/O callbacks an one for user code.

clrjunkie commented 8 years ago

@benaadams

Yeah, and it dies; because the I/O is inherently async and the sync-blocked threadpool threads use up all the threadpool threads in a blocked waiting state so the async I/O has nothing to run on to notify them to unblock.

Implementation dependent. Look at other FW samples.

clrjunkie commented 8 years ago

@benaadams

The full framework uses two threadpools to work round the issue

Read my post addressing this.. above, above, above...

sqmgh commented 8 years ago

However, since the author compared two approaches using out of box framework facilities one would reasonably expect a noticeable improvement in the client side test and not “because”; thus my initial remark “putting aside async theory”.

Did we read the same article?

Depending on where in the article you are, the sync method doesn't actually use an out of the box method. They async method might not either; It's hard to tell because the initial implementation apparently never returned any requests. There is text talking about changing the implementation in an attempt try and get more performance, and some alluding to using less 'out-of the box' methods to do so. However, it's never noted what exactly was changed, or what implementation was actually used in further tests. The only thing that can be said for sure is that the async code that is provided is very much not best practice. Last I sawasync void will actually cause Kestrel to crash.

Yes, the initial example code does use an 'out of the box' threading method. However, the author says that the initial example does such little work on each request that it hardly uses any threads. Later in the article when the author modified the benchmark to do actual work, he had to write his own 'custom implemented parallel execution engine'. Which he eventually limited to 500 threads for 5000 total requests. 500 threads that weren't enough to keep up with the async implementation, and causes him to conclude that async gives better performance when doing actual work. If you go by the server side section, much much better.

Providing this benchmark as supporting evidence of something, and then immediately saying the techempower benchmarks are meaningless because they aren't "real world" is extremely strange to me. The initial benchmark is basically the wrk end of the techempower benchmark. Since the response that it's receiving is only "Hello world!" it's technically doing even less than the plaintext techempower benchmark is. So with that logic, and ignoring the potentially dubious implementation, I think the conclusion would be that sync might be better is unrealistic microbenchmarks. However, since that's unrealistic and web servers tend actually do work like talking to databases, then everything should be async anyways.

clrjunkie commented 8 years ago

clrjunkie wrote:

Furthermore, putting aside async theory, Windows scheduler is super optimized and I recall reading some time ago a benchamrk for ASP.NET that showed Sync request handling actually outperformed async/await requests handling, I didn't have a chance to investigate this therefore you should judge this yourself, try searching for it online.

Actually, the benchmark showed sync outperforming async in a client side test which is more relevant to this issue.

Here is the link:

http://www.ducons.com/blog/tests-and-thoughts-on-asynchronous-io-vs-multithreading

Maybe.. since I didn’t write the benchmark I’m reluctant to accept various explanations. However, since the author compared two approaches using out of box framework facilities one would reasonably expect a noticeable improvement in the client side test and not “because”; thus my initial remark “putting aside async theory”. The whole point of the link was to bring attention to the fact that an async api by itself says nothing about performance if the underlying implementation doesn’t deliver upon it.

roji commented 8 years ago

Am back from a long project, sorry for not having participated more in the conversation above.

First of all, it's somewhat disappointing to not hear anything from anyone at Microsoft on this (e.g. @CIPop).

Beyond the question of "pure async vs. sync performance" discussed above, there are other factors as I tried to say. Thread pool pressure is one of them - I may want the added predictability/determinism that sync calls provide in that they don't add a dependency on the thread pool, which may be or may not be a problem in a given application. It's a choice that should be left to the programmer rather than saying "sync I/O is always bad".

But more importantly, one of the key points of CoreCLR is portability. Saying "sync I/O is always bad" means you're sure that across Linux, MacOS etc. the kernel's sync system calls offer no advantage over whatever async mechanism the runtime does - that seems false, and should at least be shown.

CIPop commented 8 years ago

Network Programming for Microsoft Windows 2nd Ed., Table 6-3 has a comparison between all I/O models available in Windows at that time. Their conclusion was that IOCP was the best I/O model at that time while blocking calls was performing the worst.

I don't know of any architecture (hardware or software) that implements synchronous I/O when the speeds between the processing unit and the transceiver are orders of magnitude apart. If that would be true, the processing unit's computing power is wasted by either spin-waiting or powering off while waiting for the transceiver to complete the transfer.

For our discussion, the CPU is orders of magnitude faster than the network card and other processes exist in the system. There is no way to handle sync NIC/memory transfers unless the CPU waits. Transfers are managed by DMA which signals the CPU via interrupts whenever the transfer has been completed. The apparent blocking syscalls for *NIX are handled by suspending execution threads (processes) until the I/O is completed. Other platforms such as NodeJS use systems such like event loops/message pumps to dispatch completions on a single thread (this is similar in architecture with the WSAAsyncSelect).

In Windows, a thread comes with high overheads for the CPU and memory resources. The performance guideline is to have the of threads equal to the number of CPUs and no threads in waiting/sleep states. In other words: apps should always use threads (CPUs) for compute-bound operations and perform I/O operations asynchronously.

roji commented 8 years ago

@CIPop thanks for the reply.

I understand (and am aware of) what you're saying. Obviously sync I/O doesn't imply having the CPU spin-wait or power off - the thread/process is simply put into a non-runnable state until the operation completes. So the question here is not how I/O is actually handled at the lowest levels, but rather how the interaction between the application and the kernel takes place with regards to I/O.

In that context, it may make sense in certain scenarios to have independence from the thread-pool which sync I/O calls provide - you know which thread will execute "the callback", you're 100% sure it will be there waiting when the operation completes. You don't depend on all kinds of external conditions like what other libraries are operating inside the application, putting pressure on the thread pool and making things less predictable for you.

In addition, you need to take into account non-Windows platforms. AFAIK thread overheads in Linux aren't as high as in Windows, and async I/O (both underlying kernel mechanisms and implementation in .NET runtimes) isn't necessarily as widely-used or efficient as Windows. The "everything async" vs. "let's leave room for sync" debate has taken place before, and despite some current trends (and the hype around NodeJS) in some scenarios sync I/O may make sense.

Don't get me wrong, I agree that in the vast majority of cases I/O should be async. I'm just arguing that you should be leaving a choice to programmers rather than completely remove the possibility of sync I/O.

roji commented 8 years ago

@CIpop and others, I'll just give a quick concrete example. I'm the maintainer of Npgsql, the PostgreSQL driver for .NET. We've frequently received complaints that under load, and frequently at program startup, DNS queries performed by Npgsql time out. Now, the current stable Npgsql does DNS resolution with Dns.{Begin/End}GetHostAddresses. What seems to be happening, is that when the program starts the thread pool has relatively few threads. If a large amount of requests come in and a lot of async operations are in progress, the thread pool starts allocating new threads - but this process takes time; and during these time it would seem that the operation times out.

I'm mentioning this specific scenario because in CoreCLR, the sync API Dns.GetHostAddresses has been dropped, much like SslStream.AuthenticateAsClient, because "sync API is always bad".

benaadams commented 8 years ago

@roji I think that's a bug, see https://github.com/dotnet/corefx/issues/8371

roji commented 8 years ago

@benaadams see my comment https://github.com/dotnet/corefx/issues/8371#issuecomment-217705749: the risk of threadpool starvation seems like a general problem with async, not a specific problem with how DNS resolution works - that's the point I'm trying to make here (unless I'm missing something).

GSPP commented 8 years ago

The starvation issue should always be worse with sync IO running on pool threads as it is with async completion callbacks posted to the TP.

It is true that "pathologically" bad cases can happen due to scheduling etc. but the way to avoid all of those systematically is to not overload the TP.

In a pure async IO workload the only way the pool can be overloaded is when the CPU becomes saturated. That case is to be avoided as well, and it's just as troublesome with many threads performing sync IO.

What seems to be happening, is that when the program starts the thread pool has relatively few threads.

Yes, this is a significant problem with the .NET thread pool. I really wish this was addressed.

In the Dns case that you mention the root cause is the bad async implementation, not async IO (there is no async IO ATM).

sync API Dns.GetHostAddresses has been dropped, much like SslStream.AuthenticateAsClient, because "sync API is always bad".

I did not know that CoreClr has a strategy of dropping synchronous IO methods?! That would be a really bad policy. Async IO has a significant productivity cost and it is a bad default choice.

Async IO should be used when a tangible benefit can be obtained by doing that. Such benefits are saving threads and GUI programming simplification. If enough threads are available saving threads has zero customer value and negative value for productivity. A common misconception is about performance, so note that DNS queries do not become one bit faster by using async IO.

roji commented 8 years ago

@GSPP we more or less agree...

About dropping synchronous IO methods, see https://github.com/dotnet/corefx/pull/4868 where I complain about the removal of SslStream.AuthenticateAsClient(). But closer to our subject, netstandard1_3 drops the sync Dns.GetHostAddresses() - you have to use Dns.GetHostAddressesAsync() (which ironically isn't async as in dotnet/corefx#8371!!). So there's seems to be a "let's kill all sync I/O" argument in the design of CoreCLR...

However I don't agree 100% with the starvation issue always being worse with sync I/O. If you know your precise application needs and have no serious scability goals (as you say) a sync I/O program can be more predictable and less sensitive to issues like dotnet/corefx#1754. You own the threads, you have no dependency on the TP whatsoever, everything is simple.

But that's not very important - my only goal here is to make a case for leaving sync I/O methods in .NET...

GSPP commented 8 years ago

Maybe you should open a ticket dedicated to that request. This one is a discussion about many things. Make a formal request to continue to provide synchronous methods except in special cases.

Fyi, starting with .NET 4.6 (I think) the ADO.NET query methods are internally async and the sync methods are sync-over-async. Not sure if that was a good choice. Might be an OK engineering trade-off. On the other hand database calls are a classic case where async is almost always not helping throughput.

roji commented 8 years ago

@GSPP this is exactly what this issue is for (#5044)... But nobody from Microsoft seems to be taking it seriously.

Thanks for mentioning about .NET 4.6, I didn't know that! Can you point to any issue/PR where this is discussed? Any idea what's the status with CoreCLR?

roji commented 8 years ago

Also, why do you say async isn't useful for database calls? It's I/O like any other, why not make your webapp be more scalable by not blocking a thread while waiting for a database response?

GSPP commented 8 years ago

If threads are not scarce the scalability gains are zero.

DB calls are quick, almost by construction. If they are slow then the database is overloaded. The DB becomes overloaded far quicker than the web app loses its ability to provision threads.

In reporting workloads queries take a long time but there are few of them at the same time.

The number of threads consumed is equal to (latency * frequency). So 10ms query latency and 1000 queries per second (a lot!) cost you 10 threads on average which is nothing to speak of.

100s of threads under ASP.NET are not an issue, but that would already be an extreme case to observe in practice.

Can you point to any issue/PR where this is discussed?

I found out about it using Reflector on the Desktop Framework.

roji commented 8 years ago

@GSPP interesting argument. Although nowadays databases are sometimes far away geographically (e.g.another datacenter) so a lot of the latency could be network rather than database load. But I understand the general argument.

benaadams commented 8 years ago

The async "cost" is between 120ns and 10ns https://github.com/ljw1004/roslyn/blob/features/async-return/docs/specs/feature%20-%20arbitrary%20async%20returns.md

A blocking sync call is a syscall to thread de-schedule and then a reschedule when complete, not sure of latencies

roji commented 8 years ago

@benaadams in most cases there's also a memory allocation cost for the Task being returned (in some cases you can return cached instances but not always).

GSPP commented 8 years ago

@benaadams the descheduling happens with async as well iff there is not another TP item to work on. Unless the CPU is very loaded (like 90%+) the chance is very high that there is none.

Async improves throughput at very high CPU loads indeed. At lower loads it increases CPU consumption. Don't generally think of async as a way to save CPU.

Also it enables high levels of IO parallelism that otherwise would be thread constrained.

roji commented 8 years ago

Oh interesting, I didn't know about ValueTask!

JeffCyr commented 8 years ago

Here's my thoughts on the matter.

There is a valid point brought by @roji; in large and complex applications, the .Net ThreadPool is often misused by doing synchronous IO in it. This makes relying on the ThreadPool risky.

However, the root cause of the ThreadPool being misused is because the synchronous IO methods exists in the first place. If all IO methods were asynchronous, then it would be explicit that a method may be doing IO. Then the caller would be more inclined to await the returned task instead of blocking on it.

So I think that .Net Core is taking the right stance by dropping all synchronous IO methods.

GSPP commented 8 years ago

@JeffCyr your argumentation should discuss the downsides of async as well. If there were none everyone would agree.

roji commented 8 years ago

@JeffCyr I/O isn't the only reason for a method to block. In another words, even if you get rid of all the sync I/O methods, users can still start some intense computational task and block a thread.

And like @JeffCyr says, there are downsides to async, e.g. async methods usually imply increased heap allocations (because of Task instances). All I'm saying is leave the choice up to the programmer.

clrjunkie commented 8 years ago

@roji Let me give you a friendly advice:

Drop it until you can show a compelling business scenario for including I/O API’s that are not Task based.

Async API’s have one and only one technical goal: reduce the amount of context switches that would otherwise occur due to large amount of running threads servicing submission and completion of I/O request.

What’s “large” is highly subjective and scenario driven and it’s a waste of time to reason about the implications using hypothetical examples or micro optimizations. In other words, for every reasonable argument I can give a counter reasonable argument.

Async is not about fairness, and it doesn’t come without its cost, but it turns out there are common scenarios where the benefit exceeds the cost – but not always. The only sensible question to ask is whether Async API’s benefits YOU.

There are plenty of valid reason I can think of why Microsoft has chosen to go solely with Task-based I/O API’s that have little to do with high performance, for example:

There are no noticeable differences between the two for the majority of customer use cases so it’s cheaper to implement, document, and support one version.
Less arbitrary threads means you can squeeze more VM’s on average to a single physical host and by that reduce cost. (for both MS and customers)
Web Server are less susceptible to malicious slow network client attacks.
It’s easier to write await than deal with APM so the chance that an app/Winform developer will block the UI thread is reduced by virtue of not having the possibility to call sync API’s to begin with leading to better quality.
Some would appreciate the lack of choice.

6.Async sounds cool.

So again, putting it simply instead of getting frustrated ask yourself: would sync API’s really provide substantial benefit to the majority of my library users? (Is it worth the cost for you and Microsoft?)

Don’t get me wrong with this “biz-talk” I have total sympathy to your concerns, Heck I’m now investing in a completely different framework/language because it’s important to me to use HTTP components that are purely written down to the socket level in the language of my choice, but that’s my personal requirement which may very well not be applicable to others and I have come to terms with that.

clrjunkie commented 8 years ago

and stack size (implied)

roji commented 8 years ago

@clrjunkie,

Drop it until you can show a compelling business scenario for including I/O API’s that are not Task based.

This is a discussion about programming technicalities, I'm not sure what "a compelling business scenario" would consist of here, but I'll sum up again some disadvantages of async:

Considerably more complicated programming model because of the thread switching - programmers have to be careful with the synchronization context, possibilities of deadlock. The "business" scenario of sync I/O is simplified programming
Increased heap allocations (Task)
Less predictable/determinstic in large applications because of the dependency on the TP

What’s “large” is highly subjective and scenario driven and it’s a waste of time to reason about the implications using hypothetical examples or micro optimizations. In other words, for every reasonable argument I can give a counter reasonable argument.

I'm not sure what hypothetical examples or even less, micro optimizations I've referred to...

Async is not about fairness, and it doesn’t come without its cost, but it turns out there are common scenarios where the benefit exceeds the cost – but not always. The only sensible question to ask is whether Async API’s benefits YOU.

I'm confused now... I'm not saying async is bad - just that in some scenarios it's not the right choice. When you're saying that "there are some common scenarios where the benefit exceeds the cost - but not always", isn't that an argument for leaving the sync I/O option?

There are no noticeable differences between the two for the majority of customer use cases so it’s cheaper to implement, document, and support one version.

.NET is a framework, it's support to support the needs of a wide base of users. There are a lot of edge cases covered by .NET that don't meet "the majority of customer use cases" and that's a good thing. I sense something else happening here - there's a refusal to admit that async does have its issues, and that there are cases where sync I/O makes more sense.

Less arbitrary threads means you can squeeze more VM’s on average to a single physical host and by that reduce cost. (for both MS and customers)

I have no idea what that means, are TP threads somehow less arbitrary in some way?

Some would appreciate the lack of choice.

Really... maybe they would, I definitely won't. But I guess it's a question of ideology.

So again, putting it simply instead of getting frustrated ask yourself: would sync API’s really provide substantial benefit to the majority of my library users? (Is it worth the cost for you and Microsoft?)

Let me simplify things here. There's only one relevant question here: are there scenarios where sync I/O makes more sense than async I/O? If you answer "no" that's one thing - I think you're wrong but we can discuss. If you answer "yes but the majority don't care" then we should start taking a look at all the entire .NET API surface and removing rarely used APIs.

benaadams commented 8 years ago

Less arbitrary threads means you can squeeze more VM’s on average to a single physical host and by that reduce cost. (for both MS and customers)

I have no idea what that means, are TP threads somehow less arbitrary in some way?

I assume is referring to a Thread min size 1MB and Task min size 80 Bytes; also N tasks to the O/S can be 1 thread, but N threads is N threads so less work context switching and management for the O/S?

For example the website density you can achieve on a host would be directly effected by the number of threads each website was using. Or "background" running Windows Store apps etc

clrjunkie commented 8 years ago

@roji

Considerably more complicated programming model because of the thread switching - programmers have to be careful with the synchronization context

Prove it. Do you see the Internet/forums flooded with questions about this.

Increased heap allocations (Task)

So what?

Less predictable/determinstic in large applications because of the dependency on the TP

For who?

When you're saying that "there are some common scenarios where the benefit exceeds the cost - but not always", isn't that an argument for leaving the sync I/O option?

No, because I didn't include a $ nor an enterprise customer/scenario.

there's a refusal to admit that async does have its issues, and that there are cases where sync I/O makes more sense.

Don't think so.

I have no idea what that means, are TP threads somehow less arbitrary in some way?

See @benaadams response

roji commented 8 years ago

Uhh, I'm a bit lost now but I'll try to answer these in good faith.

Considerably more complicated programming model because of the thread switching - programmers have to be careful with the synchronization context

Prove it. Do you see the Internet/forums flooded with questions about this.

One issue that seems to bite people all the time is blocking on async. I'm not going to start finding popularity metrics to prove to you that this is a serious issue, but are you seriously claiming there's no added complexity because of SynchronizationContext, ConfigureAwait, thread locality and all the rest of it? Do you seriously consider async to be just a simple, entry-level language feature that doesn't require anything more from the programmer than understanding sync I/O?

Increased heap allocations (Task)

So what?

I don't know how to answer this one. C# is a language which includes unsafe, allowing you to do pointer operations for performance. We're not in Java-land where we just throw performance out the window. If you don't understand the implications of heap allocations and GC pressure go read up on it.

When you're saying that "there are some common scenarios where the benefit exceeds the cost - but not always", isn't that an argument for leaving the sync I/O option?

No, because I didn't include a $ nor an enterprise customer/scenario.

That's simply dishonest. Either there are scenarios where "the cost exceeds the benefit", or there aren't. If there are, that cost translates into $ or "$ and enterprise customer/scenario". We're programmers, we're debating a technical issue. Programmer productivity (and even heap allocations) are important.

there's a refusal to admit that async does have its issues, and that there are cases where sync I/O makes more sense.

Don't think so.

Uh, ok. You know you can't just decide that the entire world shouldn't care about increased heap allocations, right? There are people with different needs out there. You seem to be closing your eyes to any sort of objection to using async?

I have no idea what that means, are TP threads somehow less arbitrary in some way?

See @benaadams response

Frankly, I'm going to let this one go because the other arguments are much more important (I think there's just some confusion here).

JeffCyr commented 8 years ago

I get an impression that this discussion won`t come to an agreement, but here it is for the sake of the argument:

@JeffCyr I/O isn't the only reason for a method to block. In another words, even if you get rid of all the sync I/O methods, users can still start some intense computational task and block a thread.

This can be mitigated by spawning the long computational task with LongRunning option. But to be fair, from experience almost every ThreadPool starvation I debugged were caused by blocking IO and not cpu bound tasks.

And like @JeffCyr says, there are downsides to async, e.g. async methods usually imply increased heap allocations (because of Task instances). All I'm saying is leave the choice up to the programmer.

Have you measured the impact of this "increased" heap allocations? The async methods return cached completed tasks for the most common result value when completed synchronously, and even when a Task is created, it won't live past the gen0 collection which is very cheap to collect. When the operation is truly async, yes a task will be created that might make its way to gen2, but it will still be cheaper than creating/destroying a thread and keeping the stack size in memory while waiting for the blocking operation.

As for blocking on async code, while the rule of thumbs is to call it async, there is nothing wrong in blocking on async code if all await instruction in it use ConfigureAwait(false) which the .Net framework should always do. The performance impact is also negligible, many blocking IO operation are implemented in the OS by blocking the asynchronous version anyway.

roji commented 8 years ago

@JeffCyr I agree we're probably not going to agree. For what it's worth some comments on what you say specifically.

@JeffCyr I/O isn't the only reason for a method to block. In another words, even if you get rid of all the sync I/O methods, users can still start some intense computational task and block a thread.

This can be mitigated by spawning the long computational task with LongRunning option. But to be fair, from experience almost every ThreadPool starvation I debugged were caused by blocking IO and not cpu bound tasks.

Sure this kind of problem can be mitigated - whether it's computational or sync I/O. The point is that the problem is there, programmers have to deal with them, it's a complication in the programming model which you don't have in sync. Whether it's a huge problem or not is another issue.

And like @JeffCyr says, there are downsides to async, e.g. async methods usually imply increased heap allocations (because of Task instances). All I'm saying is leave the choice up to the programmer.

Have you measured the impact of this "increased" heap allocations? The async methods return cached completed tasks for the most common result value when completed synchronously, and even when a Task is created, it won't live past the gen0 collection which is very cheap to collect. When the operation is truly async, yes a task will be created that might make its way to gen2, but it will still be cheaper than creating/destroying a thread and keeping the stack size in memory while waiting for the blocking operation.

I can't measure the impact of increased heap allocation because, like everything else, it's going to depend on your application. GC pressure caused by spurious allocations are a major source of performance issues in general, enough that System.Xml itself internally uses a non-heap Task alternative (thanks @benaadams). Whether a Task instance will be cached (because it contains a common result) or whether it will make it out of gen0 is again, going to depend a lot on your application - I generally try to avoid statements like "it won't live past the gen0 collection".

Nobody has been talking about creating or destroying threads here - just one, simple thread blocking on I/O.

As for blocking on async code, while the rule of thumbs is to call it async, there is nothing wrong in blocking on async code if all await instruction in it use ConfigureAwait(false) which the .Net framework should always do.

That's true, but it's another complication which the programmer needs to understand with async. You guys are missing the point; I know how to avoid these deadlocks and performance problems, the solutions are there. It's just that programmers need to deal with these issues in async, and not in sync. That is a cost of the async model.

many blocking IO operation are implemented in the OS by blocking the asynchronous version anyway.

I'm not an expert on Windows kernel APIs, but I do know that at least in Linux sync I/O APIs are cleanly mapped to sync system calls which have been there for decades.

At the end of the day you're advocating removing a standard API that exists pretty much in every language that every existed. I'm just saying the choice should be left to the user, depending on their app.

JeffCyr commented 8 years ago

Nobody has been talking about creating or destroying threads here - just one, simple thread blocking on I/O.

I was referring to the one thread per request model, in this case yes the overhead of creating/destroying thread and keeping the stack size in memory will be greater than the "heap pressure" caused by async.

If you are talking about a single threaded application doing blocking IO, then I doubt it's an application with high CPU performance requirement, it's more about an API experience.

At the end of the day you're advocating removing a standard API that exists pretty much in every language that every existed. I'm just saying the choice should be left to the user, depending on their app.

I don't see any golang or nodejs programmers complaining about the missing blocking IO calls. .Net Core is a new platform and it chose to be an async IO platform. If you truly want blocking IO calls, then .Net Core may not be the platform you are looking for.

<speculation>By the way, while this discussion is interesting, the end result is quite futile. The Async IO choice in .Net Core have been made since its inception, Microsoft/.Net Foundation won't change their mind about this decision now.<\speculation>

roji commented 8 years ago

Nobody has been talking about creating or destroying threads here - just one, simple thread blocking on I/O. I was referring to the one thread per request model, in this case yes the overhead of creating/destroying thread and keeping the stack size in memory will be greater than the "heap pressure" caused by async.

If you are talking about a single threaded application doing blocking IO, then I doubt it's an application with high CPU performance requirement, it's more about an API experience.

Why on earth can't a single-threaded application with block I/O have high CPU performance requirements? And my application can have multiple threads which I manage manually as I see fit...

At the end of the day you're advocating removing a standard API that exists pretty much in every language that every existed. I'm just saying the choice should be left to the user, depending on their app.

I don't see any golang or nodejs programmers complaining about the missing blocking IO calls.

So what? nodejs is primarily a web development language whereas C#/.NET has a much wider scope. I don't think you'll find nodjs programmers complaining about the lack of unsafe pointer support, and yet C# includes them - do you think that should be removed as well because nodejs people aren't complaining?

.Net Core is a new platform and it chose to be an async IO platform. If you truly want blocking IO calls, then .Net Core may not be the platform you are looking for. By the way, while this discussion is interesting, the end result is quite futile. The Async IO choice in .Net Core have been made since its inception, Microsoft/.Net Foundation won't change their mind about this decision now.

Oh really? Can you point me to a place where .NET Core declares this in any official way? Even more important, can you explain why most sync I/O operations have been retained (e.g. Stream.Write())?

What seems to be happening here, is that no official decision has actually taken place, but in some specific areas of the API (e.g. Dns, SSLStream) some programmers inside Microsoft have decided to drop sync. This isn't part of a general thing in CoreCLR.

roji commented 8 years ago

@JeffCyr, are you an employee of Microsoft in some way, stating that "Microsoft/.Net Foundation won't change their mind about this decision now"?

dotnet / runtime

Discussion: advantage to using sync over async for I/O? #15963