Proposal: language support for async sequences

gafter commented 9 years ago

Both C# and VB have support for iterator methods and for async methods, but no support for a method that is both an iterator and async. Its return type would presumably be something like IObservable<T> or IAsyncEnumerable<T> (which would be like like IEnumerable but with a MoveNext returning Task<bool>).

This issue is a placeholder for a feature that needs design work.

HaloFour commented 8 years ago

Then there are the syntax issues with potentially dealing with tokens at those three places.

From the iterator side you'd obviously need a way to get at those tokens. I don't really see a great way to accomplish this without some wacky syntax or voodoo.

Probably the most voodoo-y route would be to have the generated iterator class automatically link the three potential CancellationTokens into a single token replacing the value of the token passed as a parameter to the iterator method (by overwriting the field in the enumerator during MoveNextAsync). I don't know what overhead that might entail, perhaps it would require decorating a CancellationToken parameter with an attribute.

From the consumer side I think the least messy option if you wanted to affect cancellation of an existing IAsyncEnumerable<T> (you're not calling the iterator method directly) would be a ConfigureCancellation method wrapping that instance of IAsyncEnumerable<T> passing the specified CancellationToken to GetEnumerator and then wrapping that IAsyncEnumerator<T> also passing the token to MoveNextAsync. That would remain compatible with a foreach await syntax. If the consumer wanted more control than that they could always just call GetEnumerator and/or MoveNextAsync themselves and pass arbitrary tokens to either.

paulomorgado commented 8 years ago

foreach only works with enumerables and must call GetEnumerator() on it to get the enumerator.

But what if this was legal?

foreach(var i in Range(0,10).GetEnumerator()) WriteLine(i);

Then we could have this for non-cancellable async enumerators:

foreach(async var i in AsyncRange(0,10)) WriteLine(i);

And this for cancellable async enumerators:

foreach(async var i in AsyncRange(0,10).GetAsyncEnumerator(ct)) WriteLine(i);

For a cancellable MoveNextAsync one would have to write the whole code. But I expect these cases to be less frequent that having a unique cancellation token for the enumerator.

And nothing prevents the enumerable itself to from being cancellable:

AsyncRange(0,10, ct)

jcdickinson commented 8 years ago

@jskeet I understand your concern now and completely agree. An interesting interaction here is that IAsyncEnumerable.Dispose is actually an implicit cancellation token. How does that (if at all) interact with a CancellationToken explicitly provided to the method?

jskeet commented 8 years ago

@jcdickinson Goodness knows! But I think both of these features really need to be considered closely together. I can easily imagine that a solution to one may cause issues for the other.

mgravell commented 8 years ago

IMO any discussion here should also consider ValueTask, as this is perhaps the type of scenario that might have bunches of "I have stuff without needing to actually be async right now" and have distinct return values, making pre-cached completed tasks not viable.

alrz commented 8 years ago

If IObservable<T> ever gets language support I think that would be great for events to be able to return an IObservable<(object Sender, EventArgs Args)>, just like F#. In defence of that, a language construct would be a lot better instead of a Subscribe with a bunch of lambda expressions IMO.

alrz commented 8 years ago

As for async foreach and such I'd like to suggest fork loops which act like fork-join model.

For example,

// waits in each iteration
async Task SaveAllAsync(Foo[] data) {
    foreach(var item in data) {
        await item.SaveAsync();
    }
}

// same as above
async Task SaveAllAsync(Foo[] data) {
    for(int i = 0; i < data.Length; ++i) {
        await data[i].SaveAsync();
    } 
}

// this won't wait and just run in parallel
async Task SaveAllAsync(Foo[] data) {
    fork(var foo in data) {
        await foo.SaveAsync();
    } // join
}

// same as above
async Task SaveAllAsync(Foo[] data) {
    fork(var i = 0; i < data.Length; ++i) {
        await data[i].SaveAsync();
    } // join
}

With async sequences, one might use yield return inside the loop to return a push-based collection,

async IObservable<Result> ProcessAllAsync(Foo[] data) {
    fork(var foo in data) {
        yield return await foo.ProcessAsync()
    }
}

In contrast, if you use a regular foreach it will return an IAsyncEnumerable,

async IAsyncEnumerable<Result> ProcessAllAsync(Foo[] data) {
    foreach(var foo in data) {
        yield return await foo.ProcessAsync()
    }
}

Also it should be able to be used to iterate both IObservable<T> and IAsyncEnumerable<T> collections.

scalablecory commented 8 years ago

@HaloFour I'd much rather see Parallel.* extended for TPL than add a fork keyword. Language integration gives us nothing here.

HaloFour commented 8 years ago

@scalablecory Wrong mention. :smile:

@alrz I like the idea. If async methods could return observable sequences I think it would be pretty useful. However, I also worry that it might be a little too easy and tempt people into using it without understanding the ramifications of having multiple threads silently spawned within their iterative-looking code. All of a sudden all of the nuances of synchronization and thread-safety applies to locals (yes, I am aware that this is true today with closures, but that does involve a little more opt-in.) As a new keyword it could be possible that limitations would be imposed such that variables declared outside of the block could not be modified, but that would differ from any other semantics in C#.

alrz commented 8 years ago

@HaloFour "having multiple threads silently spawned within their iterative-looking code." There can be no thread actually, it depends on the implementation of the awaited method but the point is that it doesn't wait for each await in each iteration, you can think of it as a list of running tasks and a Task.WhenAll but a little nicer, it provides language support for processing tasks as they compete which is a common pattern, I guess.

As for synchronization, we can wait for all the tasks in parallel but process them in order, meaning that a completed task might be waiting for another iteration to get to an await or end of the loop block, this makes sense because all of them will join at the end of the loop. Similarly, it doesn't "move next" until it gets to an await. But not allowing modifications might be limiting and eventually not really sufficient, one might say in a lock block once again you can modify variables and so on.

PS: On the second thought (after opening the issue actually) I just convinced that language support doesn't really buy much,

fork(var item in new[] { 2, 3, 1 }) {
    await Task.Delay(item*1000);
    Console.WriteLine(item);
}

// easy peasy

await Task.WhenAll(new[] { 2, 3, 1 }.Select(async item => {
    await Task.Delay(item*1000);
    Console.WriteLine(item);
}));

and also it would be worse because the loop hides a task and doesn't support cancellation etc.

HaloFour commented 8 years ago

@alrz If it's akin to Task.WhenAll then the default (and expected) implementation would be that more than one of those operations are executing concurrently, so there would definitely be threads involved. Without that there is absolutely no difference between fork and await foreach.

alrz commented 8 years ago

@HaloFour However, my example satisfies the first rule of fork, "it doesn't "move next" until it gets to an await" but not the second, "a completed task might be waiting for another iteration to get to an await or end of the loop" so synchronization is not guaranteed.

The problem with await foreach or await using (#114) is that await applies to an expression — the Task, which mustn't be ignored. with these constructs there is no reasonable way to have ConfigureAwait, CancellationToken, etc. And that's ok because these are not related to the language, await is not specific to Task, so tying foreach to these types is not a good idea IMO.

HaloFour commented 8 years ago

@alrz await expressions do not have to be Tasks, they can be anything that correctly follows the awaitable convention. await has no concept of cancellation tokens and ConfigureAwait just happens to be an instance method of Task. For example, await Task.Yield() is legal, but Task.Yield() returns a YieldAwaiter, not a Task, and there is no way to either cancel nor configure the return of that method.

Note that I do think that both cancellation and configuration are useful. I think that offering extension methods off of IAsyncEnumerable<T> should satisfy the requirement:

IAsyncEnumerable<int> sequence = GetSequenceAsync(1, 10, CancellationToken.None);

CancellationTokenSource cts = new CancellationTokenSource();
cts.CancelAfter(500);
foreach (var number in sequence.ConfigureAwait(false).WithCancellationToken(cts.Token)) {
    Console.WriteLine(number);
}

alrz commented 8 years ago

@HaloFour Yeah I know, exactly my point, anyway. As it turns out, Task.WhenAll is not safe when you have shared state, that's where fork can be helpful.

omariom commented 8 years ago

I think CoreFxLab's Channels are closely related to the subject.

HaloFour commented 8 years ago

@omariom Neat. I wonder how it differs from Rx and why there is an effort to seemingly duplicate that library. Can't be NIH, they invented it. @stephentoub ?

i3arnon commented 8 years ago

@HaloFour Rx is for push-based reactive streams. Channels are pull-based and more similar to TPL Dataflow blocks (though a bit lower on the abstraction stack and less opinionated)

ljw1004 commented 8 years ago

I put together a proposal for async iterators here: https://github.com/ljw1004/roslyn/blob/features/async-return/docs/specs/feature%20-%20async%20iterators.md

Allows to "async-foreach" over an async enumerator, as suggested by @paulomorgado
Puts the cancellation token into the call to GetEnumerator rather than MoveNextAsync
Uses IAsyncEnumerable.ConfigureAwait / IAsyncEnumerator.ConfigureAwait as suggested in LDM #5383
Consumer is completely pattern-based to support ValueTask if needed
Producer is completely pattern-based to allow an async iterator method to return IObservable<T> or IAsyncActionWithProgress<T> or IAsyncEnumerable<T> or IAsyncEnumerator<T>
A novel mechanism for getting hold of cancellation token from within the body of the async method. I've used async contextual keyword to refer to the current instance of an async method, similar to this and base. This mechanism is useful more generally for an async method or async iterator method to interact with its builder (equivalently, the object that it returns). Windows tasks like IAsyncAction can use it to handle IAsyncAction.Cancel() or IAsyncActionWithProgress.OnProgress(). I believe that IObservable would use it too, but need to learn more.

MgSam commented 8 years ago

I really like the new proposal. I think it handles the CancellationToken problem really nicely. Adding an example to the spec of how the async contextual keyword would work for your standard Task-returning async method would be useful as well. Does this mean there would no longer be a need for endless overloads in our async methods that accept CancellationToken?

I prefer the foreach (await var x in asyncStream) { ... } syntax, as it is the most consistent with how we await asynchronous tasks elsewhere currently. It feels weird to put the word async in there when what is happening is an await.

I think the iterator modifier section needs expansion, as it is in the title, but the text doesn't actually ever mention an iterator modifier. I strongly am in favor of adding an optional iterator keyword that can be added to the method signatures of normal iterators and async iterators. People who don't want it don't have to use it, but those of us that do can set style rules so that an analyzer and quick fix can make all the methods consistent.

It is much more useful to have the method signature document what kind of method it is, rather than having to scan through the entire body to guess whether its an iterator or not. All that being said, this is a minor point that is probably separable from the rest of the proposal given that it is contentious.

Overall, really looking forward to see a prototype of this feature!

bbarry commented 8 years ago

// EXAMPLE 7:
// async C f()
// {
//   var ct = async.CancellationToken;
//   await Task.Delay(1, ct);
//   yield 1;
// }
// expands to this:

I think async.CancellationToken could be problematic in that async could be a variable or type in scope inside an async method already. await.CancellationToken seems available though...

MgSam commented 8 years ago

@bbarry Thanks. Missed the comment there.

alrz commented 8 years ago

Is this going to use the same shape of IEnumerator<T> i.e. MoveNextAsync and Current? Wouldn't it make sense to make it similar to what F# does for its AsyncSeq?

type IAsyncEnumerator<'T> =
    abstract MoveNext : unit -> Async<'T option>
    inherit IDisposable

ljw1004 commented 8 years ago

@bbarry I suggested that the "async" contextual keyword is only a contextual keyword in async methods that return a tasklike other than void and Task and Task<T>. There are no such methods today. In methods where it's a contextual keyword, then by definition it will never refer to a variable or type in scope, and there won't be a back-compat break. But you're right to bring up await as another possibility. I guess a prototype could include both options to see which one people like.

@alrz I think Async<'T option> would be a bad idea for C# because (1) option doesn't exist, (2) it would be prohibitively expensive, causing a heap allocation for every element of the sequence. By contrast Task<bool> never requires a heap allocation for an already-completed task, since Task.FromResult(true) and Task.FromResult(false) are both pre-allocated static singletons in the BCL.

MgSam commented 8 years ago

@ljw1004 Why not make contextual async a possibility for existing Task-returning async methods as well? As I mentioned above, it is annoying code bloat to make CancellationToken overloads for every single async method. First class language support for supporting cancellation would be a huge improvement.

The backwards compat issue doesn't seem major. If you're foolish enough to name a type or local variable async then the compiler can report an error, and you'll fix the error when you upgrade. The benefit seems worth the cost, IMO.

ljw1004 commented 8 years ago

@MgSam could you spell precisely out how you see this mechanism being able to avoid CancellationToken overloads? I can't see it...

alrz commented 8 years ago

@ljw1004 I'm talking about two interface invocations that is required for every element. However, it could be something like this:

interface IAsyncEnumerator<T> {
  Task<bool> TryGetNextAsync(out T value);
}

But since out is actually ref in CLR, the T cannot be defined as covariant.

interface IAsyncEnumerator<out T> {
  ITask<T?> GetNextAsync();
}

This one works as long as we could use ? on unconstrained generics (#9932) and we have a covariant ITask<T> interface.

Anyhow, since async has its own overheads, these micro optimizations probably aren't much important.

MgSam commented 8 years ago

I'd imagine you'd need a slightly different calling convention.

async Task Foo()
{
    await Task.Delay(1000);
    if(async.CancellationToken.IsCancellationRequested) return;
    await Task.Delay(1000);
}

var ct = default(CancellationToken);
await Foo(), ct;

HaloFour commented 8 years ago

@MgSam

What exactly do you propose that translate into? Have the compiler automatically generate overloads? Or automatically add optional parameters? Stuff the CancellationToken into a thread-local? All of those options sound pretty nasty. I just add a default CancellationToken parameter and not worry about overloads.

bbarry commented 8 years ago

@arlz The interface used here is a "pattern" like how foreach works elsewhere. You wouldn't have to actually implement IAsyncEnumerable<T> in order to make it work (I think example 6 is confusing though).

It looks like this type would satisfy enough requirements to work inside a foreach:

class Foo
{
  ConfiguredTaskAwaitable<bool> MoveNextAsync() { ... }
  int Current {get { ... } }
}

used as:

async Task<int> SumAsync(Foo f)
{
  int result = 0;
  foreach(async var i in f)
  {
    result+=i;
  }
  return result;
}

compiling as if it was:

async Task<int> SumAsync(Foo f)
{
  int result = 0;
  {
    try
    {
      while(await f.MoveNextAsync())
      {
        int i = f.Current;
        result+=i;
      }
    }
    finally
    {
      (f as IDisposable).Dispose();
    }
  }
  return result;
}

(and either I am reading ex 6 wrong or it is disposing something it probably shouldn't)

ljw1004 commented 8 years ago

@bbarry My intention is that async enumerators must all implement IDisposable.

It sounds like you find that odd. Here's another alternative1) if you foreach over a enumerable, then the foreach statement has acquired a resource and must dispose of it; (2) if you foreach over an enumerator then you're the one who acquired it and you're the one who must release it...

foreach (async var x in enumerable) { ... }
// becomes
using (var enumerator = enumerable.GetEnumerator())
    foreach (async var x in enumerator) { ... }

foreach (async var x in enumerator) { ... }
// becomes
while (await f.MoveNextAsync()) {var x = f.Current; ... }

But actually that feels problematic also. It would mean that these two statements have very different disposal semantics:

foreach (var x in enumerable) { ... }
foreach (var x in enumerable.GetEnumerator(cts.Token)) { ... }

It would also mean that folks who write async enumerator methods (rather than async enumerable) would have to expect that realistically most callers would never end up disposing of them. They'd instead just foreach over them as is easiest.

I wonder if Dispose() even has any role to play? Why would async enumerators have IDisposable rather than some hypothetical IAsyncDisposable? Maybe it would be cleaner to cut out the call to Dispose entirely?

MgSam commented 8 years ago

@HaloFour Generating an overload seems the most straightforward because it guarantees backwards compat but I haven't thought carefully about every edge case.

I also just add a default value for the CancellationToken, but very often people forget to do this, and once they've forgotten, it breaks the cancellation chain. It makes it impossible to cancel the method and any async methods that method itself calls, even if they do support cancellation. And, as you know, optional parameters are problematic in public interfaces.

Manually adding the cancellation parameter is also a ton of extra code bloat on every single async method (which often is most of them, nowadays).

Having it just be in context of every async method would both make it consistent with this new feature and allow methods to more strictly focus on business logic rather than retyping this plumbing over and over again. Combined with an analyzer that warns you if you forget to pass an existing cancellation token to any async method you call, and you'd get a big win in cancellation support over current state, where most code written doesn't use it.

bbarry commented 8 years ago

I think it is odd that you are disposing something that you potentially don't own.

You cannot today foreach over an IEnumerator, only an IEnumerable (and similar constructs). If you could I would think it should not be a compiler task to dispose of it as part of the foreach statement. (passing IEnumerator-like types around that are designed to be mutated seems smelly in the first place, but that is another point I suppose)

I think the pattern might be better if it were:

interface IAsyncEnumerable<T>
{
   IAsyncEnumerator<T> GetEnumerator();
   IAsyncEnumerable<T> ConfigureAwait(bool b);
   IAsyncEnumerable<T> RegisterCancellationToken(CancellationToken cancel = default(CancellationToken));
}

and so

class Foo
{
  Foo GetEnumerator() { return this; }
  ConfiguredTaskAwaitable<bool> MoveNextAsync() { ... }
  int Current {get { ... } }
}

with the same usage method compiling as if it was:

async Task<int> SumAsync(Foo f)
{
  int result = 0;
  {
    Foo e = null;
    try
    {
      e = f.GetEnumerator();
      while(await e.MoveNextAsync())
      {
        int i = e.Current;
        result+=i;
      }
    }
    finally
    {
      (e as IDisposable).Dispose();
    }
  }
  return result;
}

HaloFour commented 8 years ago

@bbarry @ljw1004

What about having extension methods for ConfigureAwait and RegisterCancellationToken rather than having the IAsyncEnumerable<T> interface have to implement methods for either?

bbarry commented 8 years ago

@HaloFour the IAsyncEnumerable<T> interface doesn't actually have to be implemented at all (it actually can't exist per se in the general form used by the foreach statement because the return type from GetEnumerator() is the pattern IAsyncEnumerator<T>, not any actual type on its own...). It exists for the purpose of the spec conversation here as the interface methods the state machine would generate in order to support this pattern.

divega commented 8 years ago

@ljw1004 re the CancellationToken passed to GetEnumerator() instead of MoveNextAsync(): This has always been my preference, however the following was actually one of the strong arguments for having it on MoveNextAsync() when I discussed it years ago with the RX folks:

This feels weird. We've always passed in cancellation token at the granularity of an async method, and there's no reason to change that.

Copy & paste error or did I miss what you meant? :smile:

ljw1004 commented 8 years ago

@divega I explained badly. Let me rephrase:

Some folks suggest that each individual call to MoveNextAsync should have its own cancellation token. This feels weird. We've always passed in cancellation token at the granularity of an async method, and there's no reason to change that: from the perspective of users, the async method they see is the async iterator method.

It's also weird because on the consumer side, in the async-foreach construct, there's no good place for the user to write a cancellation token that will go into each MoveNextAsync: from the perspective of the users, they don't even see the existence of MoveNextAsync.

Also: it would be weird to attempt to write an async iterator method where the cancellation-token that you want to listen to gets changed out under your feet every time you do a yield:

// Here I'm tring to write an async iterator that can handle a fresh CancellationToken
// from each MoveNextAsync...
async IAsyncEnumerable<int> f()
{
   var cts = CancellationTokenSource.CreateLinkedTokenSource(async.CancellationToken);
   var t1 = Task.Delay(100, cts.Token);
   var t2 = Task.Delay(100, cts.Token);
   await t1;
   yield 1;

   // ??? at this point can I even trust "cts" any more?
   // or do I need to create a new linked token source for further work that I do now?

   yield 2;

   await t2;
   // ?? is this even valid anymore, given that it's using an outdated cancellation token?
   // If someone cancels that latest freshest token passed in the latest MoveNextAsync,
   // how will that help cancel the existing outstanding tasks?
}

divega commented 8 years ago

from the perspective of users, the async method they see is the async iterator method.

Ah, I get what you mean now. Worth clarifying that is from the perspective of foreach consumers IMO because in the underlying methods things are the other way around, i.e. GetEnumerator() is not async.

I agree with everything else although it seems to be a bit unfortunate that you have to call GetEnumerator(ct) explicitly in async foreach. Ideally users shouldn't need to know about GetEnumerator() either, like they don't need to know when using non-async foreach. Any chance we could get language sugar for passing the CancellationToken? E.g.

foreach(await var o in asyncCollection using ct)

BTW, foreach(await ... looks good, but I suggest also considering await foreach for the list of possible syntax.

scalablecory commented 8 years ago

Some folks suggest that each individual call to MoveNextAsync should have its own cancellation token. This feels weird. We've always passed in cancellation token at the granularity of an async method, and there's no reason to change that: from the perspective of users, the async method they see is the async iterator method.

@ljw1004 Remember, MoveNextAsync() is the async method here. In .NET, you pass the token to the thing doing the work, which is not GetEnumerator().

I think people are getting hung up here in that this is both a .NET add and a C# add. Those two things shouldn't compromise each-other. IAsyncEnumerable needs to remain intuitive and consistent with the rest of .NET, because it won't always be used with C# sugar on top.

Look at the very similar DbDataReader class.

tumtumtum commented 8 years ago

Throwing my two cents in here but has this syntax been considered?

foreach(var item in await collection) 
{
}

IMO it would make sense in C++ too:

foreach(auto item : await collection)
{
}

Without the await the code reads as: "For each item that has been retrieved from the collection's values" With the await the code reads as: "For each item that has been retrieved from the collection by awaiting the collection for values"

Or alternatively, we have async and await -- why not introduce an awaitin keyword?

foreach(var item awaitin collection) 
{
}

divega commented 8 years ago

Re

foreach(var item in await collection) 
{
}

That is what you would write if collection was a Task<IEnumerable<T>> (or similar awaitable type). The distinction that you may need to await for each element is important.

ljw1004 commented 8 years ago

@bbarry Oh my! This IDisposable stuff is crucial. Let me try to lay down step-by-step the issues and their consequences...

The Dispose question

When you write an async method with try/finally blocks, you certainly expect the finally blocks to execute. Everything in the language+libraries should be geared to support this.
Therefore the enumerator must implement IDisposable (which is the only way that an object can indicate that it needs some final action to be run), and the foreach must do a dispose.
If the async foreach construct is able to consume enumerables then of course it calls GetEnumerator() and then calls Dispose() on the enumerator. Nothing new here.
If the async foreach construct is able to consume enumerators then it must also call Dispose() on the enumerator. This is unusual -- because foreach doesn't look like using, and it's strange that foreach disposes of something it didn't acquire. But it's necessary. Because if we don't do it, then we'll push developers into a pit of failure where they typically fail to dispose of enumerators, and hence fail to execute the finally blocks in async methods. For instance:

// This is a common and easy idiom. We need to make sure it calls Dispose().
foreach (async var x in GetStream()) {
   if (x == 15) break;
}

// It would be ugly boilerplate if the user instead was expected to manually wrap
// each foreach inside a using:
using (var enumerator = GetStream()) {
  foreach (async var x in enumerator) {
    if (x == 15) break;
  }
}

There is one saving grace. The meaning of an enumerator is that it can be consumed exactly once. So if you async foreach over it once, then no one can ever foreach over it again. Therefore it doesn't matter that you've disposed of it.

The CancellationToken question

There are three places at which cancellation tokens might be communicated:

At every MoveNextAsync(token).
- This has two killer disadvantages: it's too hard to write an async iterator method which has to deal with a different token after every yield; and there's no way in the async foreach construct to provide a different token after each one.
With every enumerator
- it might be passed in the call to an async iterator method itself GetStream(token) if async iterators can return enumerators
- it might be passed in the call to GetEnumerator(token) if you have an enumerable in hand
With every enumerable
- it would be passed to the async iterator method `GetStreamFactory(token) if async iterators can return enumerables.

I'd initially discounted option 3 because I thought that everyone who kicks off a stream deserves to be able to cancel the stream themselves; because it felt weird that you could cancel it once and then everyone who tries to get the enumeration in future will find it cancelled without really knowing why. It means that the enumerable isn't a true pure factory of independent identical streams.

But actually, IEnumerables today aren't true factories either. Consider:

class Baby
{
  private List<string> poops = new List<string> { "morning", "noon", "night" };
  public void DoPoop(string s) => poops.Add(s);
  public IEnumerable<string> GetPoops(string prefix) => poops.Where(p => p.StartsWith(prefix));
}

var b = new Baby();
var p = b.GetPoops("n");
foreach (var s in p) Console.WriteLine(s); // "noon, night"
p.DoPoop("now!!!");
foreach (var s in p) Console.WriteLine(s); // "noon, night, now!!!"

(My wife gave me a 30 minute break from baby-care duties so I could write up these notes...)

Anyway, I've kind of persuaded myself that it's might be tolerable to have cancellation done at the level of enumerable, not enumerator.

Design options

In the light of the two points above, let's talk through the design options again...

Option1: Streams (async enumerators) are foremost. An async iterator method can only produce enumerators like IAsyncEnumerator<T>, not enumerables. The async foreach construct can only consume enumerators, not enumerables.
- Advantage: cancellation has a clear and obvious meaning, no trickery involved. You pass a cancellation token to the async iterator method, end of story.
- Disadvantage: async foreach has to dispose of the enumerator, which feels weird, but at least has the "saving grace" that the stream could in any case never be re-used.
Option2: Async enumerables are foremost. An async iterator method can produce enumerators and enumerables. The async foreach construct can only consume enumerables; it does so by calling GetEnumerator() and then disposing of the enumerator. There is no special support for cancellation, so folks have to cancel at the level of the enumerable not the enumerator.
- Advantage: this is the option most in keeping with the existing language. It also treats Dispose very cleanly.
- Disadvantage: it has a sort of "nod and a wink" that even though folks will typically use the feature for one-off streams that are produced once and consumed once, it nevertheless wraps them up in the form of "stream factories" that aren't pure factories of identical streams. The ubiquity of cancellation tokens means you'll never achieve pure factories.
Option3: Enumerators and enumerables are both equally foremost. An async iterator method can produce either. The async foreach construct can symmetrically consume either, and calls Dispose no matter which one it's given. Cancellation is done at the level of the enumerator by leveraging the novel async contextual keyword within async methods.
- Key to this option is the symmetry of async foreach. That's what allows you to either consume foreach (async var x in GetStreamFactory()) or foreach (async var x in GetStreamFactory().GetEnumerator(token)), and not have to think about it.
- Advantage: this option feels the best match for the inherent nature of streams and factories. Also the async contextual keyword is useful for other things such as IAsyncAction. I conjecture it might also be needed for IObservable but don't yet know.
- Disadvantage: Just like Option1 it feels goofy that async foreach (enumerator) disposes of the enumerator when it's done -- but at least like Option1 it still has the saving grace that the enumerator could never be re-used anyway so it doesn't matter. The main disadvantage is that the new async contextual keyword is a heavyweight addition to the language.

At the moment, I think Option2 has the best chance of getting into the language. I think I should prototype Option3 since it's a strict superset of the other options. That'll allow for some good experiments.

divega commented 8 years ago

@ljw1004 sorry if I am being repetitive... Could we have option 2 but with a twist: that async foreach adds optional syntax that accepts a CancellationToken to be passed to the GeEnumerator() call? Then a cancellation would only affect that particular enumerator but users wouldn't need to call GeEnumerator(cancellationToken) themselves.

ljw1004 commented 8 years ago

@divega why not do that in the library? Why not have a function

interface IAsyncEnumerable<T>
{
   IAsyncEnumerable<T> WithCancellation(CancellationToken c);
}

with the meaning that every enumerator which is subsequently gotten from this new IAsyncEnumerable will end up using the cancellation token? In the typical idiom, async foreach (var x in f.WithCancellation(c)), there'll only be a single enumerator gotten from it in any case...

tumtumtum commented 8 years ago

Yeah sorry I did edit but not sure why it didn't get saved.

How about awaitin?

foreach(var item await in collection) 
{
}

async, await and awaitin

HaloFour commented 8 years ago

@ljw1004

Anyway, I've kind of persuaded myself that it's might be tolerable to have cancellation done at the level of enumerable, not enumerator.

I think of cancellation like any other form of composition allowed in LINQ, like TakeWhile, and in that case it does make sense for whatever method this is to return a composed IAsyncEnumerable<T>. Then I could compose the cancellation into the sequence and hand it off to some other method to enumerate over without that method having to be aware and without that method needing an overload for IAsyncEnumerator<T>.

I think that this design would also allow the CancellationToken argument to be moved back to the MoveNextAsync method which would make it more consistent with async method design. The standard enumeration emitted by C# with foreach could pass CancellationToken.None, still leaving room for exploring the potential of having some other way of providing that token in the language. The IAsyncEnumerator<T> provided by the decorated IAsyncEnumerable<T> could then replace that with the token provided in the WithCancellation method, or merge them if they're both cancellable, or whatever.

divega commented 8 years ago

@ljw1004

why not do that in the library? Why not have a function

That of course would work, but it may not be as nice as it could be. See, unlike with async iterators, we can already do a decent job for async foreach completely in the libraries, e.g. for current incarnations of IAsyncEnumerable<T> we have extension methods that go more or less like this:

public static async Task ForEachAsync<T>(
    this IAsyncEnumerable<T> source, 
    Action<T> action, 
    CancellationToken cancellationToken = default(CancellationToken)) 
{     
    CheckNotNull(source, nameof(source)); 
    CheckNotNull(action, nameof(action)); 

    using(var enumerator = source.GetEnumerator())
    {
        while (await enumerator.MoveNextAsync(cancellationToken)) {
            action(enumerator.Current);
        }
    }    
}

Besides the obvious extra method and delegate I happen to regard async foreach integration in the language mostly as sugar that can make IAsyncEnumerable<T> much nicer to consume. So if it is language sugar and if cancellation is such an important feature of async, why not take it the full way?

BTW, I am very sympathetic with any resistance to add arbitrary things to the language, e.g. adding a language feature that is aware of CancellationToken (although perhaps it could be just about passing any state to the GetEnumerator() method?) is perhaps a leap we haven't taken before and also having to pick among new and existing keywords and patterns for this starts to sounds very expensive. But I just wonder if it is worth doing it in this case.

bbarry commented 8 years ago

(although perhaps it could be just about passing any state to the GetEnumerator() method?)

Considering IEnumerable to be an IEnumerator factory (which 99% of the time in normal workflow code is only ever asked to create a single instance), I think it is more about enabling IEnumerable to generate the IEnumerator instance with the correct state.

Perhaps, :spaghetti: (pretend IEnumerable/IEnumerator didn't matter as an interface but as a pattern), it is really about some arbitrary parameter list to the .MoveNext method:

foreach (var x in enumerable) (state, ...)
{
  ...
}

translates that parameter list to the MoveNext method call (as a completely orthogonal feature to async foreach)?

It would then be reasonably intuitive for a syntax:

var cancellationToken = ...;
await foreach(var x in asyncenumerable) (cancellationToken)
{
  ...
}

(you could reassign cancellationToken here if you so desired a new one for the next MoveNext inside the block for the foreach statement)

ljw1004 commented 8 years ago

@HaloFour If we do move cancellation token into MoveNextAsync(...), could you try writing the body of an async enumerator or enumerable which takes advantage of it? and which handles the case where a different token gets passed in each time? I'd love to see a concrete example. I persuaded myself it can't be done.

HaloFour commented 8 years ago

@ljw1004

Admittedly that's a little tricky and there probably isn't a method that doesn't involve some degree of voodoo. You either need some kind of "ambient" token, like your async.CancellationToken concept or something buried in the framework like a CancellationToken.Current or you need something that could wire it up to any CancellationToken defined as a parameter. You get this problem regardless of where the cancellation token comes from, though, unless the only place you can provide one is in the call to the async iterator directly.

The last option is very voodoo-y. 🍝 In short, the async iterator state machine would take any CancellationToken parameter and instead of using it as is would create and manage its own CancellationTokenSource and pass along that token to the iterator method. Then the state machine would wire up each token provided on MoveNextAsync so that if it is cancelled or becomes cancelled that it cancels the token source which would in turn cancel the token in the iterator. I don't know if the overhead associated with something like that would be too excessive.

scalablecory commented 8 years ago

Yes, I think registering each token is the only way it could be done. Unfortunately the common path of CancellationTokenSource.InternalRegister is non-trivial, not the kind of overhead I'd love to have on every loop iteration.

dotnet / roslyn