Ada-Rapporteur-Group / User-Community-Input

Ada User Community Input Working Group - Github Mirror Prototype
26 stars 1 forks source link

Enhancement: Additional select alternative for terminating entry call #91

Open joshua-c-fletcher opened 3 months ago

joshua-c-fletcher commented 3 months ago

Section 9.7 "Select Statements" of the LRM describes four possible select alternatives:

I believe there should be another representing a 'terminating_entry_call' that would look very much like a timed_entry_call, but with an "or terminate" option instead of a delay_alternative:

terminating_entry_call ::= select entry_call_alternative or terminate_alternative end select;

The description of the terminate alternative wouldn't need to be changed, since the description still fits (from 9.7.1 and 9.3):

An open terminate alternative is selected if the conditions stated at the end of clause 9.3 are satisfied: (in particular) Completion of a task (and the corresponding task_body can occur when the task is blocked at a select_statement with an open terminate_alternative (see 9.7.1); the open terminate_alternative is selected if and only if the following conditions are satisfied:

When both conditions are satisfied, the task considered becomes completed, together with all tasks that depend on the master considered that are not yet completed.

This construct would be very useful in cases where a task simply needs to dequeue items and do some processing on items as after they are dequeued, but which should also terminate with a program (when all other tasks are complete). Such tasks don't generally need entries, but one either has to: a) add an entry for a rendezvous so that an "or terminate" option can be provided as part of a selective_accept (such as adding a rendezvous when adding something to the queue, to notify the task that it can go into a nested loop for dequeueing stuff)... but that makes such calls ineligible for calling from a protected procedure (without a bounded error). b) provide a "special" queue item that the task recognizes as a message to stop processing and shut down when the special item is dequeued in turn. c) add a special shutdown state to the protected object representing the queue that the task can recognize when it gets 'no item' from the queue (this is similar to the special queue item, but could shortcut items in the queue) d) do a timed_entry call on the queue to periodically check a shutdown condition. e) add Shutdown rendezvous, which also requires doing a timed_entry_call on the queue object so the possibility of a shutdown rendezvous can be checked periodically.

This construct could make it easier to implement the task in Ada.Real_Time.Timing_Events, or something similar, which, in GNAT's implementation uses an implementation-defined call "System.Tasking.Utilities.Make_Independent" to allow the environment task to terminate the program without waiting for the (otherwise) interminable Timer task to finish. GNAT doesn't use a protected object for the queue, but with an 'or terminate' option available, it could be implemented with one.

Similarly, a logging process where "logging is done to memory and a logging process transfers that memory to some log file asynchronously to the actual logging" could use this. One might choose to have a responsible task complete the logging in a controlled shutdown and close the output file. However, one could also open (in append mode), write to, and close the file each time when holding a log event - the preferred choice depending on some context. Alternately a program could log entirely to the console (with no need to explicitly open and close files), with the program's caller piping the output to file as needed, or use some inter-process communication the send the queued items to another process, which may not necessarily need anything to shut them down. In the latter three cases, the log events could just be added to a queue, with a task pulling them off and logging (or sending them) them until all other tasks are either complete or waiting on a terminate alternative. With the "or terminate" option, the task could terminate gracefully (and automatically) once no other task is able to add more log entries to the queue to be logged.

It would make sense to me for this construct to raise either a Program_Error or a Tasking_Error if used inside a rendezvous, but other uses should be fine, including when called from a subprogram that isn't part of a specific task, since it will always be in a task context when called.

If a task is waiting on an entry call of protected object, but all other tasks are either complete or waiting on a select statement with an open terminate alternative... then there's nothing out there that will change the entry conditions of that entry call; it will be blocked forever unless it has a terminate alternative of its own.

If we consider nested tasks with different masters, then there might still be tasks that don't depend on the same master that could still affect the protected object and allow entry for that task that is waiting on entry, but I think it would be enough to acknowledge that detail with a note.

If using an 'or terminate' option with an entry call, we'd just follow the same rules as the 'or terminate' option on a selective_accept.

As stated, it could improve the implementation of LRM-defined packages (like Ada.Real_Time.Timing_Events) as well as significantly improving the situation for tasks that manage queues in general, and likely other entry calls (that aren't necessarily for queues), allowing for more graceful (and automatic) task and program termination, at the right time, without extra application-level overhead.

Richard-Wai commented 3 months ago

I definitely agree with your pain-points and have found myself wanting this exact feature myself.

There are some subtleties to consider that make this a bit less strait-forward in practice. From the programmer's perspective this seems obvious, and obviously similar to the existing selective_accept. However the implementation (or from the implementor's perspective) is extremely different.

In reality, a selective_accept with a terminate alternative is really manifested as an extra implicit accept alternative that is only "callable" by the runtime, and allows the task to be terminated at the completion of its master. This is different than the ATC approach of aborting an entry call

The key issue has to do with the implications of Asynchronous Transfer of Control (ATC), specifically the idea of "cancelling" or "aborting" an entry call. The key distinction is that cancellation/abortion is not guaranteed, and if an entry call can't be cancelled, it is completed. So if the master is trying to terminate such a task, and cancellation fails, then what happens? There is implicitly the possibility of a sort of race condition whenever we attempt to cancel an entry call, hence why all select statements that allow the cancellation of entry calls ensure that there is a very clear specification of what happens if cancellation is successful vs if the call completes. I think that would be hard to do in this case, and may be the reason why this feature doesn't already exist.

sttaft commented 3 months ago

Interesting. At least seems worth discussing in more depth, ideally with some Ada run-time experts involved (e.g. Pat Rogers and Pat Bernardi from AdaCore).

Richard-Wai commented 3 months ago

Thinking more on this, I think it might be possible for us to just say something to the effect of:

"On completion of the master, if the an entry_call_alternative of a terminating_entry_call has not completed, an attempt is made to cancel the entry call. If the entry call is canceled, the task terminates. If the entry call is not canceled, it is completed, and the select statement completes normally. If the task enters a terminating_entry_call after the master has completed, the task terminates immediately and does not execute the entry_call_alternative"

Ergo in such a "terminating_entry_call", if used as intended, such a queue, this should always work. In the case where the queue is still receiving items, we won't just drop the dequeued item. Better yet, if this terminating_entry_call appears in a loop, it will detect the terminating condition immediately.

joshua-c-fletcher commented 3 months ago

It is very different from ATC. A terminating_entry_call would never abandon the execution of an entry that it is already executing, or the sequence of statements that can be part of the entry_call_alternative.

In my expectation, the terminate_alternative would only be selected if the task was blocked at the select statement, waiting for entry where the conditions for entry were not yet met. That is consistent with the existing statement in 9.3 that refers to "Each task that depends on the master" being "already terminated or similarly blocked at a select_statement with an open terminate_alternative"

Supposing a program includes:

If the first task populates, say, 100 items in the queue and then completes, I would expect that the other task would still process all 100 items in the queue, even if the first task completes when the other has only processed a few of the items.... because the dequeue call wouldn't actually block at the select until the queue was empty. When the queue becomes empty, the task would block at the select statement, and since the conditions for termination are met (the other task is terminated, and the master is complete), it would immediately cancel the entry call and select the terminate_alternative. If the master was the environment task and these two tasks were the only additional tasks in the program, then the program would end after processing the last item in the queue.

Similarly, suppose the task that dequeues and processes items was a task type, and instead of a single task, there was an array of them: now we have several similar tasks all pulling items off the queue and processing them in parallel.

When one of the tasks dequeues the last item in the queue, the other tasks will soon find themselves blocked at the select statement waiting for items in the empty queue (that will never be added). They can't terminate yet, because the task that dequeued the last item (which has the same master) is still processing it and is not yet similarly blocked. When it finishes processing the last item and comes back around to the select statement, they'll all be similarly blocked, and the terminate alternative will be selected for all of them.

This agrees with what you (Richard) said here: "In the case where the queue is still receiving items, we won't just drop the dequeued item."

I'm also expecting it would continue to dequeue items if there are more to dequeue, since it won't be blocked at the select until the queue is empty. (and if the first task were slower at adding items to the list, and thus is still active, the terminate alternative won't be selected (yet) because a task with the same master isn't terminated yet.)

This is consistent with the language about the terminate_alternative that already exists in the LRM, except for the new option of using a terminate_alternative in the same select statement as an entry_call_alternative, and not only with accept_call_alternatives.

The idea isn't to abort or cancel any behaviour that might be useful. Rather, we're simply allowing the tasks (and ultimately the program) to terminate gracefully when the tasks cannot possibly have anything more to do. In this case, the tasks can terminate when the queue is empty, because there aren't any tasks left with the same master to add anything to the queue.

(Admittedly, someone could still have a task with a different master that could populate the queue while the dequeuing tasks meet the termination criteria (and therefore terminate), that's more of a special case that developers can choose to avoid or not.)

Regarding implementation challenges:

Richard said that "a selective_accept with a terminate alternative is really manifested as an extra implicit accept alternative that is only "callable" by the runtime."

While we don't otherwise combine an accept_alternative (where the task is the callee, being called) with an entry_call_alternative (where the task is the caller, calling another task or a protected entry), we do already have timed entry calls. Delay alternatives are supported for both callee (selective_accept) and for caller (timed_entry_calls).

In a timed entry call, the task is waiting on a blocked entry call that it can cancel due to an event, in that case a timeout.

In this case, while waiting for the entry call to unblock, three events could trigger re-evaluating the criteria:

So if those events happen and result in the criteria is met, the entry call would be cancelled, and the task terminated, along with similarly blocked tasks with the same master.

Since the terminate conditions are no different from what is already in the RM, we probably don't need to add a lot of language to the LRM to support this, despite the length of my reply.

sttaft commented 3 months ago

It seems the main goal here is to provide automatic termination for a set of tasks that do not use rendezvous, but rather use shared protected objects, as their means of coordination. One big difference is that protected objects do not have masters, so they could still be visible to tasks at a higher level than the one doing the entry-call-with-terminate (unless the master is itself the environment task). I suppose the intent here is "programmer beware" -- don't do that.

Because protected objects do need finalization, in general, we could associate them with a level in the task hierarchy, but that would probably add too much complexity, and it is not clear this would actually solve the problem. So I think this mechanism will be useful only when all of the relevant tasks have the same master. At compile-time, we could try to require that the protected object has a lifetime that matches that of the master of the task doing the entry call, but that may just be too hard ...

Richard-Wai commented 3 months ago

@joshua-c-fletcher

It is very different from ATC. A terminating_entry_call would never abandon the execution of an entry that it is already executing, or the sequence of statements that can be part of the entry_call_alternative.

I think you're viewing this from the programmer's perspective and I'm viewing it from the implementer's perspective. The big difference between a selective_accept and all the other select statement types is that selective_accept is about the enclosing task accepting other tasks queued on one of it's own entries. Terminate works here because it can easily kick off all the tasks queued on its entry and then terminate. For all the other select statements, you have the enclosing task itself doing an entry call.

Your proposed "terminating_entry_call" would also be doing entry calls that need to be cancelled somehow. And the process of cancellation is not guaranteed. Therefore your wording might suggest a race-condition:

In my expectation, the terminate_alternative would only be selected if the task was blocked at the select statement, waiting for entry where the conditions for entry were not yet met.

The problem here is that by the time you determine if that is true, you might be selected for the entry. I think this is exactly why "cancellation" of an entry call is a thing in the first place

Ultimately I think we are converging on roughly the same thing, I'm just trying to rationalize it from the implementer's side. But I think we agree on the behavior..

@sttaft

One big difference is that protected objects do not have masters, so they could still be visible to tasks at a higher level than the one doing the entry-call-with-terminate

I don't really understand how this is relevant. For example if we have two separate tasks with separate masters that are pulling from a synchronized queue, it should be fine that one of those tasks terminates before the other using this proposed mechanism.

All we really want is for a given task to be given the chance to cancel its entry call (or decide not to attempt it at all), if its master is completed.

sttaft commented 3 months ago

@sttaft

One big difference is that protected objects do not have masters, so they could still be visible to tasks at a higher level than the one doing the entry-call-with-terminate

I don't really understand how this is relevant. For example if we have two separate tasks with separate masters that are pulling from a synchronized queue, it should be fine that one of those tasks terminates before the other using this proposed mechanism.

I am thinking of the scenario where a task at a higher level is adding to the queue, and relying on a task at a lower level to read from the queue. The task at the lower level would give up any time when there is nothing in the queue and no other tasks at its level, even though the protected object representing the queue, and the task filling the queue, are at a higher level and still very much alive. Clearly there are problems with such a design, and I was hoping there might be some way to detect the mistake at compile-time.

All we really want is for a given task to be given the chance to cancel its entry call (or decide not to attempt it at all), if its master is completed.

I was fearing the problem of giving up too soon.

joshua-c-fletcher commented 3 months ago

I had thought of that scenario (a task at a higher-level modifying a protected object, while a task at a lower level meets the termination conditions.)

I took that to be a consequence of the (existing) definition of a terminate_alternative. I felt the rule still made sense, and was acceptable behaviour that could be explained by the definition of the rule.

finding "some way to detect the mistake at compile-time." suggests keeping the existing rule, but either warning or forbidding the user against using the construct in a case where the protected object is (or could be) modified at a higher level than where the terminating_entry_call is made from.

I don't think it would be possible to detect that at compile time if the terminating_entry_call is expressed in a subprogram (that could be called from more than one task at different levels). We could limit the use of the construct to the immediate body of a task, and that would still be plenty useful, but I don't think it should be necessary to restrict that.

Another option would be to have a different rule for the terminate_alternative when used in a terminating_entry call. This version could keep from terminating unless all tasks at the same-or-deeper accessibility level as the protected object being called are either complete or blocked at a select statement with a terminate_alternative. There are probably problems with that rule, though.

Or we could use a simpler rule such that the terminate_alternative for a terminating_entry_call that isn't concerned with masters, and only allows the terminate alternative in a terminating entry call to be selected when all tasks are either complete or blocked at a select statement with a terminate_alternative (of either kind).

I'm OK with the existing rules for the terminate_alternative though, and re-using them for this new context. The most typical case (where the terminating_entry_call would be used) probably involves protected objects and tasks declared at the library level. For that case, all of these rule variants above would be equivalent.

For more interesting cases, it might "give up too soon" for some scenarios, but at least the rule would be clear, and the developer could choose to use the construct or not, based on whether it suits the needs of their program.

jprosen commented 3 months ago

I had thought of that scenario (a task at a higher-level modifying a protected object, while a task at a lower level meets the termination conditions.)

I took that to be a consequence of the (existing) definition of a terminate_alternative.

Not at all, and I don't see how this proposal could work. A master of a task is the outmost scope where the task can be accessed (thanks to the removal in Ada95 of the so-called Ada83's Rosen's pathology ;-) ). Therefore, if all tasks depending on a master are blocked on a accept statement, there is no more task left that could call any entry - the system is provably blocked. If a task is calling an entry, it is not blocked. The called entry can belong to anything outside the set of masters that the calling task is depending on, including protected entries attached to interrupts. How can you prove that an interrupt will never happen? The only case that is decidable is when the master is the environment task, i.e. all tasks in the system are blocked, and there are no entries attached to interrupts. Final remark: protected types have benefits, they have drawbacks too. One of the drawbacks is that, being passive data structures, you lose the ability of automatic termination. If you need that, don't use protected types or devise your own termination algorithm. This proposal would almost certainly create cases of unwanted terminations.

Richard-Wai commented 3 months ago

Not at all, and I don't see how this proposal could work.

I'm really struggling with how masters are relevant at all to the proposal if we use the existing semantics of cancellation. If the terminate alternative is selected only when an associated entry call alternative is successfully cancelled, or upon entry to the select statement when the terminate alternative is eligible, then it doesn't matter what master the target of the entry call is part of, nor does it matter if the target of the entry call is a task or protected type. It's all about cancellation. Termination only happens if cancellation is successful, otherwise termination does not happen and the task continues (which is exactly what would happen without any normal task termination anyways).

joshua-c-fletcher commented 3 months ago

@jprosen wrote: "If a task is calling an entry, it is not blocked"

But according to the LRM, in 9.5.3

"If the named entry is closed, the entry call is added to an entry queue (as part of the protected action, for a call on a protected entry), and the call remains queued until it is selected or cancelled;" ... "For an entry call that is added to a queue, and that is not the triggering_statement of an asynchronous_select (see 9.7.4), the calling task is blocked until the call is cancelled, or the call is selected and a corresponding select_statement or entry_body completes without requeueing"

Therefore, a task calling an entry is described by the LRM as blocked while the entry is closed, which is the situation when selecting the terminate alternative would be possible, if the other conditions for the terminate_alternative are also met (such as other tasks with the same completed master being terminated or similarly blocked, waiting on a closed entry with a terminate_alternative.)

With respect to early termination, if the rule is clear under what circumstances termination would occur, then a developer can choose to use the feature when they want that behaviour (such as in the use cases described), and not use it when they don't want that behaviour.

If the task is blocked waiting on an entry to an interrupt, and we keep the originally proposed behaviour, then the task would give up waiting based on its own master and the set of tasks that depend on the same master. That doesn't mean an interrupt like what it was waiting for might not still happen, that it would have responded to if that master hadn't finalized, but it implicitly means that the programmer has decided (by using this construct in this context) that it's OK for that task to terminate if those termination conditions are met; the task that would have waited for the interrupt doesn't need to wait anymore.

If there is a task hierarchy, with nested tasks at different levels with different masters, there should be some design intent behind that, so deciding that a task is allowed to finalize under these circumstances makes sense to me. It isn't necessarily terminating "too soon" if the behaviour is by design. The portion of the program defined by that master and its descendants completes; if something like a queue is visible to other parts of the program and can still be populated by other parts of the program, then other parts of the program can work also with that, accordingly.

If someone doesn't want it to use the terminate_alternative because the intent of their design doesn't fit the rule, they don't have to use it.

jprosen commented 3 months ago

@jprosen wrote: "If a task is calling an entry, it is not blocked"

But according to the LRM, in 9.5.3...

Sorry, I meant "eternally blocked", not blocked in RM sense

If someone doesn't want it to use the terminate_alternative because the intent of their design doesn't fit the rule, they don't have to use it.

Sure, but I fail to see a use case where this feature would be useful. IIUYC, the example is an event queue and you want to stop the tasks when no more events can be enqueued. But the proposal would be useful only in a design pattern where the producer(s) and the consumer(s) all depend on a common master (other than the environment task). A very narrow use case in my opinion, with the risk of people misunderstanding the feature.

Richard-Wai commented 3 months ago

But the proposal would be useful only in a design pattern where the producer(s) and the consumer(s) all depend on a common master (other than the environment task). A very narrow use case in my opinion, with the risk of people misunderstanding the feature.

How so? A common use case for this in fact dynamically spinning up a queue plus some producer/consumer tasks where you want all of them to terminate at the completion of the master, but it can also easily be a pattern where all involved tasks and queues are library-level objects. In both cases the only way to terminate these tasks gracefully is to create some kind of special termination message, which often requires creating extra data types and implementing other boiler-plate. It is a common-enough pattern that a language-defined feature would be valuable, despite an apparently narrow use.

I fail to see a case where the user could easily (non-pathologically) do something "bad" with this feature besides leaving a queue unserved and non-empty, which seems to be such a high-level issue (not even necessarily an error), that it finds itself in company with the myriad other bad things anyone can do in Ada if so determined.

sttaft commented 3 months ago

We might want to limit this (at least initially) to protected entry calls. For task entry calls, the notion of being blocked on such an entry is less well defined, and is subject to more race conditions since the entry might not be open simply because the accepting task just hasn't cycled back to the accept statement.

Richard-Wai commented 3 months ago

For task entry calls, the notion of being blocked on such an entry is less well defined, and is subject to more race conditions since the entry might not be open simply because the accepting task just hasn't cycled back to the accept statement.

But the existing concept of cancellation has no problem handling this case, I'm not sure "blocking" matters in the context of cancellation. If the task hasn't reached the matching accept, the call is canceled. If it does, the call is not cancelled and it proceeds normally.

sttaft commented 3 months ago

For task entry calls, the notion of being blocked on such an entry is less well defined, and is subject to more race conditions since the entry might not be open simply because the accepting task just hasn't cycled back to the accept statement.

But the existing concept of cancellation has no problem handling this case, I'm not sure "blocking" matters in the context of cancellation. If the task hasn't reached the matching accept, the call is canceled. If it does, the call is not cancelled and it proceeds normally.

Presumably it would not be cancelled if the task being called is in the same master and is not terminable, so I guess we are basically in the same situation as with the protected entry call. If on the other hand the task being called is at a higher level, then it is similar to having the protected entry being updated from a task at a higher level, so I withdraw my objection to supporting task entry calls ...

Richard-Wai commented 3 months ago

Presumably it would not be cancelled if the task being called is in the same master and is not terminable, so I guess we are basically in the same situation as with the protected entry call.

See my thinking here is that we would modify 6.3-8, to be something like:

The key being that it's only if cancellation succeeds, or if we just enter into a terminating_entry_call, that termination happens. In any other case things proceed as they normally would have - that task doesn't terminate.

If we take this approach we shouldn't need to be concerned with blocking or masters, and we get to use the maximum amount of existing dynamic semantics.

The only down-side seems to be the potential for lost queue items at some point, but this should be obvious to the programmer, and probably an accepted possibility in most designs.

Alternatively we might have some way of defining another behavior if cancellation fails, such as raising a Program_Error..

jprosen commented 3 months ago

(sorry if you receive this message twice - I replied to the list in the hope that it would be added automatically to GitHub, but no. So I add it to GitHub too).

A common use case for this in fact dynamically spinning up a queue plus some producer/consumer tasks where you want all of them to terminate at the completion of the master, but it can also easily be a pattern where all involved tasks and queues are library-level objects. In both cases the only way to terminate these tasks gracefully is to create some kind of special termination message, which often requires creating extra data types and implementing other boiler-plate.

It's not the only way! You could have an End_Of_Data function in your PO, or you can raise End_Error when no more data is available - exactly what you have for IO's.

jprosen commented 3 months ago

We might want to limit this (at least initially) to protected entry calls. For task entry calls, the notion of being blocked on such an entry is less well defined, and is subject to more race conditions since the entry might not be open simply because the accepting task just hasn't cycled back to the accept statement.

The same could be said for PO's: the entry might not be open simply because some task/event didn't reach the call to the PO that would open the entry.

Richard-Wai commented 3 months ago

It's not the only way! You could have an End_Of_Data function in your PO, or you can raise End_Error

I think we're really talking about a producer -> consumer model with a single queue in-between. But even if we didn't use the standard library queues, and rolled-our own PO, we still have a problem where we might not know that we are at the End_Of_Data state before one or more of the consumer tasks queues elsewhere. And there-in-lies the rub; we need an active signaling system to tell the consumer tasks to stop. Ultimately I think this proposal is valuable for eliminating that boiler plate, and making the Synchronized_Queue_Interfaces package more generally useful.