Bounded Error in Rationale for Termination Handler indicates more is needed to achieve the goal

joshua-c-fletcher commented 10 months ago

The package Ada.Task_Termination was introduced in Ada 2005, according to the Rationale, to address "the problem of how tasks can have a silent death in Ada 95."

The example use case in the Ada 2005 Rationale illustrates how a Termination_Handler can be used to log the cause of a task's termination: https://www.adaic.org/resources/add_content/standards/05rat/html/Rat-5-2.html

Notably, however, File IO consists of potentially blocking operations. This was explicit in Ada 2012 (and earlier), in LRM 9.5.1(18): "the subprograms of the language-defined input-output packages that manipulate files (implicitly or explicitly) are potentially blocking." In Ada 2022, the wording has changed to rely on the Nonblocking aspect, but File IO is still recognized by the language as potentially blocking, and using it in a protected subprogram is a Bounded Error: a condition for which Program_Error may be raised, if detected.
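To make the concern concrete, here is a minimal sketch of the kind of handler in question. The names Obituary, Reaper, Last_Gasp and Demo are hypothetical and not taken from the Rationale; only the Ada.Task_Termination, Ada.Task_Identification, Ada.Exceptions and Ada.Text_IO declarations are language-defined. The protected object is declared at library level because Termination_Handler is a library-level access type.

```ada
with Ada.Exceptions;
with Ada.Task_Identification;
with Ada.Task_Termination;

package Obituary is
   --  Hypothetical library-level protected object holding the handler.
   protected Reaper is
      procedure Last_Gasp
        (Cause : Ada.Task_Termination.Cause_Of_Termination;
         T     : Ada.Task_Identification.Task_Id;
         X     : Ada.Exceptions.Exception_Occurrence);
   end Reaper;
end Obituary;

with Ada.Text_IO;

package body Obituary is
   protected body Reaper is
      procedure Last_Gasp
        (Cause : Ada.Task_Termination.Cause_Of_Termination;
         T     : Ada.Task_Identification.Task_Id;
         X     : Ada.Exceptions.Exception_Occurrence)
      is
         use Ada.Task_Termination;
         Name : constant String := Ada.Task_Identification.Image (T);
      begin
         --  Every Put_Line below is potentially blocking, so calling it
         --  inside this protected action is a bounded error.
         case Cause is
            when Normal =>
               Ada.Text_IO.Put_Line (Name & " terminated normally");
            when Abnormal =>
               Ada.Text_IO.Put_Line (Name & " was aborted");
            when Unhandled_Exception =>
               Ada.Text_IO.Put_Line
                 (Name & ": " & Ada.Exceptions.Exception_Information (X));
         end case;
      end Last_Gasp;
   end Reaper;
end Obituary;

with Ada.Task_Termination;
with Obituary;

procedure Demo is
begin
   --  Install the handler before any dependent task can terminate.
   Ada.Task_Termination.Set_Dependents_Fallback_Handler
     (Obituary.Reaper.Last_Gasp'Access);

   declare
      task Worker;

      task body Worker is
      begin
         raise Constraint_Error with "silent death";
      end Worker;
   begin
      null;  --  The block waits here for Worker to terminate.
   end;
end Demo;
```

On many implementations this will probably just print the exception information. Under pragma Detect_Blocking (H.5), however, the implementation is required to detect the potentially blocking call and raise Program_Error inside the handler.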

The example Put_Log procedures in the Rationale are not described, so it is possible they are nonblocking calls, but I'm not sure of a realistic scenario that achieves the goal of logging without making a blocking call at some point:

For example, Put_Log could add the strings to a queue, and a separate task could pull the strings off the queue to "log" them (by writing to a file, or sending them to some other logging service, etc.; all things that likely involve a potentially blocking call). Such an approach moves the logging into another task, which works as long as it isn't that logging task that is terminating, and as long as it isn't the Environment task that is terminating.
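A rough sketch of that queue-based approach follows, assuming fixed-size messages and a small bounded buffer. The package Log_Queue and everything in it, apart from the Put_Log name borrowed from the Rationale, is hypothetical. The key point is that Put_Log only calls a protected procedure (not an entry), which is not a potentially blocking operation, while the file IO happens in the Logger task outside any protected action.

```ada
package Log_Queue is
   --  Hypothetical names and sizes throughout.
   Max_Length : constant := 80;
   subtype Message is String (1 .. Max_Length);

   --  Nonblocking; truncates long text and drops it if the queue is full.
   procedure Put_Log (Text : String);

   --  Lets the logger drain the queue and then terminate.
   procedure Shut_Down;
end Log_Queue;

with Ada.Text_IO;

package body Log_Queue is
   Capacity : constant := 32;
   type Index is mod Capacity;
   type Slots is array (Index) of Message;

   protected Queue is
      procedure Put (Item : Message);
      procedure Close;
      entry Get (Item : out Message; Done : out Boolean);
   private
      Data   : Slots;
      Head   : Index   := 0;
      Count  : Natural := 0;
      Closed : Boolean := False;
   end Queue;

   protected body Queue is
      procedure Put (Item : Message) is
      begin
         if Count < Capacity then
            Data (Head + Index'Mod (Count)) := Item;
            Count := Count + 1;
         end if;  --  Otherwise the message is silently dropped.
      end Put;

      procedure Close is
      begin
         Closed := True;
      end Close;

      entry Get (Item : out Message; Done : out Boolean)
        when Count > 0 or Closed
      is
      begin
         Done := Count = 0;
         if Done then
            Item := (others => ' ');
         else
            Item  := Data (Head);
            Head  := Head + 1;
            Count := Count - 1;
         end if;
      end Get;
   end Queue;

   task Logger;

   task body Logger is
      Item : Message;
      Done : Boolean;
   begin
      loop
         Queue.Get (Item, Done);
         exit when Done;
         --  Blocking IO is fine here: no protected action is in progress.
         Ada.Text_IO.Put_Line (Item);  --  Output is padded with blanks.
      end loop;
   end Logger;

   procedure Put_Log (Text : String) is
      Item : Message := (others => ' ');
      Last : constant Natural := Natural'Min (Text'Length, Max_Length);
   begin
      Item (1 .. Last) := Text (Text'First .. Text'First + Last - 1);
      Queue.Put (Item);
   end Put_Log;

   procedure Shut_Down is
   begin
      Queue.Close;
   end Shut_Down;
end Log_Queue;
```

Note that Logger is a library-level task, so the environment task waits for it to terminate. Something has to call Shut_Down before the main subprogram can finish, otherwise Logger stays blocked on Queue.Get and the program never exits.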

But what about setting a termination handler for the main Environment task?

If Put_Log puts the text onto a queue for another task, we won't actually see the termination of the main task until all dependent tasks are completed, and our logging task will have to complete before the main task will trigger the handler.
If Put_Log just writes text to a file... it may well work, but it's a bounded error, and could rightly raise a Program_Error if detected.

The Rationale even explicitly contains an example setting a termination handler on the Environment task. Granted, it doesn't provide an example of what RIP.Two might do:

Set_Specific_Handler(Current_Task, RIP.Two'Access);

It seems to me that if Task_Termination is meant to allow for logging in these scenarios, it fails - especially if you want to use it to log about the termination of the main task.

I'm not sure what solution to recommend:

perhaps some language-defined provision that allows potentially blocking calls in termination handlers - at least for termination handlers that are set on the environment task?
perhaps a non-protected version of a termination handler procedure type is needed.
perhaps a language-defined mechanism for nonblocking logging calls that would work in the context of the existing protected termination handler type

Looking at the Annotated Reference Manual, there has already been some discussion about bounded errors in a termination handler for the Environment task simply by it calling a protected procedure of a protected object that must, by definition, have already been finalized before being called... so maybe even allowing any use on the Environment_Task is asking for trouble.

If there is no other code solution, though, the LRM should probably state more explicitly that File IO operations inside a Termination_Handler are a bounded error. As it is, the example in the Rationale encourages writing code with this kind of error.

sttaft commented 9 months ago

Interesting issue. My sense is that logging is always a challenge in a real-time embedded system, but I think a logging subsystem for a real-time system is generally designed to be nonblocking, even if that means some entries might be lost or overwritten. In any case, it seems unlikely that normal File IO would be used for logging, simply to avoid the overhead and potential for race conditions.

For a non-real-time system, doing Text_IO from a protected operation is pretty common in my experience, even though it is "officially" a bounded error. So I am not sure it would help to say too much here, since the best advice would necessarily be quite application dependent.

ARG-Editor commented 7 months ago

The Rationale for previous language versions is an unofficial document in the sense that it could give bad advice or examples; not every possibility was thought of when it was written, and it is not being corrected for such problems. So if it says something stupid, it's best to ignore it.

My experience with soft real-time systems is similar to Tucker's: logging is done to memory, and a logging process transfers that memory to some log file asynchronously to the actual logging. (That also let me reduce the size of the log files by eliminating duplicate transactions, important when spammers repeatedly try a door locked to them.) I don't recall if it has any blocking, but certainly there is none unless the memory buffer is full (which shouldn't happen).

I'm not sure that there is anything that should be done to the RM here. The rules for blocking in protected objects are well known, and this issue shows up in some way in almost all of the facilities with call-backs in Annex C and D (for instance, the Execution Time Timers of D.14.1). Putting a note at one just makes it seem like one case is more important than the others, and putting notes at all of them seems like it's adding redundancy.

So I'm unsure how to proceed with this issue.

Ada-Rapporteur-Group / User-Community-Input

Bounded Error in Rationale for Termination Handler indicates more is needed to achieve the goal #68