Ada-Rapporteur-Group / User-Community-Input

Ada User Community Input Working Group - Github Mirror Prototype

Bounded Error in Rationale for Termination Handler indicates more is needed to achieve the goal #68

Open joshua-c-fletcher opened 10 months ago

joshua-c-fletcher commented 10 months ago

The package Ada.Task_Termination was introduced in Ada 2005, according to the Rationale, to address "the problem of how tasks can have a silent death in Ada 95."

The example use case in the Ada 2005 Rationale illustrates how a Termination_Handler can be used to log the cause of a task's termination: https://www.adaic.org/resources/add_content/standards/05rat/html/Rat-5-2.html

Notably, however, File IO consists of potentially blocking operations. This was explicit in Ada 2012 (and earlier), LRM 9.5.1(18): "the subprograms of the language-defined input-output packages that manipulate files (implicitly or explicitly) are potentially blocking." In Ada 2022, the wording has changed to rely on the Nonblocking aspect, but File IO is still recognized by the language as potentially blocking, and calling it from a protected subprogram is a Bounded Error: a condition eligible for raising Program_Error, if detected. This matters here because a Termination_Handler is an access-to-protected-procedure type, so every handler runs as a protected action.
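To make the concern concrete, here is a minimal sketch of the pattern in question. It is not the Rationale's code; the names Obituary, Undertaker and Last_Rites are hypothetical, and the handler body is only an assumption about what a straightforward logging handler might look like.

```ada
--  Hypothetical illustration only: a termination handler that logs
--  directly with Ada.Text_IO from inside a protected procedure.
with Ada.Task_Termination;    use Ada.Task_Termination;
with Ada.Task_Identification; use Ada.Task_Identification;
with Ada.Exceptions;          use Ada.Exceptions;

package Obituary is
   protected Undertaker is
      --  Profile matches Ada.Task_Termination.Termination_Handler.
      procedure Last_Rites
        (Cause : in Cause_Of_Termination;
         T     : in Task_Id;
         X     : in Exception_Occurrence);
   end Undertaker;
end Obituary;

with Ada.Text_IO;

package body Obituary is
   protected body Undertaker is
      procedure Last_Rites
        (Cause : in Cause_Of_Termination;
         T     : in Task_Id;
         X     : in Exception_Occurrence)
      is
         pragma Unreferenced (X);
      begin
         --  Put_Line manipulates a file (standard output) and is therefore
         --  potentially blocking; calling it during a protected action is
         --  the bounded error discussed above.
         Ada.Text_IO.Put_Line
           (Image (T) & ": " & Cause_Of_Termination'Image (Cause));
      end Last_Rites;
   end Undertaker;
end Obituary;
```

A handler like this would be installed with Set_Specific_Handler or Set_Dependents_Fallback_Handler using Obituary.Undertaker.Last_Rites'Access. Many implementations will simply perform the output, but if pragma Detect_Blocking (Annex H.5) is in effect, the implementation is required to detect the potentially blocking call and raise Program_Error.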

The example Put_Log procedures in the Rationale are not described, so it is possible they are nonblocking calls, but I'm not sure of a realistic scenario that achieves the goal of logging without making a blocking call at some point:

But what about setting a termination handler for the main Environment task?

The Rationale even contains an explicit example of setting a termination handler on the Environment task: Set_Specific_Handler(Current_Task, RIP.Two'Access); Granted, it doesn't show what RIP.Two might do.

It seems to me that if Task_Termination is meant to allow for logging in these scenarios, it fails, especially if you want to use it to log the termination of the main task.

I'm not sure what solution to recommend:

Looking at the Annotated Reference Manual, there has already been some discussion of a bounded error in a termination handler for the Environment task: the handler is a protected procedure of a protected object that must, by definition, already have been finalized by the time it is called. So maybe even allowing any use on the Environment_Task is asking for trouble.

If there is no other code solution, though, the LRM should probably state more explicitly that File IO operations are a bounded error inside a Termination_Handler. As it is, the example in the Rationale encourages writing code with this kind of error.

sttaft commented 9 months ago

Interesting issue. My sense is that logging is always a challenge in a real-time embedded system, but I think a logging subsystem for a real-time system is generally designed to be nonblocking, even if that means some entries might be lost or overwritten. In any case, it seems unlikely that normal File IO would be used for logging, simply to avoid the overhead and potential for race conditions.

For a non-real-time system, doing Text_IO from a protected operation is pretty common in my experience, even though it is "officially" a bounded error. So I am not sure it would help to say too much here, since the best advice would necessarily be quite application dependent.

ARG-Editor commented 7 months ago

The Rationale for previous language versions is an unofficial document in the sense that it could give bad advice or examples; not every possibility was thought of when it was written, and it is not being corrected for such problems. So if it says something stupid, it's best to ignore it.

My experience with soft real-time systems is similar to Tucker's: logging is done to memory, and a logging process transfers that memory to some log file asynchronously to the actual logging. (That also let me reduce the size of the log files by eliminating duplicate transactions, important when spammers repeatedly try a door locked to them.) I don't recall if it has any blocking, but certainly there is none unless the memory buffer is full (which shouldn't happen).
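A minimal sketch of the log-to-memory approach, applied to termination handlers, might look like the following. Everything here is an assumption made for illustration: the package Death_Log, its Recorder object, the 16-slot ring buffer and the overwrite-oldest policy are all hypothetical, and the three compilation units would normally live in separate files (death_log.ads, death_log.adb, main.adb with GNAT's default naming).

```ada
with Ada.Task_Termination;    use Ada.Task_Termination;
with Ada.Task_Identification; use Ada.Task_Identification;
with Ada.Exceptions;          use Ada.Exceptions;

package Death_Log is
   Name_Length : constant := 32;
   type Slot_Index is mod 16;   --  hypothetical buffer size

   type Death_Record is record
      Cause : Cause_Of_Termination := Normal;
      Ident : Exception_Id := Null_Id;
      Name  : String (1 .. Name_Length) := (others => ' ');
   end record;
   type Slot_Array is array (Slot_Index) of Death_Record;

   protected Recorder is
      --  Handler: only copies data into memory, no file IO.
      procedure Note
        (Cause : in Cause_Of_Termination;
         T     : in Task_Id;
         X     : in Exception_Occurrence);
      --  Called by whoever writes the log, outside any protected action.
      entry Take (Item : out Death_Record);
   private
      Slots : Slot_Array;
      Head  : Slot_Index := 0;   --  next slot to write
      Tail  : Slot_Index := 0;   --  oldest unread slot
      Count : Natural := 0;
   end Recorder;
end Death_Log;

package body Death_Log is
   protected body Recorder is
      procedure Note
        (Cause : in Cause_Of_Termination;
         T     : in Task_Id;
         X     : in Exception_Occurrence)
      is
         Img : constant String  := Image (T);
         Len : constant Natural := Natural'Min (Img'Length, Name_Length);
         Rec : Death_Record;
      begin
         Rec.Cause := Cause;
         Rec.Ident := Exception_Identity (X);
         Rec.Name (1 .. Len) := Img (Img'First .. Img'First + Len - 1);
         Slots (Head) := Rec;
         Head := Head + 1;
         if Count = Slot_Array'Length then
            Tail := Tail + 1;    --  buffer full: the oldest entry is lost
         else
            Count := Count + 1;
         end if;
      end Note;

      entry Take (Item : out Death_Record) when Count > 0 is
      begin
         Item  := Slots (Tail);
         Tail  := Tail + 1;
         Count := Count - 1;
      end Take;
   end Recorder;
end Death_Log;

with Ada.Text_IO;          use Ada.Text_IO;
with Ada.Exceptions;       use Ada.Exceptions;
with Ada.Task_Termination; use Ada.Task_Termination;
with Death_Log;

procedure Main is
   R : Death_Log.Death_Record;
begin
   Set_Dependents_Fallback_Handler (Death_Log.Recorder.Note'Access);
   declare
      task Worker;
      task body Worker is
      begin
         raise Program_Error with "simulated failure";
      end Worker;
   begin
      null;   --  the block is not left until Worker has terminated
   end;
   loop       --  drain the buffer; the file IO happens here, in a task
      select
         Death_Log.Recorder.Take (R);
      else
         exit;
      end select;
      Put_Line (R.Name & " " & Cause_Of_Termination'Image (R.Cause) & " "
                & (if R.Ident = Null_Id then "-" else Exception_Name (R.Ident)));
   end loop;
end Main;
```

In a long-running system the draining loop would more likely sit in a dedicated logger task blocked on Take, rather than in the main procedure. This sketch does not help with a handler set on the Environment task itself, since nothing is guaranteed to be left running to drain the buffer afterwards.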

I'm not sure that there is anything that should be done to the RM here. The rules for blocking in protected objects are well known, and this issue shows up in some way in almost all of the facilities with call-backs in Annex C and D (for instance, the Execution Time Timers of D.14.1). Putting a note at one just makes it seem like one case is more important than the others, and putting notes at all of them seems like it's adding redundancy.

So I'm unsure how to proceed with this issue.