Ada-Rapporteur-Group / User-Community-Input

Ada User Community Input Working Group - Github Mirror Prototype

Conditional code is not supported well #102

Open ARG-Editor opened 3 months ago

ARG-Editor commented 3 months ago

[This issue has been opened to continue work on an unfinished topic from Ada 2022. This was done in response to the ARG Resolution of November 2022. This is the last topic that will be "promoted" this way.]

It is very common for program units to come in multiple versions. This often manifests as small chunks of code that belong only to one version or another. These chunks can be handled with runtime selection, but in some environments (particularly some safety-critical environments), any code that can be executed has to be verified in various ways. Code that is unused in the delivery version still costs time and money to handle. Mechanisms need to exist to ensure that such code does not exist in the final versions.

For instance, "production" and "testing" versions of code are quite common. "Testing" versions often have extra tracing and assertions that are unnecessary (or even harmful) in the final program.

Another example is when extra code is needed to support verification. It commonly is the case that additional data and functions are needed to support program verification. It is best if that code can be omitted from final versions of programs (and sometimes even testing versions).

We call these small chunks of code that only belong to some versions "conditional code".

Current mechanisms and problems

The Ada programming language does not provide any special mechanisms to handle conditional code (we'll discuss why later). One commonly used approach that stays entirely within the Ada language is to control such code with static Boolean values.

This usually works even for safety-critical applications as most compilers can eliminate code that clearly can never be executed. However, this technique is of limited applicability.
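A minimal sketch of that technique (the names here are hypothetical, chosen only for illustration):

```ada
--  A configuration package edited (or generated) per build.
package Config is
   Debug_Mode : constant Boolean := False;
end Config;

with Config;
with Ada.Text_IO;
procedure Example is
begin
   if Config.Debug_Mode then
      --  When Debug_Mode is a static False, most compilers recognize
      --  this branch as unreachable and generate no code for it.
      Ada.Text_IO.Put_Line ("entering Example");
   end if;
end Example;
```

The limitation is visible in the sketch itself: the technique only works inside statements and expressions. It cannot conditionally include declarations, pragmas, or whole units.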

As such, Ada currently only covers part of the problem.

Ada vendors are of course aware of these problems and provide various techniques to help with them. One way to do that is with project management schemes (such as project files and processors). A project management scheme provides a way to select different versions of units and of compiler options for different versions of a program. These schemes can range from fairly simple to quite elaborate.

But they all suffer from not being part of the language. One obvious limitation is that they only work for a single vendor's tools. If one switches vendors, all of the project management artifacts have to be rebuilt.

Another problem is that a change in a package specification (which is certain to happen in long-lived developments) means that all of the other versions of that package have to be changed similarly. But those other versions might belong to other teams (as happens for different targets of a compiler), and there is no semi-automated way to ensure that all of the versions are changed appropriately.

Even when the other versions are just bodies of the package (with a single shared specification), additions to the specification can be problematic. Other teams owning bodies will suddenly find (usually at the worst possible time) that they cannot compile their project, and may have difficulty understanding what the new additions need to do.

Finally, versioning at the package level usually means that a lot of code needs to be duplicated between all of these versions. Keeping that code (which should not be different) in sync adds another headache to managing versions in a project management scheme.

Another way that vendors can help this problem is through bind-time optimizations. At the time of binding an Ada program, all of the code and data usage for that program can be determined, thus any unused code or data can be removed. This technique requires no code changes at all to eliminate unused entities.

In practice, there are three main ways that this can be accomplished. First, the (previously) compiled binary code can be augmented with enough information so that unused code and data can be determined, and then eliminated. Second, conversion to binary machine code can be delayed until bind time, when much more information about the full program is available. Third, one could determine the code used by the final program and create modified source code that doesn't include the unused entities.

However, these sorts of techniques are not commonly implemented today. Janus/Ada has included a version of the first method since the mid-1980s. GNAT has some tools that appear to use the third method. Some compilers for other languages use the second method.

Additionally, this technique can only work on Ada code, so there still is no help here for pragma changes when the mode changes. And of course source editing (or going outside of the language) is still necessary for changing modes.

Another approach commonly used for handling this problem works mostly or completely at the lexical level. This usually is some combination of include files and macro processing. The C preprocessor is the best-known example.

These sorts of systems typically allow writing almost any entity (and often parts of entities) in a conditional manner. The compiler is presented a particular version based on settings (often compiler options).

These techniques were well known when Ada was designed, and were available as part of many popular programming language implementations (if not the languages themselves). So why did the Ada designers avoid them?

Most likely, they were trying to avoid the problems that come from lexical schemes used in large systems with many options. In such situations, it is common that the code for some combinations of options is not even legal (since the code is constructed at the lexical level, it is easy to create bad syntax). When there are many options, it is impractical to test all of the combinations, so many go untested and thus may not even be possible to compile. Since Ada keeps all conditional compilation within the language, it avoids this problem: all code is compiled, and therefore is checked for legality, even if it can never be executed with the current set of options.

What we'd really want is a mechanism that is: (1) Part of the language, so it is widely available; (2) Allows any sort of entity to be conditionally compiled; (3) Allows any pragma to be included (or not) in a given version; (4) Allows the specification of as many separate options as is needed; (5) Ensures that no code or data is needed for entities not included in the current version.

We haven't talked about point (4) yet, so let's take a look at that.

Typically, any significantly sized system will need multiple levels of testing, multiple levels of specification, and often multiple items of configuration. A single setting is not enough.

For instance, in contracts, it is common for there to be three kinds of routines. There are some routines which are impractical to evaluate at runtime (these often describe properties of the target that are not easily determined). There are more routines that can be evaluated, but are too expensive to evaluate all the time. An example is Is_Sorted for the Vector containers. The result of Merge is only well-defined for inputs that are sorted, but Is_Sorted is expensive and calling it each time would substantially slow testing. One only wants to do that if a problem in that area is suspected. On the other hand, many queries are cheap. In the vector container, Is_Empty is cheap (a single integer test); it should be run during all testing (even if it is to be eliminated in the fielded system).
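The three kinds of routines can be illustrated with a spec fragment like the following (a sketch only; the names and profiles are hypothetical, not the actual container specification):

```ada
--  Hypothetical sorted-vector container, illustrating cheap vs.
--  expensive contract queries.
generic
   type Element is private;
   with function "<" (Left, Right : Element) return Boolean;
package Sorted_Vectors is
   type Vector is private;

   function Is_Empty  (V : Vector) return Boolean;
   --  Cheap: a single integer test; worth checking in all testing.

   function Is_Sorted (V : Vector) return Boolean;
   --  Expensive: O(n); only worth checking when a problem is suspected.

   procedure Merge (Target, Source : in out Vector)
     with Pre => Is_Sorted (Target) and then Is_Sorted (Source);
private
   type Vector is null record;  --  implementation elided
end Sorted_Vectors;
```

Ideally, the cheap and expensive preconditions would belong to separately controllable groups, which is exactly what a single on/off setting cannot express.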

Similarly, tracing code is often split into groups in large programs. Running all tracing can be impractical. For instance, running all tracing in Janus/Ada generates a file larger than 3MB even on the smallest ACATS tests. Even if one can load the file, finding the item of interest in it is very difficult. It's much better to only run a few traces of interest.

More generally, language features that can only be used one way have been avoided in the Ada design. Ada tries to take a more building block approach, so that the features it has can be combined to solve many problems, not just a single one. Some of the places where that was not done (such as stream attributes), have been a pain point. We want to avoid introducing more such points.

We'll take a look at the existing proposal in my next message.

ARG-Editor commented 3 months ago

The existing proposal is found in AI12-0239-1, known as "Ghost code". The basic idea is that entities can be tagged as conditional code with an aspect; the aspect takes a name that can be chosen by the user. Various rules are enforced so that non-conditional code cannot depend on conditional code except in specific, specified ways. For instance, use of conditional code in assert expressions (like Pre and Post) is allowed. Conditional code can read, but not write, non-conditional data. A pragma Ghost_Policy can be used to turn on or off specific code groupings.
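The mechanism might look something like the following. This is a notional sketch only: AI12-0239-1's final syntax and rules are not settled, and the group names here are invented for illustration.

```ada
--  Notional syntax per AI12-0239-1; details unsettled.
Call_Count : Natural := 0
  with Ghost => Debug_Tracing;       --  conditional data

procedure Note_Call
  with Ghost => Debug_Tracing;       --  conditional code

function Invariant_Holds return Boolean
  with Ghost => Proof_Support;       --  a separate, user-chosen grouping

procedure Push (Item : Integer)
  with Post => Invariant_Holds;      --  allowed: conditional code used
                                     --  in an assertion expression

pragma Ghost_Policy (Debug_Tracing => Off, Proof_Support => Check);
```

Note how each entity names its grouping, so a program can have as many independent settings as it needs, turned on and off via the policy pragma.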

This proposal was extensively discussed (in part over enormous hamburgers!) during the Warsaw ARG meeting in 2019. Some, but not all, of the results of that discussion are reflected in the AI. Additionally, a lot of detail still needs to be worked out.

Since that meeting, concern has been raised that the term "ghost" is rather specific to the usage for assertions (even though the mechanism is useful for more than that). A different term should be considered (I've tried to avoid using "ghost" in this discussion other than for discussing the existing proposal, although for the expected usages and rules it seems OK to me).

The big advantage of this proposal is that code that compiles once with any particular Ghost_Policy setting will continue to be legal in other Ghost_Policy settings (with the remaining limitation on value-dependent code, see the discussion of static conditional code for more on that limitation).

The proposal appears to meet all of the criteria set out in the introductory part, but of course many details still need to be worked out.

Ideally, some interested parties would come together (in an LSG) to work out and prototype a more complete proposal.

dhombios commented 3 months ago

It would be nice to improve platform dependent code management. Many commonly used APIs (like GUI) differ between operating systems, so usually a Hardware (or System) Abstraction Layer is defined to handle that.

In C that's managed using the preprocessor, but Ada depends on external tools to achieve that automatically (some Ada compilers provide C-like preprocessors). However, I understand that this approach was avoided when Ada was designed, as it obscures code.

Maybe this could be solved with a special type of package with a single specification and multiple bodies (one for each platform). During compilation, the compiler could choose the adequate body for the target.

A system specification file could be used for defining the body categories used for each target (for example {Linux, x64} or {Embedded, RiscV}):

system MyComplexSystem is
    component server is
        os := "Linux"
        architecture := "x64"
        entrypoint := server_main -- first procedure to be executed 
    end server;

    component client is
        os := "Windows"
        architecture := "x64"
        entrypoint := client
    end client;

    component iot_device is
        os := "Embedded"
        architecture := "arm_x64"
        entrypoint := iot_startup_service -- functions called by interrupts could also be defined here
        memorymap    -- description of the program memory, interrupt table, data memory, registers...
        assembler_description -- compiler backend description file in case the architecture is not supported by the compiler. It could be designed so that it can be generated automatically from VHDL, making it easier to use Ada on new architectures as well as on soft cores
    end iot_device;

    -- System-level variables (e.g. the baud rate used by two components that communicate over a UART). This would ensure that a change in these variables affects all components that use them, avoiding problems like setting different baud rates in each component
end MyComplexSystem;

Compiling the system would generate executables for every component, treating all of them as a single program for verification purposes.

In case that a platform independent implementation is also provided, a 'Virtual' category could be used:

abstract package SAL is 
    -- Common interface for all platforms
end SAL;

abstract package body SAL for {System, Hardware architecture} is
    -- implementation for that particular system (e.g. Windows) and hardware architecture (x64, x32, or virtual if it works on any architecture)

end SAL;

As a result, the program would be structured in three levels.

This approach could also allow providing implementations of standard packages for platforms that aren't supported by the compiler

sttaft commented 3 months ago

What you are suggesting of having one spec and multiple bodies is supported well already by most Ada compiler systems. The exact mechanism depends on the implementation, but it is definitely the "Ada way" and there are various ways to accomplish it with most existing Ada systems.

ARG-Editor commented 3 months ago

Tucker wrote:

What you are suggesting of having one spec and multiple bodies is supported well already by most Ada compiler systems. The exact mechanism depends on the implementation, but it is definitely the "Ada way" and there are various ways to accomplish it with most existing Ada systems.

Surely. I covered that in the problem overview part, as "Project Management Techniques". The problem, of course, is that these techniques are vendor-specific -- moving portable Ada code from one vendor to another (for whatever reason) requires completely rebuilding this sort of management.

Another problem is that these sort of techniques require a lot of code duplication: even if there is only a single routine that needs to be different for a target, the entire package needs to be duplicated. One can work around that to some extent by having more packages, but at some point you overwhelm the mind-space.

Given how common both these techniques and the underlying problem is, it would make sense to include some mechanism as part of the language for this sort of management. Having such a specification would certainly make it easier to try other lesser-known implementations to see if they better meet your needs.

But that wasn't really a solution to the sorts of problems posed here, which mainly is the case of debugging/proof vs production modes (particularly in the case where the debugging/proof code must not be included in the production system, as in some kinds of safety-critical systems).

So I would say that some sort of body-selection mechanism as part of the language is a separate problem and probably should be considered as a separate issue. (I certainly don't see any way to accomplish it with aspects, which is the obvious way to deal with fine-grained changes.) It would be common to use both solutions in dealing with the separate-target problem: Janus/Ada uses a combination of runtime settings, compile-time values (which the aspect mechanism described in this issue would be good for), and separate implementations -- usually bodies, but sometimes complete passes, as in code generators -- (which a body-selection mechanism, as was suggested in this comment, would be useful for).

           Randy.
joshua-c-fletcher commented 3 months ago

Randy wrote:

even if there is only a single routine that needs to be different for a target, the entire package needs to be duplicated.

If there's only a single routine that needs to be different, one could use a separate (subunit) for the individual routine.
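In Ada terms that looks like the following (a minimal sketch; the names are hypothetical):

```ada
package Target_Support is
   procedure Flush_Cache;
end Target_Support;

package body Target_Support is
   --  The body stub; each target supplies its own subunit file,
   --  and the project selects which file gets compiled.
   procedure Flush_Cache is separate;
end Target_Support;

--  In a target-specific source file:
separate (Target_Support)
procedure Flush_Cache is
begin
   null;  --  e.g. a target on which no action is needed
end Flush_Cache;
```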

If there's a portion of functionality, one could make a private child package to provide the implementation of that portion. The private child would be invoked from the body of common code to provide the different behaviour for the different cases.

All Ada vendors presumably provide some way of specifying which source files are included in a project, so you'd include the desired version of the child package or the desired body in the project for the program you want to build.

Alternatively, one could have a nested package with the body as a separate, and take a similar approach, so that the desired separate body is selected.

Finally, one could define an interface type providing an interface for the conditional behaviour, with a mechanism in the common code to register an instance of this interface. The common code would check if such an instance has been registered and invoke it where needed (or, if multiple instances can be registered, invoke them all).

Programs that need special behaviour would compile in a package that implements the interface and registers it (perhaps at elaboration). Then, when the common code needs to invoke the variant behaviour, it calls the interface and the program-specific code is executed.
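A minimal sketch of that registration pattern, with hypothetical names:

```ada
package Variants is
   type Variant_Behaviour is interface;
   procedure Do_Special (Self : Variant_Behaviour) is abstract;

   type Behaviour_Ref is access all Variant_Behaviour'Class;

   --  Called (perhaps at elaboration) by a program-specific package.
   procedure Register (Impl : Behaviour_Ref);

   --  Called by the common code wherever variant behaviour may apply.
   procedure Invoke_If_Registered;
end Variants;

package body Variants is
   Registered : Behaviour_Ref := null;

   procedure Register (Impl : Behaviour_Ref) is
   begin
      Registered := Impl;
   end Register;

   procedure Invoke_If_Registered is
   begin
      if Registered /= null then
         Registered.Do_Special;  --  dispatches to the registered variant
      end if;
   end Invoke_If_Registered;
end Variants;
```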

Different Ada vendors have different ways to set up project files, but as long as they all allow you to select which source files are included in a project, they'd all support any of these options. Whether you need separate project files for each variant, or a single project file that can include different sources conditionally, would be vendor-specific; but this seems to be more a source-file-selection matter than a language matter.

Joshua


dhombios commented 3 months ago

What you are suggesting of having one spec and multiple bodies is supported well already by most Ada compiler systems. The exact mechanism depends on the implementation, but it is definitely the "Ada way" and there are various ways to accomplish it with most existing Ada systems.

I like that approach, as it results in cleaner code. It is true that it forces some code duplication, but for highly optimized assembly code that usually isn't a problem, since such code is rarely cross-platform anyway. Nevertheless, I think both approaches, ghost code and a single package with multiple bodies, can complement each other.

Standardizing it could be interesting for providing a way to verify Annex E systems. Treating the software that runs on each device as a single program allows checking their interdependencies, ensuring coherence between the implementations running on each device:

+---------------------------------------------------+
|             Platform independent code             |
+-----------------+         +-------------------+
| System          |         | System            |
| Drivers         | <-Var-> | Drivers           |
+-----------------+         +-------------------+
| HAL             | <-Var-> | HAL               |
+-----------------+         +-------------------+

(<-Var-> represents variables that don't affect the high-level logic of the program but that need to have the same value on different devices to ensure that they can communicate)

Richard-Wai commented 3 months ago

I actually tried my hand at this problem some time ago to some practical effect. My goal was to imagine a new Specialized Need Annex for the RM that defined behaviors for configuration management, build management, and even a primitive form of package management. This annex would define a set of behaviors that could optionally be implemented by an Ada compiler. I named this notional annex the “Ada User Repository Annex” (AURA).

As part of developing this idea, I created a working reference implementation that can drive any Ada compiler, but is currently targeted at GNAT. I dog-food this tool on a daily basis and have gained some interest from the community. Honestly it has become critical to my workflow generally, and so at this point I’d consider the idea well-baked and practical. The AURA CLI tool replaced GPRbuild for all of my uses, and also acts as an (albeit less sophisticated) competitor to ALIRE.

I have pretty complete documentation on the concepts of AURA and the reference implementation here.

The basic idea is to elaborate on the existing Ada concept of the “subsystem” (top-level packages and all their children) with optional AURA “manifests” that can be provided for any subsystem in a program. These manifests are completely regular Ada packages with no special syntax. A key design goal was to avoid the “Ada-like syntax” mess of GPR entirely. The only thing that AURA defines is what they signify and how they are processed, a process termed in AURA “autoconfiguration”.

Without getting into the weeds on how autoconfiguration happens, each manifest is simply a first child package of a subsystem with the specific name “AURA”. AURA specifies what is allowed in such a package and what items have special significance, which is fully documented on the Read The Docs page linked previously. These manifests become activated as configuration packages through the (compile-time) “autoconfiguration” process, which ultimately derives them into a project-specific “configuration package”.

Essentially, a manifest/configuration package contains a set of specific but optional nested packages which define a set of constants. Of interest to this discussion is the specified “Codepaths” package, described specifically here.

The Codepaths package defines some number of subdirectories which the compiler adds to its “include” path when compiling that subsystem. These directories are generally defined via Ada’s conditional expression facilities. This allows compile-time configuration of which Ada sources are actually submitted to the compiler during compilation. The idea is that you can put specific bodies and subunit bodies in the appropriate subdirectories, only to be compiled if selected in the Codepaths package.

Additionally, the subsystem itself can see the configuration package by simply withing it, allowing for deeper compile-time self-configuration of the subsystem. There also exists a root AURA package which itself contains some pre-defined constants that identify compilation target, and potentially other things.

Combining the availability of these constants with the Codepaths functionality is usually more than sufficient to create AURA subsystems that can auto-configure themselves for a given platform, while still allowing the end user to configure the package further by editing the configuration package directly, on a per-project basis.

I have plenty of examples of actual subsystems that use autoconfiguration in the “ASAP” repository, and some of these are part of walk-through examples in the AURA documentation.

For anyone who wants to give this a try, I’d love to have more feedback, and for this thread specifically, I wonder if this concept has any legs at all as far as actual standardization goes?

sttaft commented 3 months ago

Interesting! AURA might be a good candidate for an ISO Technical Specification, now that ISO has again allowed these to be openly available.

sttaft commented 3 months ago


My response was not directed at the Ghost Code suggestion, which I think is a good idea on its own.

My comment was directed rather at the suggestion for multiple bodies for the same spec, but Richard Wai has given an interesting existence proof with his AURA system, and certainly the package manager ALIRE ( https://alire.ada.dev/) also is seeing wider use as well. Unlike the Ghost code proposal, which is clearly a language proposal, these compilation-control systems seem more appropriate for a separate Technical Specification rather than incorporating them into the Ada RM.

-Tuck


ARG-Editor commented 3 months ago

even if there is only a single routine that needs to be different for a target, the entire package needs to be duplicated.

If there's only a single routine that needs to be different, one could use a separate for the individual routine,

I didn't explicitly mention that possibility, though we used it a lot in Janus/Ada. I probably should have said "unit" rather than "package" when I said:

One can work around that to some extent by having more packages, but at some point you overwhelm the mind-space.

By the latter I mean that the more separate units you have, the more complex the system architecture becomes, the harder it is to find the correct entity when maintaining software, and the harder it is to find the associated source code (especially when there are several versions). One can use extra-language mechanisms to help with these things, but the human mind can only juggle so much.

...

Different Ada vendors have different ways to set up project files, ...

...which is the crux of the issue. If you use vendor-specific tools to manage versions, then you are pretty much stuck with that vendor. It is painful to move such a system (especially one with many interlocking options) to some other system.

The whole point of having standards is to prevent being stuck with a single vendor. If "portable Ada code" isn't enough to handle some problem area, then we owe it to users to at least consider potentially standardizing a vendor-independent method.

It's also pretty clear that unit level source versioning isn't helpful for the problems that originated this issue. (Remember that managing source files is NOT the issue here; it's complementary to that issue.) That is the problem of separating versions for analysis, for debugging, and for production use (where the latter cannot have unused code in safety critical systems).

In particular, it makes no sense at all to analyze a set of source code for correctness, but then turn around and use a completely different set of source code for the production version. You learned very little from the analysis if the final source is different!

Similarly, if you debug a set of source, but then deliver a program constructed from a different set of source, can you really be sure that you fixed anything? The ghost code mechanism is intended to deal with these sorts of cases, where code specifically for debugging and/or analysis can be excluded from the final version, without any modification to the actual source code.

I rather wish that this issue hadn't gotten hijacked by project management discussions, but these sort of forums are like herding cats -- there is only a limited amount of control that one has over the direction of discussions.

                   Randy.
ARG-Editor commented 3 months ago

Richard wrote:

This annex would define a set of behaviors that could optionally be implemented by an Ada compiler. I named this notional annex the "Ada User Repository Annex" (AURA).

...

For anyone who want to give this a try, I'd love to have more feedback, and for this thread specifically, I wonder if this concept has any legs at all as far as actual standardization goes?

One immediate thought is that it would be good to try this with more than one compiler in order to avoid getting bogged down in GNATisms. And of course I have special interest in one particular non-GNAT compiler. :-)

I wonder if your reference implementation could be retargeted to another compiler via the command line of that implementation. (Best, of course, would be for that implementation to be easily retargeted to any compiler, preferably by an end user, at which point it becomes compiler-independent and usable in a portable way. But that might be asking too much.)

I saw a lot of discussion of source code in your write-up, but I didn't see much about the other end (output). I've typically kept separate repositories for debugging and production versions of the compiler, so I can switch between them without having to do a complete rebuild. (On my newest machines, that's down to under an hour, unlike the entire day years ago, but still, no one wants to wait an hour to find the answer to a problem.) Does AURA have some way to configure the output?

We probably should take this off-line rather than polluting this topic. E-mail me when you have time (I should be retired by then and have more time myself! :-) :-)

            Randy.
evanescente-ondine commented 6 days ago

I have a question regarding conditional code too.

The Constrained attribute is permitted for objects of generic types. The result indicates whether the corresponding actual is constrained.

Does that mean we could see specialization as in C++ templates, if the compiler uses techniques to selectively share code between generic instances? Then we could see generic libraries tailored for specific use cases without weighing on the object code's size.