Pattern Matching as an Ada Language Feature

This issue relates to the recent work by Randy and Steve on how to categorize the "held" AI12 AIs. AI12-0214-1 and AI12-0274-1 pertain to extending the "case" statement to handle composite types, or more general pattern matching. Below is a link to a paper I wrote for the SIGAda HILT 2022 Workshop on whether to extend the Ada "case" statement to handle more general pattern matching. It might prompt some thinking about this topic. I have already circulated this internally at AdaCore.

I have also included a link to the associated PowerPoint presentation.

Take care, -Tuck

Paper on pattern matching: https://drive.google.com/file/d/1_zRADMhsZKS2z9sHMevdjDgb3WZKCvWW/view?usp=share_link

Presentation on pattern matching: https://drive.google.com/file/d/1Bzbok48NbrPcfB-2FwAU-d1GpSmHcsR3/view?usp=share_link

Here is a copy of the Introduction to the paper:

Many programming languages now include a pattern-matching feature, often introduced with the keyword match, e.g. OCaml[1], Python[2], Haskell[3] (Haskell doesn't require any keyword -- every function is considered a pattern match). These are not primarily focused on string pattern matching, but more on structure pattern matching, where the matching starts from an object of some structured type, and the individual patterns select particular structural patterns for specified actions.

These pattern matching features can be seen as a generalization of the case or switch statement available in most third-generation programming languages. But they typically include the ability to associate an identifier with some or all of the pattern, which is then usable inside the handler for the given pattern, knowing that that identifier refers to some part of the original object that satisfies the given part of the pattern.

One of the great benefits of a pattern matching language feature is that it can be used to implement logic that otherwise might require a long if/elsif/elsif/.../else chain. The pattern-matching equivalent to such a chain will generally be easier to read, understand, and maintain. Furthermore, it is possible to impose additional rules on pattern matching (including when it is something as simple as a conventional switch/case construct) that will foster more rigorous software development processes, and thereby allow more complete program verification at compile time. Here are the three most important such properties:

Complete/Exhaustive -- ...
Unambiguous -- ...
Nonredundant -- ...

Here are some comments from an ARG email chain on this topic:

Randy.

So to follow up on responses to my concerns on this feature above, I apologize for using the admittedly polarizing "fad" terminology. I think this might have distracted from my primary argument.

I recognize that pattern matching is not new, and I also appreciate that the boundaries between paradigms are not well-defined, or even perhaps real from a technical view. However I think they are extremely distinct from a conceptual view. To me this becomes an issue of culture and the art of programming more than the theory and science.

The conceptual paradigm of functional programming applies natural constraints to the approach for solving a given computational problem. That means the very foundation of a given approach might be wildly different from a procedural or object-oriented approach to the same problem. Naturally, some individual programmers, industry segments, or academic specialties might be more or less accustomed to certain approaches, and therefore to certain comfortable frames of thinking.

I agree that mixing paradigms and creating a kind of "dialogue" between diverse frames of thought, and approaches to problems, is good for everyone. That alone is what puts me "on the fence" wrt this proposal. It is hard to deny that it will open up new approaches to problems that are hard to foresee.

To me pattern matching really arose generally from the needs of a functional approach to complex computational problems, particularly involving structured data or deep logic hierarchies. These same problems can be solved efficiently and readably with mature and rich tools of abstraction, data modeling, and state management. That approach tends to be more distributed, whereas the functional approach seems to be more centralized (ergo a pattern-matching case statement).

What I really take issue with is the idea that adding pattern matching to Ada "solves" a problem, particularly if it is suggested the problem is complexity of logic. To me the solution to that problem ought to be by using the existing tools of abstraction, data modeling, and object-orientation.

Therefore without a strong "need" for this feature, I have to wonder if the complexity of implementation is worth it.

My experience is that conditional logic inevitably becomes more complex during maintenance and enhancement, as a program tries to optimize various special cases. My belief is that pattern matching, so long as the three "desirable properties" are enforced at compile-time, is perhaps the best way to handle this inevitably growing complexity. The example in the paper was not a strawman -- it was from real production code used in our static analysis tool.

A static analysis tool is just one example of a case where you are always trying to do a better job, and the logic almost inevitably becomes more and more tortured. You can always say that a better programmer will do a better job, but I happen to think that a better language can make the best programmers even better, by providing better structuring mechanisms, and more compile-time guarantees. So I believe that the improved compile-time checking possible with pattern matching fits exactly into Ada's "sweet spot" where the programmer and the compiler work together to produce a safe, correct, and efficient program.

I definitely appreciate the application in static analysis, and that is probably the only aspect that really attracts me. I'll agree with the general experience of long-term maintenance, but I do not see how pattern matching, as proposed, saves us from the problem at all. The pattern matching syntax itself seems susceptible to unmanaged complexity and extreme unreadability. We already risked this a bit with executable contracts, but the pattern matching syntax takes that to a whole new level. I think at the end of the day that is what worries me the most.

Whenever you have a long chain of if/elsif/elsif/..., maintenance becomes a challenge, because there is no easy way to know whether, at a given point in the chain, the logic relies on all prior conditions being false, or whether you could safely reorder the "arms" of the if/elsif without affecting correctness. Furthermore, you end up thinking that performance might be improved if you shifted which conditions are tested first, again without being sure that doing so will not break things. Finally, when it comes time to add a new condition, where is the "right" place to add the check?
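The order-dependence concern can be seen in a very small example. The sketch below is in Python (the proposed Ada syntax is not implemented anywhere it could be run), and both functions are hypothetical: they test the same two conditions in opposite order and disagree exactly on the values where the conditions overlap.

```python
def classify_v1(n: int) -> str:
    """Tests divisibility by 2 first."""
    if n % 2 == 0:
        return "even"
    elif n % 3 == 0:
        # Reached only when n is odd: this arm silently relies on the
        # previous condition having been false.
        return "multiple of 3"
    else:
        return "other"


def classify_v2(n: int) -> str:
    """Same conditions as classify_v1, with the first two arms swapped."""
    if n % 3 == 0:
        return "multiple of 3"
    elif n % 2 == 0:
        return "even"
    else:
        return "other"


# The two versions agree wherever exactly one condition holds ...
assert classify_v1(4) == classify_v2(4) == "even"
assert classify_v1(9) == classify_v2(9) == "multiple of 3"
# ... and disagree where both hold, so reordering changed the behavior.
assert classify_v1(6) == "even"
assert classify_v2(6) == "multiple of 3"
```

Nothing in either chain tells a maintainer that 6 is the interesting case; that has to be rediscovered by reading every earlier condition.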

With the proposed pattern matching rules, any legal ordering has the same semantics. Although I didn't talk a lot about performance, the intent is that multiple uses of the same query function (with matching parameters, if any) in a pattern never result in more than one actual call, so you don't need to worry about the order from a performance point of view either. Those two guarantees are by themselves a big step up from if/elsif chains, in my view. When completeness checks are added to the story, it becomes pretty compelling to me.
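The order-independence claim can be illustrated with a small run-time sketch in Python. It is not the paper's algorithm or library. It treats each arm as an inclusive integer range (an assumption made for brevity) and checks by brute force that every value in a domain is matched by exactly one arm. It then confirms that reordering such arms cannot change which arm is selected. All names are hypothetical, and a real implementation would perform these checks at compile time rather than by enumeration.

```python
from itertools import permutations

# One arm: (low, high, label), with both bounds inclusive.
Arm = tuple[int, int, str]


def check_arms(low: int, high: int, arms: list[Arm]) -> list[str]:
    """Return a list of problems; an empty list means every value in
    low..high is matched by exactly one arm."""
    problems = []
    for value in range(low, high + 1):
        labels = [label for (lo, hi, label) in arms if lo <= value <= hi]
        if not labels:
            problems.append(f"{value}: no arm matches")
        elif len(labels) > 1:
            problems.append(f"{value}: matched by {labels}")
    return problems


def first_match(arms: list[Arm], value: int) -> str:
    """Select an arm the way an if/elsif chain would: first hit wins."""
    for lo, hi, label in arms:
        if lo <= value <= hi:
            return label
    raise ValueError(f"no arm matches {value}")


def order_independent(low: int, high: int, arms: list[Arm]) -> bool:
    """True if every ordering of the arms selects the same label for
    every value in low..high (assumes each value matches some arm)."""
    for value in range(low, high + 1):
        expected = first_match(arms, value)
        for ordering in permutations(arms):
            if first_match(list(ordering), value) != expected:
                return False
    return True


good = [(0, 9, "small"), (10, 99, "medium"), (100, 255, "large")]
assert check_arms(0, 255, good) == []
assert order_independent(0, 255, good)

# Overlapping arms: the value 10 is matched twice, so order matters again.
bad = [(0, 10, "small"), (10, 99, "medium")]
assert not order_independent(0, 99, bad)
```

Once the arms pass `check_arms`, the first-match loop in `first_match` could test them in any order, which is the property that makes reordering for performance safe.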

I'm thinking it would really make the case for this proposal if we could provide a few more really practical/common and maximally simple patterns that would benefit from pattern matching. The idea that we can have much richer coverage of conditions, which the compiler can "prove" is full coverage, is indeed very alluring. The red-black tree balancing example in your paper is definitely useful in general, but it is probably not something that the average programmer would find immediately relevant in their day-to-day work; I suspect other examples could be.

I definitely want to try my hand at this. I think it's an area I need to get more familiarity with.

Two more comments from ARG mailing list:

Raphaël Amiard via ada-auth.org, Mon, Nov 21, 5:02 AM

(Aside: We got plenty of push-back for conditional expressions and quantified expressions, continuing much further in this direction is a place many existing Ada users don't want to go.)

It would be really helpful if you restrained yourself from posing as the voice of the Ada community and making those kind of blanket statements. For every user that pushed back on those features I can probably find as many that love those features. If you want us to have a real debate on features that would be a prerequisite as far as I'm concerned.

Randy Brukardt via ada-auth.org, Nov 22, 2022, 9:41 PM

Raphaël Amiard writes:

(Aside: We got plenty of push-back for conditional expressions and quantified expressions, continuing much further in this direction is a place many existing Ada users don't want to go.)

It would be really helpful if you restrained yourself from posing as the voice of the Ada community and making those kind of blanket statements.

I consider it part of my job as Editor to listen to the voices in the community (something I do on my own time, BTW). Moreover, as the most visible ARG member, I probably receive somewhat more communications than anyone else. I will report on those when it is appropriate; I try to do so in a factual manner. If that comes off as "the voice of the community", it's not my intent to sound like everyone agrees with a particular opinion.

For every user that pushed back on those features I can probably find as many that love those features. If you want us to have a real debate on features that would be a prerequisite as far as I'm concerned.

It's not just the pushback on the conditional expressions, but also the commonly expressed desire that we not go further in this direction. I've heard a number of people say that they don't want Ada to turn into a functional language (and quite a few more than expressed concern about conditional expressions). I didn't put that into my original statement because it seemed too vague to report on (exactly what the concern is remains hard to pin down).

As with everything, this is a data point; surely not the last word on anything. I think that many people who have pushed back do see the value of conditional expressions; it's certainly OK to push beyond some of the community's comfort zone. But one always needs to think twice when extending the language in a direction that at least a vocal part of the community has expressed reservations on.

Richard wrote:

I'm thinking it would really make the case for this proposal if we could provide a few more really practical/common and maximally simple patterns that would benefit from pattern matching.

I agree that examples are critical to evaluate the benefit of proposed enhancements. Below is an example of a relatively tortured if/elsif tree, which happens to be from an interactive debugger written in ParaSail. I'll spend some time thinking about how a pattern-matching feature might or might not help in such a case, as a challenge... ;-)

         if Type.Is_Small() then
            const Int_Val : optional Univ_Integer :=
              Stack_Frame_Info::Peek_At_Address(Base, Offset)
            if (Type.Null_Value_For_Type() not null
                   and then
                Int_Val not null
                   and then
                Type.Null_Value_For_Type() == Int_Val)
              or else
                (Type.Null_Value_For_Type() is null
                   and then
                 Int_Val is null)
            then
               //  This is the null value for the type.
               Print(Indent * " " | Prefix | "null");
            else
               case Type.Type_Kind() of
                [#normal] =>
                  //  Print in hex (these are typically "faked" types
                  //  which are actually integers internally).
                  Fallback();
                [#univ_integer] =>
                  const Val : Univ_Integer := 
                    Stack_Frame_Info::Peek_At_Address(Base, Offset);
                  Print(Indent * " " | Prefix | Type_Name | "::(`(Val))");
                [#univ_real] =>
                  const Val : Univ_Real := 
                    Stack_Frame_Info::Peek_At_Address(Base, Offset);
                  Print(Indent * " " | Prefix | Type_Name | "::(`(Val))");
                [#univ_enum] =>
                  const Val : Univ_Enumeration :=
                    Stack_Frame_Info::Peek_At_Address(Base, Offset);
                  Print(Indent * " " | Prefix | Type_Name | "::`(Val)");
                [#univ_char] =>
                  const Val : Univ_Character :=
                    Stack_Frame_Info::Peek_At_Address(Base, Offset);
                  Print(Indent * " " | Prefix | Type_Name | "::\'`(Val)\'");
                [#basic_array] =>
                  {*"arrays are not small"* #false}
                  //  Print in hex
                  Fallback();
                [#univ_string] =>
                  {*"strings are not small"* #false}
                  //  Print in hex
                  Fallback();
               end case;
            end if;
         elsif Type.Type_Kind() == #univ_string then
            const Val : optional Univ_String :=
              Stack_Frame_Info::Peek_At_Address(Base, Offset);
            Print(Indent * " " | Prefix | Type_Name | "::\"`(Val)\"");
         elsif Indent > Indent_Limit or else Line_Count > Line_Limit then
            //  Don't show a large object beyond this indent
            Print(Indent * " " | Prefix | Type_Name | "::(...)");
         elsif Type.Type_Kind() == #basic_array then
            //  Basic Array
            const Type_With_Params := (if |Type.Parameters()| > 0 then Type
                                       else Type.Enclosing_Type());
            const Comp_Type := Type_With_Params.Parameters()[1].Data.Type_Desc;
            const Comp_By_Ref : Boolean := #false;
            const Obj_Base : optional Univ_Integer :=
              Stack_Frame_Info::Peek_At_Address(Base, Offset);
            const Max_Array_Len := 10;  //  Max # of array components to show

            if Is_Large_Null (Obj_Base) then
               Print(Indent * " " | Prefix | "null");
            elsif Is_Bad_Address (Obj_Base) then
               Print(Indent * " " | Prefix |
                 "has bad address `(Hex_Image(Obj_Base))");
            else
               const Len : Univ_Integer :=
                 Stack_Frame_Info::Peek_At_Address(Obj_Base, 1);

               Println(Indent * " " | Prefix | Type_Name | "::[");
               Line_Count += 1;
               for I in 1 .. Len forward loop
                  Display_One_Obj (Obj_Base, I + 1, Comp_By_Ref, Comp_Type,
                    Indent => Indent + 2,
                    Line_Count => Line_Count,
                    Line_Limit => Line_Limit,
                    Indent_Limit => Indent_Limit);
                  Println("");
                  Line_Count += 1;
                  if I < Len
                    and then
                      (I == Max_Array_Len
                         or else Line_Count > Line_Limit)
                  then
                     //  Too many components to show
                     Println(Indent * " " |
                       "  ... // total of `(Len) components");
                     Line_Count += 1;
                     exit loop;
                  end if;
               end loop;
               Print(Indent * " " | "]");
            end if;
         else
            //  Normal large object
            {*"must be normal kind"* Type.Type_Kind() == #normal}
            const Obj_Base : optional Univ_Integer :=
              Stack_Frame_Info::Peek_At_Address(Base, Offset);

            if Debug then
               Println("Peek(`(Base), `(Offset)) = `(Obj_Base)");
            end if;

            if Is_Large_Null (Obj_Base) then
               Print(Indent * " " | Prefix | "null");
            elsif Is_Bad_Address (Obj_Base) then
               Print(Indent * " " | Prefix |
                 "has bad address `(Hex_Image(Obj_Base))");
            elsif Is_Polymorphic(Type) then
               //  Polymorphic object, just display indented
               //  First need to get "actual" type of polymorphic
               //  object, since that has the "actual" type of the
               //  enclosed object.
               const Poly_Header : optional Univ_Integer :=
                 Stack_Frame_Info::Peek_At_Address(Obj_Base, 0);
               const Type_Index := Poly_Header mod 2**48 / 2**32;
               const Poly_Type : optional Type_Descriptor :=
                 Type_Desc_At_Index(Type_Index);

               Println(Indent * " " | Prefix | Type_Name | "::(");
               Line_Count += 1;
               if Poly_Type is null then
                  Println(Indent * " " |
                    "  Poly Type Index #`(Type_Index) invalid");
               else
                  Display_One_Obj
                    (Obj_Base, 1, Poly_Type.Components()[1].Is_By_Ref,
                      Poly_Type.Components()[1].Type_Desc,
                      Indent => Indent + 2,
                      Prefix => "",
                      Line_Count => Line_Count,
                      Line_Limit => Line_Limit,
                      Indent_Limit => Indent_Limit);
               end if;
               Println("");
               Line_Count += 1;
               Print(Indent * " " | ")");
            else
               //  Non-polymorphic, non-null normal "large" object
               const Components := Type.Components();
               const Decl_Of_Type := Type_Decl(Type);
               const Type_Region := Decl_Of_Type not null?
                 Decl_Region(Decl_Of_Type) : null;
               const Num_In_Type_Region := Type_Region not null?
                 Num_Items(Type_Region) : 0;

               --  Look for useful operations on type
               --  TBD: Use these to produce nicer output.
               const To_String_Op :=
                 Find_Op_Of_Type (Type_Region, "to_string");
               const Index_Set_Op :=
                 Find_Op_Of_Type (Type_Region, "\"index_set\"");
               const Remove_First_Op :=
                 Find_Op_Of_Type (Type_Region, "Remove_First");
               const Remove_Any_Op :=
                 Find_Op_Of_Type (Type_Region, "Remove_Any");
               const Indexing_Op :=
                 Find_Op_Of_Type (Type_Region, "\"indexing\"");

               if Decl_Of_Type is null then
                  Println("Type_Decl(`(Type.Name())) is null");
               elsif Type_Region is null then
                  Println("Type region is null for `(Id(Decl_Of_Type))");
               end if;
               Println(Indent * " " | Prefix | Type_Name | "::(");
               Line_Count += 1;
               for (each C of Components;
                    Comp_Offs in 1 .. |Components|)
                 forward loop

                  var Comp_Prefix := "";

                  for Item_Index in 1 .. Num_In_Type_Region loop
                     const Decl_In_Region :=
                       Nth_Item(Type_Region, Item_Index);

                     if Kind(Decl_In_Region) == #object then
                        const Comp_Index := Component_Index(Decl_In_Region);

                        if Comp_Index not null
                          and then [[Comp_Index]] == Comp_Offs
                        then
                           Comp_Prefix := Id(Decl_In_Region) | " => ";
                           exit loop;
                        end if;
                     end if;
                     //  Keep looking for matching component decl
                  end loop;
                  if Debug and then Comp_Prefix == "" then
                     Println
                       ("No component matching `(Obj_Base)[`(Comp_Offs)]");
                  end if;

                  Display_One_Obj
                    (Obj_Base, Comp_Offs, C.Is_By_Ref, C.Type_Desc,
                      Indent => Indent + 2,
                      Prefix => Comp_Prefix,
                      Line_Count => Line_Count,
                      Line_Limit => Line_Limit,
                      Indent_Limit => Indent_Limit);
                  Println("");
                  Line_Count += 1;
               end loop;
               Print(Indent * " " | ")");
            end if;
         end if;

Just a note to repeat that this issue holds the feature request for the unfinished topic from Ada 2022 (in this case AI12-0214-1 [a simple version extending case statements to simple records] and AI12-0274-1 [a version more like the one Tucker describes here]). This fulfills the ARG resolution of November 10, 2022 to have an issue created for this topic.

The paper submitted to the HILT workshop about pattern matching as a language feature has been expanded in anticipation of submitting it to the Springer "Journal of Software Tools for Technology Transfer". Here is the Google drive link:

https://drive.google.com/file/d/14SMQLsMdRKbsmgBz5Je1l9zFRx85Rdce/view?usp=sharing

It includes a few more examples, and a description of a reusable library that can be used to perform the compile-time checks proposed in the paper, for exhaustiveness, ambiguity, and redundancy.

This is copied from Issue #56, by https://github.com/Blady-Com. I will close #56 to avoid splitting the discussion.

Here are some comments on "Rigorous Pattern Matching as a Language Feature", section 2.2, "Proposed Pattern Matching Syntax".

The rule

    simple_pattern ::=
        [identifier :] unlabeled_pattern
      | <identifier>

might be

    simple_pattern ::=
        [<identifier> :] unlabeled_pattern
      | <identifier>

in order to have the same syntax in both cases, for instance: <Val>:(0 .. 100 with mod 2 => 0, mod 3 => 1)

The rule

    unlabeled_pattern ::=
        <>
      | static_expression
      | static_range
      | subtype_mark
      | property_pattern
      | sequence_pattern
      | map_pattern

might be

    unlabeled_pattern ::=
        <>
      | static_expression
      | static_range
      | subtype_mark
      | property_pattern
      | sequence_pattern
      | map_pattern
      | [not] null

in order to match accesses, for instance:

    case Panel, Button is
      when not null, not null => Panel.Toggle (Button);
      when not null, null => Panel.Create (Button);
      when others => raise GUI_Error;
    end case;

The rule

    sequence_pattern ::=
        [ pattern [*|+]{, pattern [*|+]} [, others => pattern] ]

might be

    sequence_pattern ::=
        [ pattern [*|+]{, pattern [*|+]} [, others => pattern [+]] ]

in order to specify that the others part shall match at least once, for instance: ['0', 'x' | 'X', others => '0' .. '9' | 'a' .. 'f' | 'A' .. 'F' +]

Here is a further update to the paper, which now covers the issue of generating good error messages when one of the three "desirable properties" is violated:

https://drive.google.com/file/d/1_1e-FDUcmP5-IcYNenl-StbDdnzyng3t

Note: The above document has been updated further (as of 7-Feb-2024) to incorporate reviewer comments.

Ada-Rapporteur-Group / User-Community-Input

Pattern Matching as an Ada Language Feature #34