GaloisInc / cryptol

Cryptol: The Language of Cryptography

https://galoisinc.github.io/cryptol/master/RefMan.html

BSD 3-Clause "New" or "Revised" License

1.13k stars 122 forks source link

Sum types #1588

Closed RyanGlScott closed 6 months ago

RyanGlScott commented 9 months ago

I propose that we add native support for sum types to the Cryptol language. This would be of particular use in two places:

When specifying algorithms that can possibly fail, one must currently indicate this with a return type that resembles ([32], Bit), where the Bit is True if the algorithm succeeded and False otherwise. This is clunky, however, since one must still provide a return value of type [32] in the False case, even though that return value is useless. What's worse, depending on how the algorithm is defined, returning a partial result of type [32] might potentially leak sensitive information that was provided to the algorithm.

Both of these issues could be fixed by changing the return type to something like Option u32, where Option is a sum type that has one variant with a single field (in this case, of type u32) to represent success, and another variant with no fields at all to represent failure. This way, the failure case does not reveal anything about the inputs at all.
In SAW's MIR backend, it would be convenient to embed MIR enum values (which are effectively sum types themselves) into Cryptol and back. There isn't a particularly clean way to do this in general, aside from encoding sum types using some sort of Cryptol primitives. Having native sum types as part of Cryptol would make this process much smoother.

Having discussed this on @yav, we are both of the opinion that sum types would be useful to add, but we would first need to agree upon a particular design. Here are some specific design questions that would need to be answered:

Should sum types be nominal or structural? That is, should we implement tagged unions (nominal), perhaps resembling this Haskell-inspired syntax?
```
data Foo
 = Bar [32]
 | Baz [64]
```
Or should we implement anonymous variants (structural), similar to how Cryptol's records are anonymous?
```
type Foo = ( [32] | [64] )
```
@yav believes nominal sum types would be somewhat easier to implement, but also that structural sum types may be a better fit with the design of Cryptol.
If we implement nominal sum types, should we permit recursive types, such as in this example?
```
data List
 = Nil
 | Cons [32] List
```
@yav is inclined not to allow this.
If we implement structural sum types, should we permit subtyping? That is, should a value of type ( [32] | [64] ) be permitted to be used in a function that expects an argument of type ( [32] | [64] | [128] )? Furthermore, should we permit polymorphic variants such as ( [32] | [64] | ... ), which indicate that other variants are permitted provided that ( [32] | [64] ) are included?

Again, @yav is inclined not to allow subtyping or polymorphic variants. That is, do the simplest possible thing of requiring the sum types to line up exactly as written.
If we implement structural sum types, how should they interact with type inference? Suppose that a user writes this (I will invent a syntax for introducing values within particular variants):
```
f b = if b then ( 27 : [32] |) else (| 42 : [64])
```
What should the return type of f be? It could reasonably be ( [32] | [64] ), ( [32] | [64] | [128] ), or, if variant polymorphism is permitted, something like ( [32] | [64] | ... ).

@yav suggests inferring the smallest set of variants used by a function, which would mean that Cryptol would infer the type Bit -> ( [32] | [64] ) for f.
What should the pattern-matching mechanism be? (https://github.com/GaloisInc/cryptol/issues/1588#issuecomment-1812719741 is one possibility.)

weaversa commented 9 months ago

I tried to focus on the first case (how to deal with failure) and came up with the following based on the structural variant:

FAIL = (()|)
PASS v = (|v)

type FAIL a = (() | a)

g : {a, b} a -> (FAIL b)
g x = if some_error then FAIL else PASS good_value
  where
    some_error = ...
    good_value = ...

h {a, b} a -> b
h x = ...
  where
    r = g x
    b = fail r : Bit
    v = pass r : b
    ...

How would one coalesce a sum type? In the example above, this is relevant to implementations of fail and pass. Am I on the right track even?

RyanGlScott commented 9 months ago

What do you mean by "coalesce a sum type"?

Judging from what the type signatures for fail and pass would be:

fail : {b} FAIL b -> Bit
pass : {b} FAIL b -> b

In order to implement this, I think would would need to introduce a mechanism to pattern-match on a sum type. (I left that out of my original comment in https://github.com/GaloisInc/cryptol/issues/1588#issue-1991385756 to reflect this.) Here is one possible syntax that would work for structural sum types, inspired by OCaml:

fail : {b} FAIL b -> Bit
fail r =
  match r with
  | (_ |) => True
  | (| _) => False

pass : {b} FAIL b -> b
pass r =
  match r with
  | (_ |) => error "Expected PASS, encountered FAIL"
  | (| x) => x

weaversa commented 9 months ago

Thanks for the possible syntax -- Would there be some way to work this into the same notation as numeric constraint guards? After looking at your proposal, the two feel very similar to me.

RyanGlScott commented 9 months ago

Possibly. Currently, the notation of numeric constraint guards only allows writing constraints after each |, and the syntax for constraints is quite different when compared to the syntax for patterns (which can, among other things, bind new variables). One idea would be to generalize the numeric constraint guard notation a bit to make it obvious when patterns are expected. Perhaps something like this syntax, which is inspired by Haskell's PatternGuards feature:

fail : {b} FAIL b -> Bit
fail r | (_ |) <- r => True
       | (| _) <- r => False

pass : {b} FAIL b -> b
pass r | (_ |) <- r => error "Expected PASS, encountered FAIL"
       | (| x) <- r => x

I haven't thought deeply about what the consequences of generalizing numeric constraint guard notation in this way would be. (My use of the match keyword in https://github.com/GaloisInc/cryptol/issues/1588#issuecomment-1812719741 was intended to make the syntax easier to distinguish in the parser.)

weaversa commented 9 months ago

Perhaps it would be helpful to have a concrete example. KW-AD is defined on page 14 of NIST SP 800-38F. The algorithm includes a failure condition and is paraphrased here:

Algorithm 4: KW-AD(C) Input: purported ciphertext, C, with valid length. Output: Plaintext P or indication of inauthenticity, FAIL. Steps:

Let ICV1 = 0xA6A6A6A6A6A6A6A6.

Let S = W'(C).

If MSB(S) ≠ICV1, then return FAIL and stop.

Return P = LSB(S).

For purposes of figuring out how to fail gracefully, here is a partial implementation of KW-AD that elides the definition of W'.

W' = zero
MSB = take`{64}
LSB = drop`{64}

KWAD :
    {n}
    (fin n, 2 <= n, n < 2^^54) =>
    ([128] -> [128]) -> [(n+1) * 64] -> (Bit, [n * 64])
KWAD CIPHk' C = (FAIL, P)
  where
    ICV1 = 0xA6A6A6A6A6A6A6A6
    S    = W' CIPHk' C
    FAIL = MSB S != ICV1
    P    = LSB S

I think this exhibits the behavior we're trying to avoid with this sum types proposal, namely, returning a failure flag along with some partial plaintext (KWAD CIPHk' C = (FAIL, P)) when the specification instead says to "return FAIL and stop". This demonstrates the potential to leak sensitive information. However, let's note that this style supports both positive and negative testing.

One way to meet the intent of "return FAIL and stopusing current Cryptol mechanisms would be to use theundefined` keyword:

W' = zero
MSB = take`{64}
LSB = drop`{64}

KWAD :
    {n}
    (fin n, 2 <= n, n < 2^^54) =>
    ([128] -> [128]) -> [(n+1) * 64] -> (Bit, [n * 64])
KWAD CIPHk' C = if FAIL then (True, undefined) else (False, P)
  where
    ICV1 = 0xA6A6A6A6A6A6A6A6
    S    = W' CIPHk' C
    FAIL = MSB S != ICV1
    P    = LSB S

This style encourages developers to fail cleanly, and also supports both positive and negative testing, and I'm proposing that this style meets the intent of @RyanGlScott's first bullet. I'm not sure saw always plays well with undefined, but let's focus on Cryptol for now.

Ideally, we'd want something like the following:

W' = zero
MSB = take`{64}
LSB = drop`{64}

KWAD :
    {n}
    (fin n, 2 <= n, n < 2^^54) =>
    ([128] -> [128]) -> [(n+1) * 64] -> [n * 64]
KWAD CIPHk' C = if FAIL then error "Stopping due to decrypt failure" else P
  where
    ICV1 = 0xA6A6A6A6A6A6A6A6
    S    = W' CIPHk' C
    FAIL = MSB S != ICV1
    P    = LSB S

This style avoids the use of a failure flag by stopping Cryptol execution on error, encourages clean failure, but does not permit negative testing (as Cryptol and SAW have no way to 'catch' failure). I'd like to hear some ideas on how sum types could be used to make this last example better. Any thoughts would be appreciated.

RyanGlScott commented 9 months ago

This code is a perfect example of (1) from https://github.com/GaloisInc/cryptol/issues/1588#issue-1991385756, where failure is awkwardly represented with a Bit. A less awkward way to encode this example would be to change the return type to an option type. Here is how one might do this with a nominal Option type:

data Option (a : *)
  = Some a // Indicates success, along with a payload of some type `a`
  | None   // Indicates failure, no payload

W' = zero
MSB = take`{64}
LSB = drop`{64}

KWAD :
    {n}
    (fin n, 2 <= n, n < 2^^54) =>
    ([128] -> [128]) -> [(n+1) * 64] -> Option [n * 64]
KWAD CIPHk' C = if FAIL then None else Some P
  where
    ICV1 = 0xA6A6A6A6A6A6A6A6
    S    = W' CIPHk' C
    FAIL = MSB S != ICV1
    P    = LSB S

Alternatively, here is how one might do this with an anonymous sum type:

W' = zero
MSB = take`{64}
LSB = drop`{64}

KWAD :
    {n}
    (fin n, 2 <= n, n < 2^^54) =>
    ([128] -> [128]) -> [(n+1) * 64] -> ( () | [n * 64] )
KWAD CIPHk' C = if FAIL then ( () |) else (| P )
  where
    ICV1 = 0xA6A6A6A6A6A6A6A6
    S    = W' CIPHk' C
    FAIL = MSB S != ICV1
    P    = LSB S

(Syntax subject to change, of course.) Importantly, the failure case no longer has to use undefined or error—instead, it returns a variant that encodes no information about the inner workings of KWAD.

weaversa commented 9 months ago

Will you provide an example of a function that calls each of your KWAD?

weaversa commented 9 months ago

I'm feeling like we can get the same thing as nominal Option types today.

type Option a = (Bit, a)
None = (True, undefined)
Some a = (False, a)

W' = zero
MSB = take`{64}
LSB = drop`{64}

KWAD :
    {n}
    (fin n, 2 <= n, n < 2^^54) =>
    ([128] -> [128]) -> [(n+1) * 64] -> Option [n * 64]
KWAD CIPHk' C = if FAIL then None else Some P
  where
    ICV1 = 0xA6A6A6A6A6A6A6A6
    S    = W' CIPHk' C
    FAIL = MSB S != ICV1
    P    = LSB S

weaversa commented 9 months ago

If undefined is troublesome, we could use zero.

type Option a = (Bit, a)
None = (True, zero)
Some a = (False, a)

W' = zero
MSB = take`{64}
LSB = drop`{64}

KWAD :
    {n}
    (fin n, 2 <= n, n < 2^^54) =>
    ([128] -> [128]) -> [(n+1) * 64] -> Option [n * 64]
KWAD CIPHk' C = if FAIL then None else Some P
  where
    ICV1 = 0xA6A6A6A6A6A6A6A6
    S    = W' CIPHk' C
    FAIL = MSB S != ICV1
    P    = LSB S

RyanGlScott commented 9 months ago

Will you provide an example of a function that calls each of your KWAD?

I've never used KWAD before, so I'm not sure if there is an idiomatic way to use the results. I'll arbitrarily pick an example where you call KWAD, and if it is successful, chunk up the [n * 64] into a [n][64], and if it is not successful, it returns zero. In order to accomplish this, you'd need to use pattern matching. An example using a nominal sum type:

KWAD_res = KWAD ...

split_KWAD : {n} Option [n * 64] -> [n][64]
split_KWAD KWAD_res =
  match KWAD_res with
  | None     => zero
  | Some res => split res

An example using a structural (anonymous) sum type:

KWAD_res = KWAD ...

split_KWAD : {n} ( () | [n * 64] ) -> [n][64]
split_KWAD KWAD_res =
  match KWAD_res with
  | ( () |)  => zero
  | (| res ) => split res

I'm feeling like we can get the same thing as nominal Option types today.

Certainly, I didn't mean to suggest that it is impossible to define KWAD today without sum types. As your code shows, we can also define KWAD by using a Bit to indicate the possibility of failure, possibly using undefined/error in the event that KWAD fails to return a meaningful result. The problem with this approach is that it is clunky:

If you want to prove any specifications involving KWAD using SAW, you have to deal with the possibility of undefined showing up, which would cause simulation to fail. Ideally, we can carefully set things up such that SAW would never evaluate the payload (which could be undefined) before first checking the Bit, but this is hard to guarantee in general. (See here for an example where this has gone wrong in SAW.)
The encoding conventions can become even clunkier when more payload values are involved. For instance, suppose that we changed KWAD has three possible outputs:
- If the input ciphertext C is inauthentic, FAIL.
- If the input ciphertext C is authentic but has an invalid length, return an integer representing the invalid length.
- If the input ciphertext C is authentic and has a vaild length, return a plaintext P.
(This is a somewhat contrived change, but bear with me for a bit.)

If we had sum types, we could encode these three possible outputs with this nominal sum type:
```
data KWADResult (n : #)
 = Inauthentic
 | InvalidLength [64]
 | Valid [n * 64]
```
Or this structural (anonymous) sum type:
```
( () | [64] | [n * 64] )
```
If we didn't have sum types, we'd need to find a way to encode this in Cryptol's current type system. We can't use a Bit for this purpose, since there are more than two possibilities. We'd have to use something like an Integer, where perhaps 0 represents an inauthentic input, 1 represents an invalid length, and 2 represents a valid input. That is, we'd need to use a type like this:
```
(Integer, [64], [n * 64])
```
Where each possible result is encoded as:
```
INAUTHENTIC      = (0, undefined, undefined)
INVALID_LENGTH l = (1, l,         undefined)
VALID p          = (2, undefined, p)
```
Not only do we have to remember the convention of which Integer maps to which possible result, but we also have to remember which parts of the tuple might be undefined. The more fields you add to the result type, the more possible undefineds you introduce.
Using a result type such as:
```
(Integer, [64], [n * 64])
```
Introduces the possibility of unrepresentable states. For instance, the type above permits a value like this:
```
(3, undefined, undefined)
```
But this is meaningless with the encoding above, since the Integer is intended to only ever be in the range [0, 2]. Another unrepresentable state is:
```
(0, zero, zero)
```
This is meaningless in the encoding above because if the Integer is 0, then the other parts of the tuple are meant to be undefined, but that expectation isn't upheld.

Nevertheless, we are forced to check for unrepresentable states in certain places when using this encoding. For example, how would we write a function like this?
```
use_KWAD_another_way : (Integer, [64], [n * 64]) -> [n][64]
```
We would have to check if the Integer is in the range [0, 2]:
```
use_KWAD_another_way (i, invalid_len, p) =
 if i == 0 => ...
  | i == 1 => ...
  | i == 2 => ...
```
But even after doing this, our function still does not cover all possible cases! Due to the possibility of unrepresentable states, it is possible for i to be something like 3. If we want to be absolutely sure that our function covers all possible cases, we must add a case like this:
```
use_KWAD_another_way (i, invalid_len, p) =
 if i == 0 => ...
  | i == 1 => ...
  | i == 2 => ...
  else        undefined
```
Now we've introduced yet another undefined in our program, and all of the potential pitfalls that that implies.

Sum types are great because they make it straightforward to capture the domain of your problem without introducing unrepresentable states. In a pinch, you can write your code without them, but not without introducing additional complications.

weaversa commented 9 months ago

@RyanGlScott Thanks for the discussion so far!

Right now I'm really liking the nominal sum type over the anonymous sum type. Thinking more about Rust and seeing the match notation with Some and None, I like it.

However, I'm not yet convinced that the addition of sum types will be beneficial. A DSL is great, in part, to the things left out of the language. Aside from the occasional failure condition, I'm having trouble thinking of cryptographic primitives that would benefit from sum types. More examples of crypto needing this feature would be helpful.

And...perhaps it's not Cryptol that needs sum types, but SAW. SAW is becoming more and more capable, and is now being used to verify programs that are outside the domain of cryptography. Is Cryptol's narrow scope of 'cryptographic primitives' becoming a liability on SAW's growth? Is it worth thinking about a DSL for systems program verification?

RyanGlScott commented 9 months ago

I'm having trouble thinking of cryptographic primitives that would benefit from sum types. More examples of crypto needing this feature would be helpful.

Most of the discussion above has been about challenge (1) in https://github.com/GaloisInc/cryptol/issues/1588#issue-1991385756. If that were the only challenge, then I'd agree with you that it wouldn't be worth adding sum types to Cryptol.

The thing that finally changed my mind was challenge (2), namely, the challenge of how to interface with Rust code in SAW. In Rust, it is idiomatic to write cryptographic primitives that can potentially fail using Result, a sum type with an Ok variant for successful results and an Err variant for failing results. As an example, this implementation of an AES-GCM encryption function in Rust returns a Result<Tag<TagSize>, Error>, where a successful computation returns an authentication tag (Tag<TagSize>) and a failing computation returns an error (Error) explaining what went wrong.

We are increasingly looking to use SAW to verify such Rust code. Part of this means that we will need to specify what the function returns using Cryptol values (e.g., mir_return {{ <cryptol-expr-here> }}). But what should go in place of <cryptol-expr-here> when it has type Result? If we didn't have sum types in Cryptol, then we'd have to put something there of type (Bit, Tag, Error), where the CryptolTag or CryptolError will be undefined depending on the value of the Bit.

But now we run into a problem: the Cryptol type (Bit, CryptolTag, CryptolError) is also used to represent the Rust type (bool, Tag<TagSize>, Error)! This means that if you wrote mir_return {{ x : (Bit, CryptolTag, CryptolError) }}, then SAW wouldn't have a clear way to determine whether you meant to encode x as a Rust tuple or encode it a sum type such as Result. We could potentially make SAW's type inference smarter and determine which one you meant by context, but this would likely be very confusing for SAW users.

This is ultimately what convinced me that life would be much easier if we had sum types in Cryptol. If we did that, then we wouldn't have to bother with these sorts of elaborate sum-type encoding tricks—we could just write the Cryptol specifications in a way that closely resemble the intentions of the original Rust code. This would greatly simplify both the internals of SAW as well as the user experience for users of the SAW Rust backend.

Is Cryptol's narrow scope of 'cryptographic primitives' becoming a liability on SAW's growth?

I don't see it that way. I largely agree that having a narrowly scoped DSL makes life better most of the time. And while it's true that Cryptol's scope is largely dictated by the needs of cryptography, that is enough to go a long way in practice.

I see the addition of sum types to Cryptol not as a sign that Cryptol's scope was too limited, but as a sign that the cryptographic community is increasingly embracing programming-languages techniques such as sum types. In this sense, I see sum types as a natural addition to Cryptol. Folks writing cryptographic code in Rust have adopted the convention of using sum types, and it's only natural for Cryptol to follow suit.

LeventErkok commented 9 months ago

Sum-types would be cool indeed. One aspect to consider is verification. SBV's SEither (https://hackage.haskell.org/package/sbv-10.2/docs/Data-SBV-Either.html) can be used to model sum-types. I'm sure What4 has something similar to offer.

Verification with sum-types is well supported by SMT solvers; especially since you don't plan to support recursive types. (If you allowed for recursion, most solvers would choke for most problems as they don't do induction.)

Nominal/Structural: Not much difference from verification perspective; though perhaps nominal would be a bit easier to translate back and from the symbolic world.

yav commented 7 months ago

The main use cases I am aware of are:

Writing explicitly partial functions (i.e., the Maybe type)
Writing SAW specifications for functions that use enumerations types (e.g., the Rust use case)
Writing specifications that model state transitions---sum types are quite nice for modelling the various states of a system.

We can kind of do this in the moment by representing sums with a product, and an index indicating which of the fields in the product is valid. This is quite clunky and a bit of a hack though.

Here is a summary of the design I have in mind:

Sum types have to be declared, and introduce a new named type, in a similar fashion to newyptes. For example:
```
enum Maybe a = Nothing | Just a
```
This is similar to Haskell, except that we use enum instead of data, and we do not support infix constructors, or labeled fields on sum types, and we allow constructors to be upper or lower case, like in current Cryptol.
To create a value of a sum type, we use the constructor name, but prefixed with a :. For example, some values are :Nothing, and :Just True. Constructors have to always be fully applied (see below on why), so one would have to use an explicit lambda to partially apply a constructor (for example \x -> :my2FieldConstructor True x).
To inspect a value of a sum type, we use case expressions, similar to Haskell. For example:
```
case e of
:Just x -> x
:Nothing -> True
```
Again, note that we use : to distinguish constructors from (like :Just) from bound variables (like x). We also add a new form of pattern :C p1 p2 for matching on sum types. This form of pattern may only appear in the branches of a case and may not appear in pattern bindings or function arguments, which would continue to only support irrefutable patterns.
The names of the enum types are handled by the module system just like for newtype. However, the names of constructors are not managed by the module system. Instead, they are treated more like the selectors of records, and we use the type system to associate constructor uses with their enum declarations. More precisely this works as follows:
- When we see a constructor use :C e1 e2 (or :C p1 p2), we generate a new form of constraint HasCon C t1 t2 tr. This constraint says that type tr has a constructor C with 2 fields, the first one being of type t1 and the second on of type t2 (this is a bit like the dual of the Has constraints we use for records).
- To solve a Has C ts t constraint we consult the enums that we have in scope (qualified or not):
  1. If t is of the form T xs, where T is an enumeration type we just resolve the constructor to belong to T (note that T need not be in scope!)
  2. Otherwise, we find all in-scope enums that have a constructor C whose fields could be instantiated to ts
  3. If there is only one such enum, then we can solve the constraint by unifying the fields and the type---we have resolved the constructor
  4. If there are multiple candidates, then we delay the solving of the constraint, in the hopes that further type inference will reduce the choices.
  5. Much like Has constraints, we never store HasCon constraints in schemas (i.e., these are invisible to the end user). If we end up with a constraint HasCons C ts t during generalization there are two possibilities: A) if t is to be generalized, then we report an ambiguity error, because there was not enough type information to resolve the constructor. Typically, a user would deal with this by writing a type annotation to specify which constructor they meant. B) If t is not to be generalized, the all type variables in ts should also not be generalized, and we delay the constraint to an outer scope. This is kind of like having a functional dependency t -> ts.

I am not 100% sure on what notation to use to distinguish constructors from normal functions, but I think it is a good idea to do so. Using : seems OK, but there might be a slight backward incompatibility in situations like f :T which now would mean apply f to constructor T, but previously would have meant f has type T. I am not sure how common it is, as usually the spacing in type annotations goes more like f: T or f : T both of which would still be OK, because the idea is to treat :T for constructors as single token.

RyanGlScott commented 7 months ago

Thanks, @yav! I'm broadly in support here, although I do have a couple of follow-up questions:

I am not 100% sure on what notation to use to distinguish constructors from normal functions, but I think it is a good idea to do so.

I'm not necessarily opposed to this idea, but I'm surprised that we'd need to prefix enum constructors with : but not newtype constructors. For instance, if you write newtype T = { unT : () }, then you can write T as though it were an ordinary function. Is there a reason to treat enum constructors differently in this regard?

Constructors have to always be fully applied (see below on why)

I'm not sure the rest of your comment ever explained this point—can you elaborate? There's also a discrepancy with newtypes here, since you could partially apply the T constructor from above.

yav commented 7 months ago

There are a two choices here:

The need to annotate constructors (i.e., the : or whatever other notation we need). This is particularly useful in patterns, so you can tell what's a constructor, and what's a pattern variable. Strictly speaking you don't need it (e..g, in ML they don't require it), but I find it leads to code that's hard to read, and not very robust, because adding a constructor to an enum may suddenly change a completely unrelated part of the program. In Haskell this distinction happens because constructors are capitalized and variables are not, but in Cryptol we don't have such a distinction, and I've seen people sometimes use capitalized names for variables... This has not been a problem with newtypes because we can't pattern match on them---instead we use a selector to access the fields.
Requiring constructors to be fully applied is only needed if we are going to disambiguate them based on the types, the way I describe in bullet 4 above. The alternative would be to manage constructor names with the module system, which is also a reasonable choice. I guess maybe that would be simpler? If we do that, constructors in expressions won't need any special handling, but I think we'd still need some special notation for constructor patterns.

So it might be a bit odd if use of constructors in patterns and expressions is a bit different, but I perhaps it makes sense.

Come to think of it, another benefit of just using the module system would be that we could allow public/private keywords in the enum to control which constructors are exported, which I guess could be useful on occasion.

yav commented 7 months ago

Actually, it occurs to me that we could adapt the Haskell capitalization convention also, like this:

We require that constructors in enum all start with a captial
When processing case patterns, identifiers starting with a capital are a constructor, while lower case ones are variables bindings
In other patterns (i.e., the irrefutable patterns in lambdas and pattern bindings) we keep the current status quo, where all names are variable bindings.

I think then we don't need any extra punctuation, and things should look fairly natural.

qsctr commented 7 months ago

IMO having case patterns and other patterns be treated differently could be confusing, and having a case restriction on enum constructor identifiers but nowhere else in the language seems inconsistent. Personally I would rather have the extra punctuation to make things more explicit.

An alternative to the : prefix that doesn't have a potential backwards incompatibility with type annotations is `, like polymorphic variants in OCaml, though personally I think : looks nicer :)

RyanGlScott commented 7 months ago

Another alternative, which I should mention for completeness's sake, is doing away with case patterns altogether and instead emitting eliminator functions alongside each enum declaration. For instance, writing this:

enum Maybe a = Nothing | Just a

Would also generate the following function alongside it (naming conventions up for grabs):

elim_Maybe : {a, r} r -> // The `Nothing` case
             (a -> r) -> // The `Just` case
             r           // The result

For instance, this case expression:

case e of
  :Just x -> x
  :Nothing -> True

Could be rewritten using elim_Maybe like so:

elim_Maybe True (\x -> x) e

This would dodge the question of needing additional syntax for case patterns entirely.

weaversa commented 7 months ago

I think the : notation will be disruptive to current Cryptol corpuses and the developer experience.

RyanGlScott commented 6 months ago

1602 implements non-recursive, nominal sum types (without the `:` prefix). As such, I think this issue has served its purpose—let's close it.

GaloisInc / cryptol

Sum types #1588

1602 implements non-recursive, nominal sum types (without the : prefix). As such, I think this issue has served its purpose—let's close it.

1602 implements non-recursive, nominal sum types (without the `:` prefix). As such, I think this issue has served its purpose—let's close it.