User-defined scope syntax change proposal

asoffer commented 2 years ago

Problems to solve

There are a few things I want to improve about the syntax for user-defined scopes:

Scope blocks cannot be anonymous. This means that we need to explicitly name a then block for if statements. This is a significant departure from other curly-brace languages and a rather cumbersome annoyance.
There are a lot of keywords reserved in the current incarnation.
To support conditional jumps we have two forms of goto, basically just so we can implement if outside the language. This doesn't seem to pull it's weight.

Proposal

There are a bunch of changes here. I'll describe each briefly and then give an example what a some scope definitions would look like.

A context type

We'll add a primitive type (tentatively) named scope_context. Any time a new user-defined scope is instantiated, a compile-time instance of this type is created and passed to the scope literal. This is the mechanism by which a user-defined scope can construct itself with knowledge about what blocks are present (e.g. does the if-statement have an else block?).

Accessing block names

Block names can be accessed in the context as if they were fields on a struct (i.e, context.block_name). There are also several special block names on the context: start means the entry to the scope. exit means leaving the scope. This means that blocks on user defined scopes cannot have these names (though they're only reserved in this much more minimal context). We will also provide a special access, tentatively context._ to mean "the first block", allowing that block to be anonymous.

One other nicety is this provides room for us to create syntax for accessing blocks other than by-name. This is necessary if we want user-defined scopes to create pipelines, where we may use a block with the same name multiple times.

Replacing `before` and `after` with operators

Rather than before and after keywords, we'll use >> and << respectively. The idea here is that when a user yields data from a block, they spell that with <<, which passes the data to the "after" handler, now spelled identically. There is also today a "quick exit" which allows users to do cleanup work if they exit a scope and jump over it (e.g., by returning from inside nested scopes). We'll replace this with destroy, which is the keyword we use for struct destructors. This makes sense, because it is actually destroying the state associated with a scope.

A short-jump literal syntax

Just like functions have a short syntax using =>, jumps can have the same. We propose the nearly identical syntax: jump() => <expr>. There's a nice symmetry here. Just like function literals, you can omit the return when you use this shorthand, with jumps you can omit the goto keyword. In each case, we are omitting the keyword that indicates how you leave the current context.

Remove conditional-goto.

Lastly, rather than having conditional gotos, we propose one of two options:

Bake if into the language. It's a clearer primitive with better syntax than we could create for conditional gotos.
Allow the blocks jumped to in a goto to be more complex expressions. This would allow us to use a boolean to index into an array. We'll show this example below. One caveat with it is that, while simple and powerful, it creates some difficult-to-answer questions about the type of the array elements and the lifetimes of the expressions referenced in them.

What it looks like with this proposal:

Contrast this with what you see for if and while here

if ::= scope {
  (>>) ::= jump(ctx :: scope_context, b: bool) {
    goto [ctx.else(), ctx._()][b as i8]
  }

  -- ::= block {
    (<<) ::= jump(ctx :: scope_context) => ctx.exit()
  }

  else ::= block {
    (<<) ::= jump(ctx :: scope_context) => ctx.exit()
  }
}

while ::= scope {
  (>>) ::= jump(ctx :: scope_context, b: bool) {
    if (b) then { goto ctx._() } else { goto ctx.exit() }
  }

  -- ::= block {
    (<<) ::= jump(ctx :: scope_context) => ctx.start()
  }
}

perimosocordiae commented 2 years ago

+1 on the context type and block name access via the context.

-1 on the <</>> operators; I think on_entry/on_exit are more clear, and the names need only be reserved inside the scope definition block.

+1 for the short jump syntax. Also having jump take an explicit ctx parameter is nice, it improves readability IMO.

+0 for removing the conditional goto. I agree that it needs a reconsider, but I don't know that I'm sold on either of your options.

wrhall commented 2 years ago

Mostly feel good about this

Disagree with CJ about <<,>>

I can't help but wonder how common it is to jump somewhere based on a condition, and if so is there an interesting way to change that??

while ::= scope {
  // Original
  (>>) ::= jump(ctx :: scope_context, b: bool) {
    if (b) then { goto ctx._() } else { goto ctx.exit() }
  }

  // Or
  (>>) ::= jump(ctx :: scope_context, b: bool) {
    goto ctx._(b)
  }

  // Rust?
  (>>) ::= jump(ctx :: scope_context, b: bool) {
    match (b) {
      true => goto ctx._(),
      _ => return
    }
  }

  -- ::= block {
    (<<) ::= jump(ctx :: scope_context) => ctx.start()
  }
}

I don't like any of these, but that includes original

wrhall commented 2 years ago

Can ctx have its type inferred?

asoffer commented 2 years ago

Can ctx have its type inferred?

Ooh, I don't know what style guidance would come out of this, but yes. Either of these are currently considered valid.

(>>) ::= jump(ctx :: ~`T, b: bool) { ... }
(>>) ::= jump(ctx :: $ctx, b: bool) { ... }

perimosocordiae commented 2 years ago

Allow me to elaborate my -1 on anything magical regarding scope definition. I don't expect that scopes will make up the majority of Icarus user lines of code, so it's not worth being overly terse. I'm fact, they're likely to be the most unusual language feature for readers coming from other languages, so it's important that their syntax is readable (and searchable).

asoffer commented 2 years ago

Searchable doesn't have to mean grep-able. I really hope it doesn't, but perhaps I'm being optimistic.

I might actually argue that (<<) is less magical than the keyword, because it's called in response to << being present in user code.

We could also potentially change the goto syntax to use >> explicitly.

  // For loop's primary block
  (>>) ::= jump[state: *State](ctx :: $ctx) {
    if (state._current == state._end) {
      ctx.exit >>
    } else {
      state._current += 1
      ctx._ >> state._current
    }
  }

It makes sense to use >>, but it is strange that it points away from the block. Orienting them the other way also feels strange.

asoffer commented 2 years ago

@wrhall, I almost feel like match is the sort of thing I want to be definable in the language. I'm open to baking it in, but I think it has to work neatly with user-defined types as well as primitives. If you have a proposal for that I will entertain it very seriously.

asoffer commented 2 years ago

Here is an entirely different proposal, much of the inspiration for which comes from Ruby.

The benefits of this approach:

We can eliminate the need for goto, jump, and block as keywords.
We avoid the need to answer questions about >>/<< versus on_entry/on_exit.
There is no need to specially define a scope or a block's state.
Significant reduction in the complexity and amount of code needed to define a scope.

Proposal

Define a new scope with the following syntax:

scope [<context-id>] (<parameters>) {
  <body>
}

The <context-id> is an identifier that will be used for the compile-time object of type scope_context as in the previous proposal. There is no need to specify the type because it is required to be scope_context. This object holds information about which blocks are provided in the scope instantiation.

As an example, a for-loop could be implemented as:

for ::= scope [context] (begin: i64, end: i64) {
  i := begin
  while (i != end) {
    context.do << i
    i += 1
  }
}

Note how this combines all of the code pertaining to jumps, initialization, and before/after block handling into a single place. Comparing these 7 lines with the existing for loop definition (22 lines).

This approach has two known problems:

There is no way to re-evaluate the scopes arguments as is done with a while loop's condition. Whereas previously we provided explicit syntax for jumping to the beginning of a scope and re-evaluating arguments, we do not have such a syntax proposed here yet. Such syntax is subtle because we do not have an explicit place where scope state is defined/initialized before evaluation of arguments. For example, a scope that repeatedly reevaluated it's arguments and counted how many times it evaluated the value before exiting would not be possible.
Scopes can be left from inside a block, In the for-loop example above, note that the line context.do << i which evaluates the user-provided block may return from the enclosing function, thus never completing evaluation of the scope. Handing this requires running destructors for every variable in all scopes towards the return statement. The semantics here are clear, but it may not be evident from the scope definition itself that this could happen. In essence, this is the same problem as with exceptions, where we can no longer rely on straight-line code executing in straight-line order. It should not be surprising that this problem comes up; scopes, coroutines, and exceptions are all interrelated ideas.

wrhall commented 2 years ago

For the first issue can you pass a function that gets evaluated? Most scopes don't need this, but instead of passing in the result of Len(arr) you would pass in a lambda to evaluate it (or just the array directly)

wrhall commented 2 years ago

Can you give an example of client code? Eg that uses the scope? I'm not sure I'm following the second issue

asoffer commented 2 years ago

For the first issue, yes, but I wouldn't want to demand you do that for a while loop. We can (and probably will) bake a while-loop into the language, but asking all other scopes to have users pass in a lambda is not ideal.

User code could be:

for (0, 10) do [n: i64] {
  if (n == 5) then { return }
}

For the author of the scope, they need to remember that context.do << i may never return to that point and may never execute any code in the scope afterwards (kind of like an exception is thrown.

Which leads to problems such as below.

with ::= scope [context] (filename: []char) {
  handle := fopen(filename)
  if (errno == SUCCESS then {
    context.open << handle
    fclose(handle)  // This handle could be leaked because a return or break from the user's `open` block would bypass this line.
  } else {
    context.error << errno
  }
}

The correct solution would involve something like defer, which is maybe a best practice anyway for functions and scopes? I think this is reasonable and worthwhile still.

Another concern would be if context.open << handle was hidden inside a function call. I think this has to be outright banned; we only allow that syntax directly in the body of a scope. Otherwise we have really recreated exceptions.

asoffer commented 2 years ago

Another question to be answered: How do we determine the list of possible block? The proposal above would simply ignore any block whose name is never referenced. There are situations where that might be intentional: We could define a comment block that simply skips over the code. However we still want to catch cases such as if (condition) then {} eles {} where a typo causes a block to be ignored.

wrhall commented 2 years ago

So I'm not saying all scopes need a lambda, just if you want the result to change

Instead you could pass a ptr? Kind of like mut captures in cpp

How does all of this syntax compare to co routines in cpp? I know you've noted that these things are interrelated, but the syntax is starting to feel familiar?

I don't hate making scope authors use defers etc because it's fine to require library authors be a bit sophisticated

Wrt comment block: It feels like a bad default to ignore when block lookup fails. Why would you not actually force a user to define a comment block? Like in that case, the only allowed blocks are the ones defined/imported... A block that is used but not defined is an error, and unused blocks are lint errors

asoffer commented 2 years ago

o I'm not saying all scopes need a lambda, just if you want the result to change

If while is a library, I wouldn't want users to have to wrap their condition in a lambda. Maybe while could be built-in, but this seems like a useful feature for scope authors as well.

Instead you could pass a ptr? Kind of like mut captures in cpp

This allows the value to change, but doesn't recompute the value on each iteration. Side effects of the computation will only be executed once.

How does all of this syntax compare to co routines in cpp? I know you've noted that these things are interrelated, but the syntax is starting to feel familiar?

At the usage site, it's quite similar, and has been for a while. C++ coroutines don't have a clear analogue to defining a scope. It's honestly closer to something like std::foreach or std::transform_reduce in which callables are passed to a function template. In our case, those callables are passed as part of a scope-context object at compile-time and are given names (the names of the blocks).

I don't hate making scope authors use defers etc because it's fine to require library authors be a bit sophisticated

I don't hate it either, but I want to aim for everyday programmers writing scopes when it makes sense. My hope is that scopes become just as commonplace as types. In C++ we often suggest creating one-off types, and I think the same could make sense for scopes; The idea is that you would define a one-off scope that explains "here's the high-level control flow of what I'm doing," separating that from the specific data processing, in much the same way that you might define a type to separate the high-level invariants you're relying upon from the specifics of the data.

Wrt comment block: It feels like a bad default to ignore when block lookup fails. Why would you not actually force a user to define a comment block? Like in that case, the only allowed blocks are the ones defined/imported... A block that is used but not defined is an error, and unused blocks are lint errors

Very much agreed, but the question is how you would determine that a block is not just unused but will never be used. For example, currently there is nothing to stop someone from writing

if (condition) then {
  // do stuff
} else {
  // do something else
} and_always {
  // do this no matter what
}

But the if-scope doesn't define and_always, so this block would be silently ignored. We would want the definition of if to validate that:

Only then and else are present
then must be present (else is optional)
then precedes else if both exist.

A potential option here is to just let users define their own validation function in the body of a scope which does nothing but could static-assert if the scope context doesn't match what's expected. But this might be easy to forget to write.

wrhall commented 2 years ago

Sorry to keep asking for examples, but I still don't feel like I have a great intuition. How do I define cascading scopes? if, else if, and else for example (I'm having a lot of difficulty thinking about how to define if!! You can use logic short-circuiting, but I guess that depends on what already exists in the language. What tools can I use to define an if statement?)

Or similarly, elixir has cond do and case _ do. How would I define one of those?

I want to say cond do would take an array of conditions? or have a bunch of child scopes? But I'm not sure how the definer of the scope accesses those

asoffer commented 2 years ago

I'm not sure if/else would be definable in the language anymore. Fundamentally you'd need a branching mechanism. In prior incarnations, it made sense to have one, but with this approach you only need one if you want to define if, so there doesn't seem to be much value in having it be a library feature.

else if is not part of a scope. It's an if-statement as a single statement in an else-block of a prior if-statement.

Pattern matching as part of case _ do requires language support as well. We would need runtime support for patterns to be first-class. Currently we have them only at compile-time and they're not first-class.

wrhall commented 2 years ago

Sure, it makes sense that you could only define the scope if / else if you already had a ternary operator

~Given that, I'm still not clear how you would define a chained set of scopes... if/else, for/else or other~

In addition, if I wanted to define a filter that can be combined with map or other higher order functions, how does that look?

filter ::= scope [context] (arr: []int) { // would define for any enumerable on generic type
  // Scope is of type f: int -> bool ?
  each arr do [elt] {
    res := context.do << elt
    if (res) {
      << elt
    }
  }
}

If I do that, can I now combine map and filter via

map(arr) do [elt] {
  << elt*elt
} filter {
  // ...
}

wrhall commented 2 years ago

~I'll edit for correctness in a bit ...~

~Ok, the 3 lines that seem obviously wrong~:

(edit: I've updated the above examples to use the appropriate << syntax)

The each scope -- defined below, might be close to correct now
~elt >> yield what we actually want here is an output var (an array) where we're appending to that array.~
~yield elt*elt I actually don't mind this line, but it's not clear to me that this syntax is supported today.~

wrhall commented 2 years ago

Ohhh, wait, I think I understand. I'm going to write some more examples in this thread, which will feel off-topic but will help me personally.

wrhall commented 2 years ago

A sample forEach scope

eachCtx ::= Interface : Context {
  do : Scope.Block
}

each ::= scope [context: eachCtx] (arr: []i64) {
  for (0, arr.length()) do [n: i64] {
    context.do << n
  }
}

// Usage
f : (i64 -> void)

each arr do [n: i64] {
  f(n)
}

wrhall commented 2 years ago

I suspect in practice you would have some common Contexts, e.g. one with a do block pre-defined that most simple scopes would just use.

wrhall commented 2 years ago

Ok, so I've attempted to resolve your question above, about "how do you allow a comment block but prevent typos?" You could define an interface that inherits from Context. On this interface, you would essentially enumerate all allowed blocks. I've made up some syntax surrounding this and generally don't love it, but I think the idea is reasonable.

I also now have a fairly good understanding of what multiple blocks looks like, and I assume that the do block will just be invoked by the scope as many times as necessary -- in particular, you simply define the interface to have multiple block fields & then pass control+input to them (if they take input?). I'm not completely clear on the mechanism for a block receiving input (and I don't know if this has been solved generally, given your comment above about if else if being nested if in the else block)

asoffer commented 2 years ago

I really like the idea of having the type of the context describe what's allowed. In retrospect that seems like an obvious choice (that's what types are).

Blocks receive data from the scope via my_block << my_data. The data is passed back to the scope with << in the block body.

wrhall commented 2 years ago

Cool, that makes sense. How do you get data back out of the block? e.g.

result << my_block << my_data

wrhall commented 2 years ago

A couple more somewhat silly examples:

// Not sure how I feel about type-name capitalization. Not great. Fine, probably.
IfContext ::= Interface(Context) {
  do ::= Scope.Block
  else ::= Scope.Block
}

// To define an `if` scope, all you need is a ternary in the language. Maybe? Does this make any sense?
if ::= scope [context: IfContext] (cond: bool) {
  // the fun thing about a do block or an else block is that it doesn't actually get any input. This is Rust-style void blocks.
  cond ? context.do << () : context.else << ()
}

forElseContext ::= IterContext + ElseContext {
  // IterContext comes with a do defined
  // ElseContext comes with an else defined. This could be a one-liner but for the comments.
}

forE ::= scope [context: forElseContext] (arr :: []T) {
  each arr do [elt :: T] {
    context.do << elt
  }
  context.else << ()
}

arr ::= %{1, 2, 3}
forE arr do [elt : i64] {
  if (elt == 2) {
    // found it!
    do_something(elt)
    break  // this means else doesn't execute
  }
} else {
  did_not_find_two()  // this will never execute, given the above code
}

Thoughts? The concept of a for/else loop is in Python, and the example came from there: https://book.pythontips.com/en/latest/for_-_else.html

wrhall commented 2 years ago

ok, I still haven't reasoned too deeply about the short jump syntax or when/how you would use it. My basic assumption here is that this is for dealing with deeply-nested scopes ... how do you get where you're going when you're in a deep nest?

That said, it's actually not clear to me how one would use that. Questions:

How do you name a location as a client?
As a scope designer, how do you opt into or out of certain keywords, e.g. "break" or "continue"? E.g. if should raise "breaks" and "continues", but scopes that use IterContext probably want to catch those & break. I assume this could be set on the Context, and then you'd get a bit of stuff for free (e.g. people using IterContext to define their loops would get all of the best practices modulo backwards-compatibility requirements 😄 )
What is the precise syntax for the before operator? What are examples when you would want to use it?

asoffer commented 2 years ago

I'll share what we had previously, but I'm very open to changes/extensions. Previously, there was one way to send data back to the scope handler, which was << my_data. However that only sends data out one level. If you want to send data out of a deeply nested scope, you need to label the scope you're sending data to and use that label as the left-operand of the <<. For example,

// Apply one step of the Collatz iterative process to each element in the
// array, storing the result in transformed_array
transformed_array := #.a map (my_array) each [element: i64] {
  if (element % 2 == 0) {
    #.a << element / 2
  } else {
    #.a << 3 * element + 1
  } 
}

The label approach allows you to send data to any scope containing you. We could even implement return as a special label #.return.

One reason we don't have break and continue is that they're hard to opt into according to the standard semantics. Each of them jumps passed as many if statements as is necessary to get to a loop, which makes them integrally tied to the builtin scopes. Specifying things this way is now quadratic (in the number of scopes participating).

However, this doesn't address returning to the current scope but possibly with a different direction (e.g., passing over the rest of the body a la continue). For that I could imagine using << with an enum. Something like << for.CONTINUE.

All that said, let me address your questions directly:

How do you name a location as a client?

With labels, syntactically they are (currently) a #. followed by an identifier and satisfy the normal scoping rules. You cannot jump out of a function.

how do you opt into or out of certain keywords, e.g. "break" or "continue"?

I think you use << with a label.

What is the precise syntax for the before operator? What are examples when you would want to use it?

I think we don't need it anymore as it used to be specified. I'm actually unclear if it was ever necessary. The idea was that once you decide what data gets passed into a scope you may want to further transform it. For example, you might have an unwrap scope which checks that the optional value is present, and passes the value itself, rather than the wrapper. The "before" operator was responsible for the unwrapping. But with the new proposal, you can simply do the unwrapping in the scope itself.

asoffer commented 2 years ago

The interface context needs to do a bit more to be fully general:

Scopes should be allowed to have their first block anonymous, so we can say if (cond) { ... } rather than if (cond) then { ... }.
Scopes sometimes have ordering constraints on the elements, but not always.
Scopes might have type constraints on the data that gets passed into a block. For example, the then-block in an if statement takes no arguments, but the do block on a for-loop takes the iterated over element as an argument.
Scopes can repeat blocks sometimes. In a sequence of map/filter/reduces, we may want to repeat stages. We need to be able to reuse the names.

One approach is to just have a type for the scope context that holds an array of blocks as specified and write a validation function for them. This allows scope-authors to define them programmatically, which is nice, but it doesn't give us a way to map that onto the definition of the scope itself (to verify that the definition is valid for any scope-context passing the validation function). This is the same problem C++ concepts have where the concept can validate instantiations, but not the template it's guarding without instantiations. If we want to solve this problem, we need a tiny DSL, much like what you've described above with the Interface. It just needs to be rich enough to specify the things I mentioned above.

asoffer commented 2 years ago

Closing this since the primary ideas have been implemented. We have not implemented any mechanism by which to specify the valid block names on a scope context, but we'll go with something like Will's suggestion, and fleshing out the details should be done in it's own issue

asoffer / Icarus