anoma / juvix

A language for intent-centric and declarative decentralised applications
https://docs.juvix.org
GNU General Public License v3.0

Better syntax for `trace` and the `seq` operator #1913

Closed: lukaszcz closed this issue 1 year ago

lukaszcz commented 1 year ago

In Haskell

trace "X" $ expr

behaves as expected, first printing "X" before evaluating expr. In an eager language, expr is evaluated first, which is typically not what one wants with trace, and it differs from the behaviour of trace "X" (expr). Plus, it's very inconvenient to put expr in parens every time.
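For comparison, here is a small Haskell sketch (names are illustrative only) contrasting the lazy ($) with the strict ($!), which mimics the evaluation order an eager language would give:

import Debug.Trace (trace)

-- With lazy ($), "outer" prints before "inner": trace returns its second
-- argument as an unevaluated thunk, which is only forced later.
lazyOrder :: Int
lazyOrder = trace "outer" $ trace "inner" (1 + 1)

-- With strict ($!), the argument is evaluated first, mimicking an eager
-- language: "inner" prints before "outer".
eagerOrder :: Int
eagerOrder = trace "outer" $! trace "inner" (1 + 1)

main :: IO ()
main = print lazyOrder >> print eagerOrder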

A solution is to add a lazy version of $ as a builtin (e.g. called $~ or $& or $$):

trace "X" $~ expr
lukaszcz commented 1 year ago

Or maybe $ should be lazy? Then f $ expr is just syntactic sugar for f (expr) and $ can only be used with two arguments (like trace etc)?

Then $ would preserve the laziness or eagerness of its left-hand side.

janmasrovira commented 1 year ago

If we make $ lazy, does that mean it always has to be fully applied? If so, I think we should have the lazy variant as a separate function.

lukaszcz commented 1 year ago

Probably we should disallow non-full application, because it would change the semantics.

Or we could have an eager variant $! as a separate function.

I think having an eager $ is actually confusing, because you expect f b to always be equivalent to f $ b, and with an eager $ it isn't when f is lazy.

lukaszcz commented 1 year ago

Actually, trace X $ Y is not even valid, because trace needs to be applied with two arguments.

Perhaps the real fix is to have some special syntax for trace, e.g., trace X in Y, so that the parens around Y can be omitted. You often want to put trace in front of some large block of code, and adding and removing parentheses all the time is frustrating.

mariari commented 1 year ago

I'm not really sure of your semantics for trace (trace is normally used for tracing functions (trace function-name)).

But it seems you may want something like

trace print body =
  unsafe.perform print
  body ()

trace f $ \ () -> body

Since I'm assuming you want a Haskell-style do without forcing a do in your code.

This should just work, as the body is now a lambda that gets applied to ().
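A minimal Haskell sketch of this thunking pattern, with unsafePerformIO standing in for the pseudocode's unsafe.perform (names here are illustrative, not an existing Juvix API):

import System.IO.Unsafe (unsafePerformIO)

-- Print the message first, then run the delayed body by applying it to ().
-- Wrapping the body in a unit lambda keeps it from being evaluated before
-- the print, even under eager evaluation.
traceThunk :: String -> (() -> a) -> a
traceThunk msg body = unsafePerformIO (putStrLn msg) `seq` body ()

example :: Int
example = traceThunk "calling f" (\() -> 2 + 3)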

Further, this should also work:

function =
  let _ = trace "hello this is f" in
  f 5

where you just bind _ and then ignore it. let ... in is a let binding, but it's the only mechanism in your language besides do that can do this. Normally, if we wish to waste computation, we use a pattern like this. Alternatively, do could be reworked to not enforce IO: it has a general interface, and if nothing is in IO, then it simply doesn't put the code in IO.

$ should likely be strict, as application is strict. If you want laziness, then add an explicit lambda to delay evaluation.

lukaszcz commented 1 year ago

Yes, that's how we want to use trace (like in Haskell). But your proposal doesn't solve the problem, because our syntax for lambdas requires curly braces around them (\{x := body}).

It's just about syntactic convenience (as $ also largely is) -- you don't want to put the braces/parentheses every time you use trace.

Perhaps the solution with

let _ := trace X in ...

is most reasonable. It doesn't require new syntax, just making trace one-argument and returning unit or something. But then in-place trace is slightly less convenient: f (let _ := trace X in arg). I think we didn't consider this too much, just copied trace syntax from Haskell and made it lazy in the second argument.

lukaszcz commented 1 year ago

Well, let _ := trace X in Y is a bit wordy, so maybe a shorter special syntax trace X in Y actually makes sense.

mariari commented 1 year ago

I think adding special syntax for just trace is rather overkill. There seems to be a decent enough way to express it already; I'd just accept the 6-character let _ :=.

Esp since trace in Haskell is mostly a hack to get around not lifting the code into IO but still trying to do some kind of effect.

So I'd wager code trying to use trace ought to be taxed a bit, and with let _ := trace ... in it's fairly easy to add or remove it, as it's just deleting the line where the trace is.

I'd also like to note that adding sugar for every minor inconvenience is not a very scalable solution, as most of the language becomes special rules one must remember due to the compiler developers' use case.

lukaszcz commented 1 year ago

I'd also like to note that adding sugar for every minor inconvenience is not a very scalable solution, as most of the language becomes special rules one must remember due to the compiler developers' use case.

Yes, I generally agree with this. Though trace is our main debugging tool right now, so some syntax sugar to make it convenient could be considered. Ultimately, we don't want any occurrences of trace left in the final programs. That's why we have a separate import for it -- by removing this import it can be assured that no trace is left.

lukaszcz commented 1 year ago

For us trace is a special builtin anyway, and it's strictly for debugging purposes.

mariari commented 1 year ago

Yes, I generally agree with this. Though trace is our main debugging tool right now, so some syntax sugar to make it convenient could be considered. Ultimately, we don't want any occurrences of trace left in the final programs. That's why we have a separate import for it -- by removing this import it can be assured that no trace is left.

Well if it's just for debugging purposes, then you should not add sugar for it. You are effectively taking away the word trace from any user, as trace is a special keyword. Esp for something that the compiler developer has tried to eliminate from code. So a word that doesn't show up anywhere in the final source code is now conflicting with a word the user may want to use for their application.

For us trace is a special builtin anyway, and it's strictly for debugging purposes.

I guess nothing like unsafe.perform-io exists? Special builtins shouldn't look unique. If an SML compiler makes map special for optimization purposes, the user should not know. What is builtin and what isn't shouldn't be obvious to tell, esp when the special form could just be a function from the user's perspective.

lukaszcz commented 1 year ago

Well if it's just for debugging purposes, then you should not add sugar for it. You are effectively taking away the word trace from any user, as trace is a special keyword. Esp for something that the compiler developer has tried to eliminate from code. So a word that doesn't show up anywhere in the final source code is now conflicting with a word the user may want to use for their application.

We're not taking it away right now, because it's specified when declaring the builtin how it should be named. The in is reserved already.

For us trace is a special builtin anyway, and it's strictly for debugging purposes.

I guess nothing like unsafe.perform-io exists?

It's called trace and can only print.

Special builtins shouldn't look unique. If an SML compiler makes map special for optimization purposes, the user should not know. What is builtin and what isn't shouldn't be obvious to tell, esp when the special form could just be a function from the user's perspective.

trace is not a function that can be implemented by the user -- that's why it's special.

This is a somewhat subjective discussion, and in these matters I'm usually inclined to go with convenience. Juvix is generally quite uniform, e.g., we don't have a special "if ... then ... else" syntax; if is just a lazy builtin -- it can't really be implemented by the user, but it looks like any other function. The same goes for trace, but it's a bit less convenient to use right now. Making it one-argument and using it with let _ := trace X in Y would make it more convenient. Having trace X in Y would be most convenient to type, at the cost of some uniformity.
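To illustrate the point about if (a Haskell sketch, for comparison only): in a lazy setting an ordinary function suffices, whereas under eager evaluation both branches would be evaluated before the call, which is why it has to be a lazy builtin.

-- An ordinary function behaves like if-then-else only under lazy
-- evaluation; with call-by-value both branches are evaluated up front.
ite :: Bool -> a -> a -> a
ite True  t _ = t
ite False _ e = e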

mariari commented 1 year ago

We're not taking it away right now, because it's specified when declaring the builtin how it should be named. The in is reserved already.

How does parsing of the language work? Because if the word is rebindable, then you would need to parse the declaration and add it to the language before evaluating any line


declare %primitive.trace my-trace

f =
  my-trace "hi" in
  50

Because when syntax like this is declared, one then has to worry about load order. I don't think this version of Juvix has a read time or a reader hash-table to effectively handle this. Thus, to properly support it, you'd have to make the declaration of %primitive.trace special and update the reader table before parsing can continue.

Juvix is generally quite uniform,

I disagree with this; a lot of the syntax is rather ad hoc and inconsistent with how other parts of the language come together.

For a bigger example, compare how normal definition syntax works with the agreed-upon named-lambda syntax:

f := go(acc1 := [];acc2 := 0)@\{
     []        := (acc1, acc2)
   | (x :: xs) := go (x :: acc1) (acc2 + 1) xs
  };

In this, the way go is applied to its arguments looks like an un-curried version, go (arg1, arg2), whereas other declarations in the language match ordinary application. I think some of the tension here is that what is really wanted is optional arguments, but instead a special form exists for lambdas only, meaning it sticks out from the rest of the syntax, and the argument slot for the lambda is moved away from where normal arguments go, which only adds tension. Adding optional syntax in the argument position would solve this issue.

foo x y = append x y

I believe the uniform syntax you are talking about consists of items such as list syntax, tuple syntax, and record syntax. However, since this version of Juvix is an ML with ALGOL syntax, trying to fit into what others consider "practical", all these syntactical elements will eventually be added to the language, so I don't view the current lack of them as a design philosophy of uniformity.

mariari commented 1 year ago

I could maybe see the syntax working if in were an infix symbol with a certain precedence level, as then in would have the semantic meaning of ; in ALGOL languages. Though this would change the let form, because

let foo = bar in let faz = tak in tak + bar

would be parsed into

(in (let foo = bar)
    (let faz = tak)
    (tak + bar))

let in this case would be an introduction into the current context for the new binding.

The major downside of this is that it would take some care to create a do syntax (or whatever alternative is chosen), since if you are not careful it would interact poorly with in.

Another sore point is that in is not an infix word, and thus it breaks the general idea of what an infix symbol is.

lukaszcz commented 1 year ago

We're not taking it away right now, because it's specified when declaring the builtin how it should be named. The in is reserved already.

How does parsing of the language work? Because if the word is rebindable, then you would need to parse the declaration and add it to the language before evaluating any line

Yes, implementing trace X in Y without making trace reserved might be a challenge, and hence not worth it.

Juvix is generally quite uniform,

I disagree with this; a lot of the syntax is rather ad hoc and inconsistent with how other parts of the language come together.

For a bigger example, compare how normal definition syntax works with the agreed-upon named-lambda syntax:

f := go(acc1 := [];acc2 := 0)@\{
     []        := (acc1, acc2)
   | (x :: xs) := go (x :: acc1) (acc2 + 1) xs
  };

In this, the way go is applied to its arguments looks like an un-curried version, go (arg1, arg2), whereas other declarations in the language match ordinary application. I think some of the tension here is that what is really wanted is optional arguments, but instead a special form exists for lambdas only, meaning it sticks out from the rest of the syntax, and the argument slot for the lambda is moved away from where normal arguments go, which only adds tension. Adding optional syntax in the argument position would solve this issue.

Yes, it's true that the extra arguments look a bit as if they were uncurried. Maybe

go(acc1 := [])(acc2 := 0)@\{...}

would be better? I don't know.

But this doesn't seem to have anything to do with optional arguments.

We also have this small inconsistency between the use of ; and | to separate pattern clauses (; in top-level functions, | in lambdas and cases).

foo x y = append x y

I believe the uniform syntax you are talking about consists of items such as list syntax, tuple syntax, and record syntax. However, since this version of Juvix is an ML with ALGOL syntax, trying to fit into what others consider "practical", all these syntactical elements will eventually be added to the language, so I don't view the current lack of them as a design philosophy of uniformity.

I believe that I'm talking about relatively limited syntactic sugar, e.g., the lack of an if statement, or list comprehensions, or special syntax for debugging assertions (like in OCaml). Relatively limited in comparison to Java or Haskell, though not to Lisp.

mariari commented 1 year ago

But this doesn't seem to have anything to do with optional arguments.

Because if lambdas had optional arguments, then one more aspect of the syntactic uniqueness of recursive-lambda would go away:

f (x := []) (y := 0) z a = z + a :: x
f := go @\{
  (acc1 := []) (acc2 := 0) xs := (acc1 : xs, acc2)
}

Now the recursive-lambda form is just an argument before @\{, which is still special, but the optional aspect is kept as a more general idea, thus making the form less special.

I believe that I'm talking about relatively limited syntactic sugar, e.g., the lack of an if statement, or list comprehensions, or special syntax for debugging assertions (like in OCaml). Relatively limited in comparison to Java or Haskell, though not to Lisp.

Are you talking about #trace in the OCaml toplevel? The OCaml toplevel is a mess and is essentially a second language.

However, when I say the syntax is not too uniform, I am comparing it to other MLs and even some ALGOL languages. I agree that it is currently less special-cased than the mature MLs (OCaml, SML, etc.), but I'd put it on par with or heavier than the less mature MLs (Elm). This is, I believe, due to two points:

  1. MLs have a fleshed-out module system or a more fleshed-out typeclass system.
  2. Those MLs have had real-world use, so some extensions have been added to make them more malleable.

For 1, a lot of syntactic overhead comes in, in that ML syntax is very brittle and non-extensible, so an entire second language is needed for modules. Typeclasses run into similar issues. At first they are added rather transparently, but I believe the strain of how instances are specified leads to the need for sugar to make instance resolution easier. This is seen in language extensions in Haskell, and I believe in special syntax in Scala. For this reason, I believe typeclasses are a slightly less intrusive mechanism when handled the way MLs handle them, although I believe they eventually end up adding unintended sugar later in life (though this is not true in general, as modules can be done more lightly and can unify much of the language design and syntax).

For 2, this includes things that were already mentioned, and some things like limited macro support. Overall, the more mature MLs have to handle convenience features that end up growing with the language.

For more lightweight MLs I'd point to the more recent Elm: https://elm-lang.org/docs/syntax. I believe Elm is slightly less special-cased than Juvix currently is, though a case can be made for the indent-sensitive syntax (though ;-heavy syntax is currently very out of fashion, with many languages making strides to make the ; silent, as people view it as that much of a nuisance).

However, I'd argue that ML syntax in general is quite complex, namely due to how they handle:

  1. type specification
  2. matching syntax
  3. signatures
  4. module and/or typeclass system

Various ALGOL-likes can be lightweight. While I think Go is the opposite of minimal, its syntax is claimed to be fairly minimal and I'd put it in the same class as Juvix. There are other ALGOL-likes I have yet to properly analyze, like Lua, C, and R, which seem like they can have a similarly "small" amount of sugar but are accepted in mainstream conversation.

However, if we are allowed to talk about other families, then I'd argue that, in comparison, the syntax of the ML and ALGOL family is heavier than that of these families:

  1. Erlang is fairly minimal, with Elixir being an ALGOL that mostly uses macros to get its way.
  2. APL is minimal syntax-wise.
  3. Smalltalk has no macros, but is very minimal syntax-wise.
  4. Lisp - even the syntax consists of reader macros, not built-ins.
  5. Forths - even more minimal than Lisps in some cases.
  6. Wolfram Alpha - unknown, but I suspect it is rather minimal.

However this is all off topic.

It seems it is agreed that the trace foo in syntax is too heavy due to the implications.

lukaszcz commented 1 year ago

Because if lambdas had optional arguments, then one more aspect of the syntactic uniqueness of recursive-lambda would go away:

f (x := []) (y := 0) z a = z + a :: x
f := go @\{
  (acc1 := []) (acc2 := 0) xs := (acc1 : xs, acc2)
}

Now the recursive-lambda form is just an argument before @\{, which is still special, but the optional aspect is kept as a more general idea, thus making the form less special.

This is actually a good point. I didn't think of these as "optional" arguments, but "extra" arguments that are always assigned an initial value. But indeed, one could see them as optional and that would be more general.

Are you talking about #trace in the OCaml toplevel? The OCaml toplevel is a mess and is essentially a second language.

OCaml has a special "assert" statement that is a part of the language proper.

It seems it is agreed that the trace foo in syntax is too heavy due to the implications.

Yes, I would agree now.

lukaszcz commented 1 year ago

This is actually a good point. I didn't think of these as "optional" arguments, but "extra" arguments that are always assigned an initial value. But indeed, one could see them as optional and that would be more general.

But the problem with optional arguments is that they don't play well with function definitions by pattern-matching equations. One can use such equations also in lambdas, and then one would need to repeat the optional arguments (and their default values?) with each equation, or invent some other way of specifying them, e.g., specify them only in the first equation and use wildcards in their place in further equations.

lukaszcz commented 1 year ago

We should do the following: make trace a 1-argument function and add a seq builtin which is guaranteed to evaluate its two arguments from left to right and return the second. We can think of adding some infix operator for it. The order of argument evaluation is not specified by our semantics (it actually differs between the JuvixCore evaluator and JuvixAsm), so having a seq primitive is useful more generally.

lukaszcz commented 1 year ago

A natural name for infix seq is ;, but we're already using it as a general syntactic separator. I propose >>> instead.
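For illustration, a rough Haskell sketch of the intended usage, assuming the proposed one-argument trace and the >>> operator. Note that Haskell's seq does not strictly guarantee left-to-right order, so this only approximates the proposed builtin:

import Debug.Trace (trace)

-- A one-argument trace returning unit, as proposed.
traceMsg :: String -> ()
traceMsg msg = trace msg ()

-- Evaluate the left argument (for its debug output), then return the right.
-- The proposed Juvix builtin would guarantee left-to-right evaluation.
infixr 0 >>>
(>>>) :: a -> b -> b
x >>> y = x `seq` y

main :: IO ()
main = print (traceMsg "entering f" >>> 2 + 2)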

lukaszcz commented 1 year ago

A let is not good because it is optimised away if the result is not referenced (actually, we do this early in the pipeline).