SMLFamily / Successor-ML

A version of the 1997 SML definition with corrections and some proposed Successor ML features added.
191 stars 10 forks source link

rec syntax #13

Open RobertHarper opened 8 years ago

RobertHarper commented 8 years ago

I like the improvement to the rec syntax, which was wildly out of control in Standard ML. However, let me lay down a marker saying that it may have to be revised again depending on what we do about my suggestions regarding "val", "def", "need", and "fun" declarations. Where "rec" is permitted should be tied to the semantics of the thing. Functions can be recursive, need cells can be recursive, but can general values?

eduardoleon commented 8 years ago

but, can general values?

I don't think so. The ability to define let rec xs = 1 :: xs in OCaml annoys me to no end. The guarantee that strict lists, trees, etc. are finite is very important to convince myself that recursive operations on them will terminate.

eduardoleon commented 5 years ago

Out of curiosity, what is a need cell? Neither Google nor Google Scholar is being particularly helpful. Maybe it is related to the following use case.

I occasionally want the ability to initialize a reference cell so that it points to itself. For example, I think the most convenient implementation of the union-find data structure is to make non-unified cells point to themselves:

datatype 'a state = Cell of 'a cell | Value of 'a
withtype 'a cell = 'a state ref

fun fresh () = let val rec r = ref (Cell r) in r end

But this is not allowed by Standard ML today. And, as far as I can tell, this cannot work as long as ref is an ordinary function.

MatthewFluet commented 5 years ago

I think it was described in #12 as a memoizing (i.e., call-by-need) construct.

RobertHarper commented 5 years ago

You want to use explicit backpatching, the reference cell is there for that purpose. The val rec bindings in SML are misleading, because the only recursive things are functions, whereas in Haskell it "works" because every damn thing is a reference cell, and it does the backpatching for you. Simply allocate the cell in Cell state, and assign that cell to its contents.

Bob Harper

(c) Robert Harper. All rights reserved.

On Wed, Jan 2, 2019 at 3:09 PM eduardoleon notifications@github.com wrote:

Out of curiosity, what is a need cell? Neither Google nor Google Scholar is being particularly helpful. Maybe it is related to the following use case.

I occasionally want the ability to initialize a reference cell so that it points to itself. For example, I think the most convenient implementation of the union-find data structure is to make non-unified cells point to themselves:

datatype 'a state = Cell of 'a cell | Value of 'awithtype 'a cell = 'a state ref fun fresh () = let val rec r = ref (Cell r) in r end

But this is not allowed by Standard ML today. And, as far as I can tell, this cannot work as long as ref is an ordinary function.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/SMLFamily/Successor-ML/issues/13#issuecomment-450970355, or mute the thread https://github.com/notifications/unsubscribe-auth/ABdsdQrhwk075inAxZe5LPSfmflkDLjGks5u_RGQgaJpZM4H_WGo .

eduardoleon commented 5 years ago

The problem is that there is no good value to initialize the reference cell with:

fun fresh () =
  let val r = ref (Cell ???)
  in r := Cell r; r end

In my admittedly limited understanding, the entire point to disallowing definitions like

val rec xs = 1 :: xs

is to make structural induction a valid reasoning principle for structurally recursive functions. However, even in today's Standard ML, this already fails when the datatype recursion goes through reference cells, because mutation allows us to “tie the knot”:

infixr 5 :::
datatype 'a seq = NIL | ::: of 'a * 'a seq ref

fun loop NIL = ()
  | loop (x ::: xs) = (print x; loop (!xs))

let
  val knot = ref NIL
  val seq = "hello" ::: knot
in
  knot := seq;
  loop seq
end

Hence, as far as I can tell, we do not lose anything by allowing cells to be initialized with references to themselves.

RobertHarper commented 5 years ago

The fact is in such situations you have to have a "null" value that is only ever used to initialize a cell, and is then immediately obliterated on backpatching. See PFPL for a discussion. There's no way around this.

(c) Robert Harper. All rights reserved.

On Wed, Jan 2, 2019 at 4:14 PM eduardoleon notifications@github.com wrote:

The problem is that there is no good value to initialize the reference cell with:

fun fresh () =

let val r = ref (Cell ???)

in r := Cell r; r end

In my admittedly limited understanding, the entire point to disallowing definitions like

val rec xs = 1 :: xs

is to make structural induction a valid reasoning principle for structurally recursive functions. However, even in today's Standard ML, this already fails when the datatype recursion goes through reference cells, because mutation allows us to “tie the knot”:

infixr 5 ::: datatype 'a seq = NIL | ::: of 'a * 'a seq ref

fun loop NIL = ()

| loop (x ::: xs) = (print x; loop (!xs))

let

val knot = ref NIL

val seq = "hello" ::: knot in

knot := seq;

loop seq end

Hence, as far as I can tell, we do not lose anything by allowing cells to be initialized with references to themselves.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/SMLFamily/Successor-ML/issues/13#issuecomment-450987021, or mute the thread https://github.com/notifications/unsubscribe-auth/ABdsdfaVYZ8j-xAs_qH6D3_ZHoWTQkfIks5u_SErgaJpZM4H_WGo .

eduardoleon commented 5 years ago

I'm not seeing why it has to be this way. OCaml is by no means a lazy language, yet it allows

type 'a state = Cell of 'a cell | Value of 'a
and 'a cell = 'a state ref

let rec r = ref (Cell r)

Of course, unfortunately it also allows

let rec xs = 1 :: xs

But I'm not seeing why we couldn't aim to allow the former while still forbidding the latter.

Adding a “null” value to the state type is inconvenient. One (crude, but mostly automatic, hence very useful) way in which I check that my code covers all possible cases is to

These things fly out of the window if we add spurious cases to datatypes.

RobertHarper commented 5 years ago

ocaml screws this all up, but it has an implicit null value. it doesn't make sense in a by-value language to admit general recursion, but they try.

(c) Robert Harper. All rights reserved.

On Wed, Jan 2, 2019 at 4:51 PM eduardoleon notifications@github.com wrote:

I'm not seeing why it has to be this way. OCaml is by no means a lazy language, yet it allows

type 'a state = Cell of 'a cell | Value of 'a and 'a cell = 'a state ref

let rec r = ref (Cell r)

Of course, unfortunately it also allows

let rec xs = 1 :: xs

But I'm not seeing why we couldn't aim to allow the former while still forbidding the latter.

Adding a “null” value to the state type is inconvenient. One (crude, but mostly automatic, hence very useful) way in which I check that my code covers all possible cases is to

  • Ensure all matches are exhaustive.
  • Ensure all control flow paths are reachable.
  • Ensure no exceptions are raised outside of code that does I/O.

These things fly out of the window if we add spurious cases to datatypes.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/SMLFamily/Successor-ML/issues/13#issuecomment-450996485, or mute the thread https://github.com/notifications/unsubscribe-auth/ABdsdf73uome2AfH3EHEGObX79U89dnvks5u_Sn1gaJpZM4H_WGo .

RobertHarper commented 5 years ago

there's no way to avoid it afaik. ocaml puts it under the hood; you can get a run-time error amounting a match exception if the recursive definition requires the value fo the recursive definition. sml simply exposes what is true, and puts it on you to manage that, rather than pretending it's otherwise and hiding the specifics from you. i much prefer the "honest" route.

bob

(c) Robert Harper. All rights reserved.

On Wed, Jan 2, 2019 at 4:55 PM Robert Harper rwh@andrew.cmu.edu wrote:

ocaml screws this all up, but it has an implicit null value. it doesn't make sense in a by-value language to admit general recursion, but they try.

(c) Robert Harper. All rights reserved.

On Wed, Jan 2, 2019 at 4:51 PM eduardoleon notifications@github.com wrote:

I'm not seeing why it has to be this way. OCaml is by no means a lazy language, yet it allows

type 'a state = Cell of 'a cell | Value of 'a and 'a cell = 'a state ref

let rec r = ref (Cell r)

Of course, unfortunately it also allows

let rec xs = 1 :: xs

But I'm not seeing why we couldn't aim to allow the former while still forbidding the latter.

Adding a “null” value to the state type is inconvenient. One (crude, but mostly automatic, hence very useful) way in which I check that my code covers all possible cases is to

  • Ensure all matches are exhaustive.
  • Ensure all control flow paths are reachable.
  • Ensure no exceptions are raised outside of code that does I/O.

These things fly out of the window if we add spurious cases to datatypes.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/SMLFamily/Successor-ML/issues/13#issuecomment-450996485, or mute the thread https://github.com/notifications/unsubscribe-auth/ABdsdf73uome2AfH3EHEGObX79U89dnvks5u_Sn1gaJpZM4H_WGo .

yallop commented 5 years ago

you can get a run-time error amounting a match exception if the recursive definition requires the value fo the recursive definition.

That's true for lazy values in OCaml (and for recursive modules, but that's a different story).

However, for OCaml definitions that don't involve laziness the safety of recursive definitions is statically checked by what amounts to a type system, and definitions that pass the check never raise exceptions at runtime.

eduardoleon commented 5 years ago

I would like to add to @yallop's comment that this section of the OCaml manual explains how a recursively defined variable may appear in the right-hand side of its own definition. I found this via this Stack Overflow question, by the way.

RobertHarper commented 5 years ago

Thanks! I'm not up to date, thanks for the summary. I wonder how strict the restrictions are? And, if backpatching is used as I think it must, then why introduce this overhead just for the sake of admitting more general recursion? It would seem to me that in a cbv language you want "recursive things" such as recursive functions or recursive suspensions or recursive tuples-of-functions (sometimes called objects).

Bob

(c) Robert Harper. All rights reserved.

On Fri, Jan 4, 2019 at 6:12 PM yallop notifications@github.com wrote:

you can get a run-time error amounting a match exception if the recursive definition requires the value fo the recursive definition.

That's true for lazy values in OCaml (and for recursive modules, but that's a different story).

However, for OCaml definitions that don't involve laziness the safety of recursive definitions is statically checked by what amounts to a type system, and definitions that pass the check never raise exceptions at runtime.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/SMLFamily/Successor-ML/issues/13#issuecomment-451596705, or mute the thread https://github.com/notifications/unsubscribe-auth/ABdsddaRyqMM0GKv8WTlGNPeK4MZ7d9vks5u_9_QgaJpZM4H_WGo .

yallop commented 5 years ago

You're welcome! OCaml does indeed use a kind of backpatching, but it's a bit more efficient than the naive approach, which involves an extra indirection whenever a recursively-defined value is accessed. The basic idea is to allocate a dummy block for each recursively-defined value, then overwrite the blocks directly during initialization. (The details are in "Compilation of extended recursion in call-by-value functional languages by Hirschowitz, Leroy & Wells.)

There are two restrictions:

  1. the size of each value computed by a rhs must be statically known. So, e.g., let rec x = (e1, e2) is acceptable because it allocates a block of statically-known size 2, but let rec x = if e then e1 else e2 is not, because the size of the value depends on e
  2. only addresses, not values, of elements of the recursive group can be used during initialization, so e.g. let rec x = ref (C x) is ok, but let rec x = ref (C !x) is not.

The second restriction is the one that's based on a type-system-like check, described in a draft paper. (The manual @eduardoleon linked is a bit out of date, unfortunately.)

The restrictions seem flexible enough to be useful in practice -- for example they allow @eduardoleon's example above, and some other real-world examples involving mutual recursion between functions and records.

eduardoleon commented 5 years ago

My understanding is that, in OCaml, the restrictions are meant to be just strong enough that recursive definitions won't compromise type safety in the progress-and-preservation sense. From a performance standpoint, the overhead of let rec is only present if you explicitly request it, and let has no such overhead, which I think is a reasonable state of affairs.

In a call-by-value language, we expect positive types, i.e. sums and eager tuples, to satisfy suitable induction principles. But, no matter what artificial restrictions we impose on the construction of reference cells (the only natural one is “don't dereference it until it has already been constructed”), we are not going to recover any similar induction principle for them. So we might as well allow self-referential cell construction, which causes no harm.

@yallop Thanks for the reference to the paper.