HigherOrderCO / HVM

A massively parallel, optimal functional runtime in Rust
https://higherorderco.com
Apache License 2.0
10.45k stars 396 forks source link

Eraser nodes being placed where they shouldn't #362

Open edusporto opened 4 months ago

edusporto commented 4 months ago

Reproducing the behavior

As Nicolas pointed out a few days ago, some programs that should be equivalent return different results, with a few eraser nodes being placed in places they shouldn't. Nicolas suspects it's an issue with duplication, but I still haven't found it.

See the following program in Bend:

map f xs = match xs {
  List/Cons: (List/Cons (f xs.head) (map f xs.tail))
  List/Nil: List/Nil
}

list = [1,2,3]
id = @x x

main = (map @f (map f list) [id])

It gets compiled to this, which works correctly:

@List/Cons = (a (b ((@List/Cons/tag (a (b c))) c)))

@List/Cons/tag = 1

@List/Nil = ((@List/Nil/tag a) a)

@List/Nil/tag = 0

@id = (a a)

@list = c
  & @List/Cons ~ (1 (b c))
  & @List/Cons ~ (2 (a b))
  & @List/Cons ~ (3 (@List/Nil a))

@main = b
  & @map ~ (@main__C0 (a b))
  & @List/Cons ~ (@id (@List/Nil a))

@main__C0 = (a b)
  & @map ~ (a (@list b))

@map = (a ((@map__C1 (a b)) b))

@map__C0 = (* (a (d ({(a b) c} f))))
  & @List/Cons ~ (b (e f))
  & @map ~ (c (d e))

@map__C1 = (?(((* @List/Nil) @map__C0) a) a)

Notice how main contains a call to main__C0. If we replace it with its value, we get

@List/Cons = (a (b ((@List/Cons/tag (a (b c))) c)))

@List/Cons/tag = 1

@List/Nil = ((@List/Nil/tag a) a)

@List/Nil/tag = 0

@id = (a a)

@list = c
  & @List/Cons ~ (1 (b c))
  & @List/Cons ~ (2 (a b))
  & @List/Cons ~ (3 (@List/Nil a))

@main = e
  & @map ~ ((a b) (d e))
  & @map ~ (a (@list b))
  & @List/Cons ~ (@id (@List/Nil d))

// @main__C0 = (a b)
//   & @map ~ (a (@list b))

@map = (a ((@map__C1 (a b)) b))

@map__C0 = (* (a (d ({(a b) c} f))))
  & @List/Cons ~ (b (e f))
  & @map ~ (c (d e))

@map__C1 = (?(((* @List/Nil) @map__C0) a) a)

The original program returns

Result: ((@List/Cons/tag (((@List/Cons/tag (1 (((@List/Cons/tag (2 (((@List/Cons/tag (3 (@List/Nil v89))) v89) v8e))) v8e) v93))) v93) (@List/Nil va4))) va4)

while the other version returns

Result: ((@List/Cons/tag (((@List/Cons/tag (1 (((@List/Cons/tag (* (((@List/Cons/tag (* (@List/Nil vc4))) vc4) vb3))) vb3) va2))) va2) (@List/Nil vec))) vec)

This happens in all implementations of HVM.

System Settings

Additional context

No response

CatsAreFluffy commented 4 months ago

If I disable the REF-DUP optimization, I get a very different result (((@List/Cons/tag (((1 (1 (((1 (* (((1 (* (((0 vce) vce) vc4))) vc4) vb3))) vb3) va2))) va2) (@List/Nil vfc))) vfc)), so presumably the safety check is missing some cases.

edusporto commented 4 months ago

I've tried a few other examples of the same pattern that don't result in the same bug and, as expected, they always have a difference of 1 interaction between the original generated HVM and the equivalent change. When the bug does happen, the variation in number of interactions is different (in the original example, the "good" version does 109 interactions and the "bad" does 138).

When disabling the REF-DUP optimization, like @CatsAreFluffy said, although the final result is wildly different, we get the expected behaviour of the number of interactions having only a difference of 1. I think that's a clue that the bug does not happen when we remove that optimization, although that could just be because the wrong interactions are being performed.

VictorTaelin commented 4 months ago

Could this be related to the SWIT-COMM being problematic, as @tjjfvi pointed out? We need to change the SWIT rules to use SWI nodes, isntead of CON nodes.

enricozb commented 4 months ago

I think this is a combination of the following, safe is always set to true: https://github.com/HigherOrderCO/HVM/blob/21d630f488f1627a075ab72e8d6e422dd069d5f8/src/ast.rs#L507-L516 And that the REF-DUP optimization has no warning on C or Cuda, only on the rust implementation.

edusporto commented 4 months ago

Also, as checked with Nicolas earlier today, the example in the issue (and also the original example that led to this) should not be valid, since they clone non-affine global references. It's probably just a coincidence that they return the "correct" result, and not the expected error message ERROR: attempt to clone a non-affine global reference.

When setting all definitions as safe: false in the code Enrico mentioned above, we get the expected error message in the Rust implementation, although not on the C implementation, since it doesn't perform the same checking in the REF-DUP optimization.

edusporto commented 4 months ago

After digging some more into this issue, we think that both versions of the HVM program are invalid and should somehow error out, since both clone non-affine lambdas.

PR #379 improves safety checking so the first version errors as expected, but it's not enough for the second version. It marks every definition that contains a DUP as unsafe, and propagates that unsafety to all definitions that depend on those. Since main__C0 in the first example is then marked as unsafe, when it's REF in main interacts with a DUP we error out as expected. https://github.com/HigherOrderCO/HVM/blob/1c6ca2e90d5984d7afa28386cd2ac86755d6bda9/src/hvm.rs#L657-L669

Since the second version expands that reference and clones it directly, this safety check is not reached.

For now, we can say that the eraser nodes being placed in such occasions are not necessarily a bug, since they're the result of undefined behavior. Nicolas told me adding an affine type checker to the language would be able to catch such errors at compile time, so it's something we could do in the medium-long term.

enricozb commented 4 months ago

I think while this can be UB for bend it shouldn't be for HVM, though some flag perhaps should be required to run such programs.