Open edusporto opened 4 months ago
If I disable the REF-DUP optimization, I get a very different result (((@List/Cons/tag (((1 (1 (((1 (* (((1 (* (((0 vce) vce) vc4))) vc4) vb3))) vb3) va2))) va2) (@List/Nil vfc))) vfc)
), so presumably the safety check is missing some cases.
I've tried a few other examples of the same pattern that don't result in the same bug and, as expected, they always have a difference of 1 interaction between the original generated HVM and the equivalent change. When the bug does happen, the variation in number of interactions is different (in the original example, the "good" version does 109 interactions and the "bad" does 138).
When disabling the REF-DUP optimization, like @CatsAreFluffy said, although the final result is wildly different, we get the expected behaviour of the number of interactions having only a difference of 1. I think that's a clue that the bug does not happen when we remove that optimization, although that could just be because the wrong interactions are being performed.
Could this be related to the SWIT-COMM being problematic, as @tjjfvi pointed out? We need to change the SWIT rules to use SWI nodes, isntead of CON nodes.
I think this is a combination of the following, safe
is always set to true
: https://github.com/HigherOrderCO/HVM/blob/21d630f488f1627a075ab72e8d6e422dd069d5f8/src/ast.rs#L507-L516 And that the REF-DUP optimization has no warning on C or Cuda, only on the rust implementation.
Also, as checked with Nicolas earlier today, the example in the issue (and also the original example that led to this) should not be valid, since they clone non-affine global references. It's probably just a coincidence that they return the "correct" result, and not the expected error message ERROR: attempt to clone a non-affine global reference
.
When setting all definitions as safe: false
in the code Enrico mentioned above, we get the expected error message in the Rust implementation, although not on the C implementation, since it doesn't perform the same checking in the REF-DUP optimization.
After digging some more into this issue, we think that both versions of the HVM program are invalid and should somehow error out, since both clone non-affine lambdas.
PR #379 improves safety checking so the first version errors as expected, but it's not enough for the second version. It marks every definition that contains a DUP as unsafe, and propagates that unsafety to all definitions that depend on those. Since main__C0
in the first example is then marked as unsafe, when it's REF in main
interacts with a DUP
we error out as expected.
https://github.com/HigherOrderCO/HVM/blob/1c6ca2e90d5984d7afa28386cd2ac86755d6bda9/src/hvm.rs#L657-L669
Since the second version expands that reference and clones it directly, this safety check is not reached.
For now, we can say that the eraser nodes being placed in such occasions are not necessarily a bug, since they're the result of undefined behavior. Nicolas told me adding an affine type checker to the language would be able to catch such errors at compile time, so it's something we could do in the medium-long term.
I think while this can be UB for bend it shouldn't be for HVM, though some flag perhaps should be required to run such programs.
Reproducing the behavior
As Nicolas pointed out a few days ago, some programs that should be equivalent return different results, with a few eraser nodes being placed in places they shouldn't. Nicolas suspects it's an issue with duplication, but I still haven't found it.
See the following program in Bend:
It gets compiled to this, which works correctly:
Notice how
main
contains a call tomain__C0
. If we replace it with its value, we getThe original program returns
while the other version returns
This happens in all implementations of HVM.
System Settings
Additional context
No response