TritonVM / tasm-lang

Writing tasm with Rust syntax
15 stars 2 forks source link

Cannot handle nested `Vec<Vec<T>>` outside of `Box` #58

Closed Sword-Smith closed 5 months ago

Sword-Smith commented 6 months ago

let a: Vec<Vec<u32>> = vec![e0, e1, ...], e0 = vec![e00, e01] is currently represented like this:

Layout A Stack: _ *a Memory Address Value
a a_len
a+1 *e0
a+2 *e1
a+3 *e2
... ...
e0 e_0_len
e0 +1 e_0[0]
e0 +2 e_0[1]
... ...
For Box<Vec<Vec<T>>>, BFieldCodec dictates the memory layout, since this is how values are non-deterministically initialized in memory: Layout B Stack: _ *a Memory Address Value
a a_len
a+1 e_0_size
a+2 e_0_len
a+3 e_0[0]
a+4 e_0[1]
a+5 e_1_size
a+5 e_1_len
a+5 e_1[0]
... ...

This mismatch in memory layouts is incompatible with the compiler's rule that Box<Vec<T>> has the same memory layout as Vec<T>. Currently the code generator for setting and getting values based on indices assumes the memory layout for Box. So how do we progress here? I see a few options:

  1. Give up on the rule that Box<Vec<T>> has the same memory layout as Vec<T> in all cases
  2. Change memory layout for Vec<Vec<T>>
  3. Change BFieldCodec, though I'm not sure how to use pointers
  4. Accept that the compiler can only handle nested lists if they're non-deterministically initialized.

Let me hear your thoughts :)

Sword-Smith commented 5 months ago

After some discussion with @jan-ferdinand and @aszepieniec I am leaning towards 2, re-allocating memory when constructing nested lists. An alternative would be to change BFieldCodec, option 3 above, such that it contains pointers to appended data, making BFieldCodec follow layout A. This would probably mean that all non-deterministically initialized objects would have to be placed in memory at address 0 (to make pointers of encoding consistent with pointers to RAM) but since we have dedicated the memory region $[0,2{32})$ to initialized RAM, that should be fine.

For now though, we go with option 2 which is going to make the runtime of building nested vectors and boxing values through Box::new very expensive. We accept this cost since we do not believe that we need to do any of those things in the recufier.

Sword-Smith commented 5 months ago

For now, we accept that this is a known limitation of the compiler. Fixing it according to 2 above (Change memory layout for Vec<Vec<T>> to match that of Box<Vec<Vec<T>>>) would incur a runtime punishment of reallocating memory for each element that is pushed to the vector, and a linear running time for index arithmetic for each push, as the free pointer needs to be found each time push is called. We don't think we need this, so we're not going to fix it.

With a8d6851b48ad514c53762f84e047aad87095f700 the compiler panics is the developer attempts to build such a vector.