dusklang / dusk

The Dusk Programming Language
Moving values (properly) from comptime to runtime #116

Open zachwolfe opened 2 years ago

zachwolfe commented 2 years ago

Edit: I’m sleep-deprived right now but the original main ideas of this issue seem misguided? The problem we actually need to solve (unless I’m forgetting something) is transferring values from compile-time to run-time. Local values are mostly easy, with two caveats: you need to descend into structures to find and resolve pointers, and the target type layout may differ from the host's. Examples include passing a usize from compile-time on a 64-bit host to a 32-bit target such as WASM, or type definitions that are just plain different on the two platforms (see the discussion of a hypothetical FromComptime trait below, which could possibly be applied here as well).
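A minimal sketch of the usize case, assuming the host holds the compile-time value as a u64 and the target is little-endian (as WASM is). `TargetPointerWidth`, `DoesNotFit`, and `lower_usize` are hypothetical names for illustration, not part of Dusk:

```rust
/// Pointer width of the compilation target (hypothetical type, not from Dusk).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TargetPointerWidth {
    Bits32,
    Bits64,
}

/// Error returned when a compile-time value cannot be represented on the target.
#[derive(Debug, PartialEq, Eq)]
pub struct DoesNotFit {
    pub value: u64,
    pub width: TargetPointerWidth,
}

/// Encode a compile-time `usize` (held as a u64 on the host) in the target's
/// layout. Little-endian byte order is an assumption of this sketch.
pub fn lower_usize(value: u64, width: TargetPointerWidth) -> Result<Vec<u8>, DoesNotFit> {
    match width {
        TargetPointerWidth::Bits64 => Ok(value.to_le_bytes().to_vec()),
        TargetPointerWidth::Bits32 => {
            if value <= u32::MAX as u64 {
                // The value fits, so narrowing to 32 bits is lossless.
                Ok((value as u32).to_le_bytes().to_vec())
            } else {
                // Refuse to truncate silently.
                Err(DoesNotFit { value, width })
            }
        }
    }
}
```

Whether an out-of-range value should be a compile error, as here, or something else is an open design question; the sketch only shows that the conversion has to be explicit.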

The slightly harder part is resolving pointers. A compile-time pointer can really only point to one of two places and still be resolved correctly: static memory (e.g., static variables or procedurally-allocated static memory) or constant memory (e.g., constant global variables or procedurally-allocated constant memory). Additionally, the memory layout of pointed-to values must be identical in the host and target. This probably isn’t a big deal because differences can be papered over in most cases by converting at compile-time to the proper layout. It might still be useful to transfer types that contain pointers to other kinds of memory (especially heap allocations) to run-time, but doing so would require implementing a trait on that type, named something like FromComptime. For example, a growable array type could implement the trait by storing its contained values in constant memory, then copying that constant memory to a heap-allocated buffer at run-time. Or, if the array is never written to at run-time, it could even skip the heap allocation and copy.
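One possible shape for such a trait, sketched in Rust rather than Dusk. Everything here is an assumption for illustration: the method names, the `Element` associated type, and the use of `Cow` to model "copy to the heap only if the array is written to":

```rust
use std::borrow::Cow;

/// Hypothetical trait; the issue only proposes the name `FromComptime`.
pub trait FromComptime: Sized {
    type Element: 'static;
    /// Compile-time side: the data to place in constant memory.
    fn to_const_memory(&self) -> Vec<Self::Element>;
    /// Run-time side: rebuild the value from that constant memory.
    fn from_const_memory(data: &'static [Self::Element]) -> Self;
}

/// Toy growable array. It borrows constant memory until the first write.
pub struct GrowableArray {
    pub items: Cow<'static, [u32]>,
}

impl GrowableArray {
    pub fn push(&mut self, value: u32) {
        // `to_mut` copies the constant data to a heap buffer on first write.
        self.items.to_mut().push(value);
    }
}

impl FromComptime for GrowableArray {
    type Element = u32;

    fn to_const_memory(&self) -> Vec<u32> {
        self.items.to_vec()
    }

    fn from_const_memory(data: &'static [u32]) -> Self {
        // No heap allocation or copy yet.
        GrowableArray { items: Cow::Borrowed(data) }
    }
}
```

An array that is only ever read stays in the `Cow::Borrowed` state, which corresponds to skipping the heap allocation and copy entirely.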

Old comment text follows: Currently it is pretty much assumed that ~all interpreter Values can be converted to and from their raw bytes. The ability to do this is important for FFI. However, this has the disadvantage that we often have no idea where values/pointers come from, because they may start out as special-case Values and then this special case is stripped away from them. Knowing where pointers come from is important for upcoming compile-time language features, such as custom literals, where you might want to transfer a pointer (or a data structure that contains pointers) from compile-time to run-time. This is clearly ill-defined if all you have is an address.

Idea: values have the option of living a double life. By default, all values will be stored as Value enums with no raw pointers or unsafe code to speak of. Structures just store their member values, and pointers consist of a base address, an offset and perhaps an allocation size. Well-defined pointer arithmetic should continue to work. Bad pointer arithmetic should fail (for example, accessing memory past the end of an allocation).

As soon as someone passes a pointer across an FFI boundary, all the data in the relevant values is duplicated into the correct memory layout. Modifying the values from Dusk will update the raw copy. Reading the values from Dusk will check the raw copy first to see if it has changed, then update the Value copy if necessary and return the result.

To support pointer arithmetic, pointer-sized integers should retain provenance. Performing arithmetic on them should result in the provenance being updated and canonicalized whenever possible. By "canonicalized" I mean, for example, taking an integer offset (the most common type of pointer arithmetic, clearly) and converting it into a field offset. When an arithmetic operation cannot be encoded in the provenance (e.g., multiplies, divisions, shifts, etc.), it should trigger an allocation of raw memory for the base allocation, and the provenance should be converted to a PointerSizedValue::Opaque.

When transferring pointers from compile-time to run-time, opaque pointers should be looked up in all the compile-time allocations that will be live at runtime (e.g., global constants and static variables). Moving out of bounds of a high-level allocation should be detected and result in an error.

Lastly, casting high-level pointers to different types should work whenever possible. For example, suppose you allocate 1024 bytes on the heap, and then store a structure that contains 4 packed u16s at offset 64. If you then cast allocation as usize + 66 to *u16, the interpreter should be sophisticated enough to detect that you want the second field of that structure at offset 64. Other cast types can be considered later, such as reading two adjacent 32-bit integers as a 64-bit integer, or reading the second byte of an integer as a u8. If it is well-defined, it should work.
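The u16 example can be worked through with a small sketch. `FieldLayout`, `PlacedStruct`, and `resolve_field` are hypothetical stand-ins, not interpreter types, and the rule used (exact match on field offset and access size) is an assumption:

```rust
/// Byte offset and size of one field, relative to the start of its struct.
pub struct FieldLayout {
    pub offset: usize,
    pub size: usize,
}

/// A struct value stored at byte offset `base` inside an allocation.
pub struct PlacedStruct {
    pub base: usize,
    pub fields: Vec<FieldLayout>,
}

/// Canonicalize a raw byte offset into a field index, if the access lines up
/// exactly with one field. Returns None for anything else (misaligned access,
/// wrong size, or an offset before the struct).
pub fn resolve_field(
    placed: &PlacedStruct,
    byte_offset: usize,
    access_size: usize,
) -> Option<usize> {
    let relative = byte_offset.checked_sub(placed.base)?;
    placed
        .fields
        .iter()
        .position(|f| f.offset == relative && f.size == access_size)
}
```

With four packed u16 fields at base 64, `resolve_field(&placed, 66, 2)` yields `Some(1)`, i.e. the second field. An access at offset 67 matches no field, so it would have to go through some other path, such as the raw-memory fallback.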

I also want to point out that while working on this feature, the ability to bail out to raw memory is a useful escape hatch: if some edge case isn't supported yet, just serialize the high-level allocation to its raw representation and perform the operation that way.
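A rough sketch of that escape hatch, using a cut-down `Value` with only two integer variants and an `Allocation` shaped like the strawman one below. The little-endian encoding and the `raw` method name are assumptions for illustration:

```rust
/// Cut-down stand-in for the interpreter's Value enum.
pub enum Value {
    U16(u16),
    U32(u32),
}

pub struct ValueWithOffset {
    pub value: Value,
    pub offset: usize,
}

pub struct Allocation {
    /// Non-overlapping values
    pub values: Vec<ValueWithOffset>,
    pub size: usize,
    pub raw_memory: Option<Box<[u8]>>,
}

/// Encode one value as little-endian bytes (byte order is an assumption).
fn encode(value: &Value) -> Vec<u8> {
    match value {
        Value::U16(x) => x.to_le_bytes().to_vec(),
        Value::U32(x) => x.to_le_bytes().to_vec(),
    }
}

impl Allocation {
    /// Serialize the high-level values into raw bytes on first use, then hand
    /// out the raw copy. Gaps between values are left zeroed.
    pub fn raw(&mut self) -> &mut [u8] {
        if self.raw_memory.is_none() {
            let mut bytes = vec![0u8; self.size];
            for v in &self.values {
                let encoded = encode(&v.value);
                bytes[v.offset..v.offset + encoded.len()].copy_from_slice(&encoded);
            }
            self.raw_memory = Some(bytes.into_boxed_slice());
        }
        self.raw_memory
            .as_deref_mut()
            .expect("raw memory was just materialized")
    }
}
```

The sketch does not cover keeping the two copies in sync after raw memory has been handed out.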

Strawman data modeling:

enum OffsetStep {
    Field { strukt: StructId, index: usize },
    IntegerAmount(isize),
    // other things can be added later
}
type Offset = Vec<OffsetStep>;
enum AllocationId {
    Stack(StackAllocationId),
    Heap(HeapAllocationId),
    Static(StaticId), // global variable
    Const(ConstAllocationId), // global constant memory allocated in the executable (e.g., string literals, or anything else you want)
}
enum PointerValue {
    HighLevel { allocation: AllocationId, offset: Offset }, // high-level pointer with very precise provenance known by the interpreter. Can be transferred to runtime iff its lifetime is valid at runtime (only static variables and constant allocations probably)
    Function(FuncId), // pointer to a Dusk-implemented function
    Const(usize), // pointer cast from constant integer, e.g. null or a specific address for MMIO. Can be transferred to runtime
    Opaque(usize), // pointer from FFI, or cast from a non-constant integer. Cannot be transferred to runtime, unless it can be proven to have come from a Dusk allocation.
}
enum IntValue {
    Normal(...),
    PointerSized { value: PointerValue, is_signed: bool },
}
enum Value {
    Int(IntValue),
    Float(…),
    Pointer(PointerValue),
    Struct { fields: Vec<Value>, },
    Internal(InternalValue), // e.g., types, StringLiterals
}
struct ValueWithOffset {
    value: Value,
    offset: usize,
}
struct Allocation {
    // Non-overlapping values
    values: Vec<ValueWithOffset>,
    size: usize,
    raw_memory: Option<Box<[u8]>>,
}

struct StackFrame {
    …
    // Not shown: generations to protect against use-after-free. Probably need something different from an IndexVec.
    stack_allocations: IndexVec<StackAllocationId, Allocation>,
}
struct Interpreter {
    ...
    // Not shown: generations to protect against use-after-free, also, the ability to free. Probably need something different from an IndexVec.
    heap_allocations: IndexVec<HeapAllocationId, Allocation>,
    statics: IndexVec<StaticId, Allocation>,
    const_allocations: IndexVec<ConstAllocationId, Allocation>,
}
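The transfer rules in the PointerValue comments above could be checked roughly as follows. This is a sketch with simplified types: plain usize ids, a single byte offset instead of Offset, and a hypothetical `LiveRange` table recording the host addresses of allocations that remain live at runtime. The Function variant is left out because the issue does not say whether it transfers:

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AllocationId {
    Stack(usize),
    Heap(usize),
    Static(usize),
    Const(usize),
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PointerValue {
    HighLevel { allocation: AllocationId, byte_offset: usize },
    Const(usize),
    Opaque(usize),
}

/// Host address range of a static or constant allocation (hypothetical).
pub struct LiveRange {
    pub id: AllocationId,
    pub host_base: usize,
    pub size: usize,
}

/// Returns the pointer in a form that can be emitted into the executable,
/// or None if it cannot be transferred to runtime.
pub fn resolve_for_runtime(ptr: PointerValue, live: &[LiveRange]) -> Option<PointerValue> {
    match ptr {
        PointerValue::HighLevel { allocation, .. } => match allocation {
            // Only allocations whose lifetime extends to runtime qualify.
            AllocationId::Static(_) | AllocationId::Const(_) => Some(ptr),
            AllocationId::Stack(_) | AllocationId::Heap(_) => None,
        },
        // Null, MMIO addresses, and other constant integers transfer as-is.
        PointerValue::Const(_) => Some(ptr),
        // Try to recover provenance by searching the live allocations.
        PointerValue::Opaque(addr) => live.iter().find_map(|range| {
            let byte_offset = addr.checked_sub(range.host_base)?;
            if byte_offset < range.size {
                Some(PointerValue::HighLevel { allocation: range.id, byte_offset })
            } else {
                None
            }
        }),
    }
}
```

An opaque address that falls inside a live static is promoted back to a high-level pointer; anything else is rejected rather than emitted as a meaningless host address.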

zachwolfe commented 2 years ago

Note that solving this issue properly should also solve #114.