ethdebug / format

Smart contract debugging data format – Standards development working group
https://ethdebug.github.io/format/
46 stars 4 forks source link

More future-proof type definitions (Algebraic data types) #19

Open ekpyron opened 1 year ago

ekpyron commented 1 year ago

[just dumping my comment from matrix] For the high-level type descriptions, I'm still wondering if we should make it more future-proof from the start. We will have general algebraic data types in the future and enums and structs are just special cases of those equipped with additional name information. I'm fine with just representing the current state of the language in the format, but that means we'll very likely need to break and extend this in the not-so-distant future, so I'd recommend generalizing this a bit more from the start.

cameel commented 1 year ago

Related: #26. Fe already has some of these types, which might be another reason to include them.

Also, some general points from me:

gnidan commented 11 months ago

Supporting ADTs in the format makes sense +1

I assume that any schema for describing ADTs would be distinct from the schema for describing how types are realized for a particular data location? I could see this going either way, since it's valid to view "uint array in memory" (e.g.) as doubly wrapped, but the allocation concern seems like a distinct concept from the type semantics.

ekpyron commented 11 months ago

Technically it is possible to consider locations as data type constructor on top of any type.

First of, for example, a struct

struct S { uint256 x; uint256 y; }

can be thought of as syntactic sugar over an algebraic data type - so this can be thought of as a tuple of two uint256s, so (uint256,uint256) that is just additionally equipped with additional names for the type itself and its members.

A concrete occurrence of the struct in memory, i.e. S memory, still has the semantics of that tuple in memory, but on stack is just a pointer. So you can conceptualize ... memory as a wrapper of ... into something that moves the semantics of ... into memory, and on stack is just a pointer. But that pointer on stack again is still an algebraic data type, just a trivial one.

For dynamic arrays like uint[] and calldata, the pointer type uint[] calldata is even a non-trivial algebraic data type on stack, i.e. a tuple of pointer and length.

Maybe the way to do this is to actually make pointing a first class construct? All types on stack can be represented as algebraic data types, including pointers. Some of them then point to another thing at some other location that can be represented as a different algebraic data type.

But yeah, I haven't yet fully thought this through myself either.