ethereum / fe

Emerging smart contract language for the Ethereum blockchain.
https://fe-lang.org
Other
1.61k stars 187 forks source link

Meta: Language design #829

Open brockelmore opened 1 year ago

brockelmore commented 1 year ago

What is wrong?

Hi! Just checking out Fe for the first time really. As an avid rust user and solidity user, I like the direction Fe is headed!

However, I have a few gripes about language design (note, these are nitpicky! you guys have done a great job thus far!).

I could create separate issues for each of these but figured I would brain dump in one, and if any are deemed as worthwhile to pursue they can be separated out.

tl;dr: Trait all the things (and lean into rusty-ness)!

Storage

In my opinion, the ability to store something should be defined by if it implements a Store trait. Loading should be defined by a Load trait. For example:

struct Reserves {
    pub reserve0: u128
    pub reserve1: u128
}

impl std::storage::Store for Reserves {
    fn store(self, mut ctx: Context, slot: bytes32) {
        // more on slot derivation later
        let packed = self.reserve0 << 128 | self.reserve1;
        ctx.store(slot, packed);
    }
}

impl std::storage::Load for Reserves {
    fn load(ctx: Context, slot: bytes32) -> Self {
        // more on slot derivation later
        let val = ctx.load(slot);
        return Sync {
            reserve0: val >> 128,
            reserve1: val ^ (((1 << 128) - 1) << 128),
        }
    }
}

Which brings me on to the semantics of storage. Currently, they are just variables placed anywhere inside a contract keyword. In my mind something like this may be a better path forward:

contract MyContract {
    /// Contract storage
    storage {
        a: A
        b: u256
        // You can have auto-generated view functions based on pub keyword in storage
        pub c: u128
    }
}

Any element that is in storage must implement the Store and Load traits. The elements in here are automatically given storage slots from the compiler. (But any element that implements Store can still be stored manually by calling self.store(ctx, slot) (or Store::store(self, ctx, slot) if you cant resolve which version to use via the type system. Also lets library creators do cool things (This is something I needed for this: https://github.com/brockelmore/memmove)

Events

As recent work has improved the event system, an ideal world would have an Emittable trait as has been worked on. But the current annotations of #indexed feel weird. It is an unnecessary departure in my mind from rust macro syntax of #[indexed], which causes unnecessary overhead in the programmers mind, if coming from rust. Ideally we would reach a world where the following is possible:

// additionally Emittable maybe not the best word for this? i like Log
#[derive(Emittable)]
struct AuctionEnded {
    #[indexed]
    pub winner: address
    pub amount: u256
}

// alternatively manually implement it:
struct A {
    a: u256
    b: u128
}

/// Implements how to log struct A, could be derived via "Log"
impl std::log::Log for A {
    fn log(self) {
        std::log::log4(a.a, a.b, a.c, a.d);
        // could have alternatively done:
        // std::log::log3(a.a, a.b, a.c, a.d);
        // which would do log3 op instead. The derive macro would
        // match and order appropriately
    }
}

(even if there is a default implementation for all types and structs of Emittable, (btw trait naming should probably just be Log or Emit imo) it should be implementable manually)

Mapping aside

If you have a Hash or Mapping trait, you can allow the user to define how to hash the struct used in a Map, where the K: Hash implementation is left to the user (or derived).

Like rust, but not

That brings me to a broader point: there are a ton of really good semantics in this language - but my largest complaint is that there are a good number of things that are rust-like, but differ in an not meaningful way which just boils down to overhead for anyone coming from rust (lack of ;, inability to use , in structs (the combination of each leads to significant whitespace which sucks)).

Function dispatch

A common request from solidity land is to be able to define the dispatch table (i.e. I know transfer will be called more than mint). One way that this could be achieved is as follows:

contract SimpleOpenAuction {
    pub fn main(self, mut ctx: Context) {
        match ctx.signature() {
            // main can be unconstrained in return type, or be `-> bytes` maybe?
            Self::A => { return self.A() },
            Self::B => { self.B(ctx) },
        }
    }

    pub fn A(self) -> u256 {
        // do something
        return 1
    }

    pub fn B(self, ctx: Context) {
        // do something else
    }
}

The above has a few ideas mixed in there, (like treating functions as enum variants), a main function that is entered on every call, ability to set up the match as you want, etc.

Abi as a kind of alias for Trait

This is unabashedly borrowed from the sway lang (albeit, I dislike how they do it and suggested a similar thing to them to no avail). The idea is to make ABIs a first class citizen and to promote usage and implementation of them (could foresee a common set being in the std lib).

// abi keyword, similar to a trait, but for contracts
abi ERC2612 {
    pub fn permit(owner: address, spender: address, value: u256, deadline: u256, v: u8, r: bytes32, s: bytes32);
    pub fn nonces(owner: address) -> u256;
    pub fn DOMAIN_SEPARATOR() -> bytes32;
}

contract MyContract {
    // A contract can impl an abi trait
    // not sure if this should be *inside* the contract span or be implementable
    // outside like `impl ERC2612 for MyContract {..}`
    impl ERC2612 for Self {
        pub fn permit(owner: address, spender: address, value: u256, deadline: u256, v: u8, r: bytes32, s: bytes32) {
            // snip
        }
        pub fn nonces(owner: address) -> u256 {
            0
        }
        pub fn DOMAIN_SEPARATOR() -> bytes32 {
            bytes32::new(0)
        }
    }
}

Verbatim/Assembly block

Allow for usage of raw assembly blocks, a la verbatim_i1_o1 style yul blocks. In an ideal world this would look like:

// asm is a keyword similar to rust's async but instead generates an assembly block
asm fn my_assembly_fn(a: uint256) -> u256 {
    PUSH 0x11
    ADD
}

In a perfect world, some analysis would be ran on the assembly block to ensure it keeps stack consistent with promised input/output (see my comment here: https://github.com/ethereum/solidity/issues/12067#issuecomment-1270388088 for more promises the user can make to the compiler about this code and when the compiler should emit an error/warning)

Anyways, sorry for the braindump here! I've enjoyed playing with Fe and hope to see continued work and improvement on it! I have started looking through the codebase and if there is alignment on any of the above I can try to hop in an help (I am a core contributor to foundry, helping with reth, and have chatted with the solidity team a bunch about optimizations and such).

Thanks for y'alls work!

sbillig commented 1 year ago

Thanks, @brockelmore for the kind words and the detailed comments! I'll try to respond to each point, but in summary, we mostly agree with you, and some of these things are on our collective mental backlog, but aren't yet logged as issues. We tend to discuss language direction either in person when we're able, or on our weekly video calls (open to everyone, btw, details on the discord meeting-topics channel), but of course much of this inevitably doesn't get documented.

Like rust, but not

There are some small syntax warts that are holdovers from the old days of python-like syntax (this project started as a rust implementation of vyper), eg we should allow commas in struct defs, struct initializers should use braces, etc. We just haven't gotten around to cleaning these things up yet. Semicolons, I've argued that we shouldn't add, as I think rust's use of semicolons isn't ideal, but we may add them if we can't avoid ambiguity etc in a satisfactory way without them.

But, I think these small things feel weird because the functionality we've implemented so far feels like we're just building a rust-lite ("Shitty Rust", I call it) targeting the evm, but our plan is to diverge from rust in some significant ways. (I love rust as much as the next person, but some of its semantics and complexity make it an imperfect fit for this domain, IMO, and if one wants to use rust, it exists already).

For example, the plan for contract syntax and semantics is to move to a message-passing style, which is a better fit of the actual underlying platform. I documented some of the ways in which the current struct-like contract formulation is misleading, and a rough sketch of what the message passing style might look like here: https://notes.ethereum.org/QVY5Uiz1SeCzj8KOuA9gtw?view (this document is a rough exploration of ideas, subject to change, should be polished, yada). If you have thoughts on this direction broadly or any of specifics, I'd love to hear them. It overlaps with some the stuff you mentioned above in the custom dispatch.

Another tentative plan we have is to not follow rust's lead on explicit &/&mut types and lifetimes, but instead take some inspiration from https://www.val-lang.dev for "projections" that provide borrows in a way that's simpler and more limited than rust's. We don't have a good document explaining the plan here yet, but one's coming "soon"

the current annotations of #indexed feel weird. It is an unnecessary departure in my mind from rust macro syntax of #[indexed]

No strong feelings here; the current plan is to allow #foo and #foo(bar: val), but not allow multiple attributes on a single line (which would require brackets), and not allow name = "val" attributes, which would also require square brackets. :shrug:

Storage

I like this general idea; I sorta touched on something similar in the message passing doc linked above, where contract storage is a particular struct type that implements a Storage trait, but didn't work out specifics.

A currently open problem is to decide what sort of low-level primitives the standard library needs to provide to allow safe but powerful custom use of storage and memory. ctx.store(slot, val) seems dangerous for normal code, if slot here isn't restricted a safe range. We've discussed something like MSlice and SSlice types that would give unfettered access to a reserved range of memory or storage, for uses like your memmove library. (Thanks for that link, btw, it'll be a good reference). The goal would be for such library code to not need to drop into an unsafe block except in very special cases.

If you have a Hash or Mapping trait, you can allow the user to define how to hash the struct used in a Map, where the K: Hash implementation is left to the user (or derived).

:+1: This should be a gh issue. We also plan to add an Add trait, etc for ops, and other similar traits from Rust.

even if there is a default implementation for all types and structs of Emittable, (btw trait naming should probably just be Log or Emit imo) it should be implementable manually

Agreed on both points. I prefer Log/log() over Emittable/emit(). Much of this trait stuff is new, and the magical behavior of the Emittable trait is a temporary stop-gap solution.

Abi as a kind of alias for Trait We'll definitely do something like this, but see the message passing doc above.

Function dispatch

https://github.com/ethereum/fe/issues/559 This feature is implemented but not documented yet, and lacks some supporting features that make it usable for normal contracts (like a way to easily match on calldata signature). See the message passing doc though.

Verbatim/Assembly block

The current hope is that we can provide sufficient power via unsafe std lib functions, but TBD. We plan to target more than just evm, via https://github.com/fe-lang/sonatina

[I'm tired and out of steam for now; happy to discuss any of this further. I appreciate the time you took to braindump!]

sbillig commented 1 year ago

Oh, forgot to comment on the most important part:

I have started looking through the codebase and if there is alignment on any of the above I can try to hop in an help (I am a core contributor to foundry, helping with reth, and have chatted with the solidity team a bunch about optimizations and such).

Yes please!

brockelmore commented 1 year ago

Wow thank you for the detailed response!

For contracts, a contract type is a “newtype” wrapper around the primitive address type. However, the self contract is something different. The self address is retrieved via the evm ADDRESS opcode. Calling a function on the self contract is an internal call, rather than an external call. It’s (currently) not possible to pass the self contract into a function that expects an instance of that contract type.

use std::context::Context
contract C {
 pub fn foo(self) {}
 pub fn bar(self, ctx: Context) {
   let other: C = C(address(0))
   call_cfoo(c: other) # OK
   call_cfoo(c: self)  # ERROR, invalid use of contract `self`
   call_cfoo(c: C(ctx.self_address())) # OK; this makes an external call to the current contract
 }
}

fn call_cfoo(c: C) {
 c.foo()
}

I am hesitant to advise the actor model (I spend a good amount of time with Actix actor model in rust). This is mostly due to the dynamism associated with interacting with contracts you don't have control over. So some thoughts on all of this: the core of the issue is whether or not to execute the function (without a call) or to call self address and enter the foo function in my mind (ignoring any limitations in the typesystem currently, this is the common thread amongst the motivation behind wanting to do the actor model). The above code is correct in this paradigm imo, and just needs a bit of ergonomics added. For this:

contract C {
  pub fn foo() { .. }
  fn g(self) {
    let c: C = ..
    c.foo()    # OK
    self.foo() # ERROR, `foo` does not take `self` use `foo()`
  }
}

should likely be:

contract C {
  pub fn foo() { .. }
  pub fn bar(mut self) { .. }
  fn g(self) {
    let c = C(address(0));
    self.foo() // this would error suggesting Self::foo()
    Self::foo() // this would execute the function without a call
    Self::foo(self) // this would error, and suggest C::foo(ctx.self_address()) if the user wants to make it a call or Self::foo() for internal
    self.bar() // this would execute bar without a call as it takes a self
    C::foo() // this would error as there is not a specified instance of C
    C::foo(address(0)) // this would work because you provide it with an address
    c.foo() // this would work because c is an instance of `C`
  }
}

This clarifies the internal vs external call by if you use Self its internal, if you use ContractName its external. Effectively this creates a Self type and a Contract type.

A currently open problem is to decide what sort of low-level primitives the standard library needs to provide to allow safe but powerful custom use of storage and memory. ctx.store(slot, val) seems dangerous for normal code, if slot here isn't restricted a safe range. We've discussed something like MSlice and SSlice types that would give unfettered access to a reserved range of memory or storage, for uses like your memmove library. (Thanks for that link, btw, it'll be a good reference). The goal would be for such library code to not need to drop into an unsafe block except in very special cases.

There seems to be a concept of unsafe already in the codebase, so you could slap ctx.store behind unsafe, make std::storage::Store trait unsafe, and by default #[derive(Store)] (note: you effectively already derive this today!) will produce safe code. My general takeaway is you guys have the opportunity to be like rust and have the unsafe escape hatch. Many people will likely not use unsafe, but for us optimizoors it would be a godsend.

https://github.com/fe-lang/sonatina

Wow! So glad you guys are going down the path of not just Yul targeting, that makes me even more excited. Maybe one option would be do allow something like this:

fn A(a: uint256) -> uint256 {
    unsafe {
        let asm_block = ctx.assembly(|ctx, elem| {
            ctx.push(1)
            ctx.add()
        });
        return asm_block(ctx, a)
    }
}

I understand if you are hesitant to do so, but I think there are a lot of the optimizors would love this feature.

Do you guys have a chat some place? I may start hacking on some stuff and would love to bounce ideas around

sbillig commented 1 year ago

chat

https://discord.gg/ywpkAXFjZH

sbillig commented 1 year ago

There seems to be a concept of unsafe already in the codebase, so you could slap ctx.store behind unsafe, make std::storage::Store trait unsafe, and by default #[derive(Store)] (note: you effectively already derive this today!) will produce safe code. My general takeaway is you guys have the opportunity to be like rust and have the unsafe escape hatch. Many people will likely not use unsafe, but for us optimizoors it would be a godsend.

We could; we have unsafe functions in the std library (https://github.com/ethereum/fe/blob/master/crates/library/std/src/evm.fe) to perform these low-level operations, so we could probably do this approximately as you've written today, but the point I was trying to make is that it should be possible to make the Store trait implementable (in most cases at least) with only safe code, with the proper abstraction. Eg, something like:

trait Store {
  const SLOT_COUNT: usize
  fn store(self, mut sto: StorageSlice<Self::SLOT_COUNT>)
}

impl Store for Foo {
  const SLOT_COUNT: usize = 2
  fn store(self, mut sto: StorageSlice<Self::SLOT_COUNT>) {
    sto[0] = self.a << 128 | self.b
    sto[1] = self.c << 128 | self.d
  }
}

Similarly for memory operations, your dynamic array code from memmove could probably be ported to fe as-is today, but it could instead be something like:

use std::collections::MemBuffer

struct DynArray<T> {
  len: usize
  buffer: MemBuffer<T>,
  pub fn new(capacity: usize) -> Self {
    Self {
      len: 0
      buffer: MemBuffer::new(usize)
    }
  }

  pub fn push(mut self, elem: T) {
    if self.len == buffer.len() {
      // optimization: try to expand the buffer if the memory space after it is free
      if buffer.try_expand(1).is_err() {
        // make a new buffer, copy the data over
        // optimization: memcopy via a call to the identity precompile
        let mut new_buffer = MemBuffer::new(self.buffer.len() + 1)
        unsafe {
          evm::staticcall(evm::gas(), 0x04, self.buffer.mem_pointer(), sz, new_buffer.mem_pointer(), sz)
        self.buffer = new_buffer
      }
    }
    self.buffer[self.len] = elem
    self.len += 1
  }

IMO, we should strive to eliminate all need for raw s/mloads and s/mstores, and the need to make assumptions about the memory allocation scheme (btw, fe currently copies solidity here, but that'll change sometime after the move to sonatina). Maybe we always need the escape hatch, but the less it's used the better.

sbillig commented 1 year ago

C::foo() // this would error as there is not a specified instance of C C::foo(address(0)) // this would work because you provide it with an address c.foo() // this would work because c is an instance of C

The point of that example is that the syntax exactly matches that of structs, but the rules are different. If C were a struct type instead, it would be the opposite set of lines that gave you errors.

I am hesitant to advise the actor model (I spend a good amount of time with Actix actor model in rust). This is mostly due to the dynamism associated with interacting with contracts you don't have control over. So some thoughts on all of this: the core of the issue is whether or not to execute the function (without a call) or to call self address and enter the foo function in my mind (ignoring any limitations in the typesystem currently, this is the common thread amongst the motivation behind wanting to do the actor model). The above code is correct in this paradigm imo, and just needs a bit of ergonomics added.

The point I'm trying to make with that list of motivations is that "structlike" (has fields and public/private functions) is a bad abstraction for contracts. It's an obvious one, and it seems to sorta fit. But contract calls are fundamentally different from local function calls, and now we're trying to stretch the abstraction to allow for things like a custom dispatch function, different CALL codes, calling a contract function with some eth value and gas, and calling/defining contracts that don't use the solidity abi or support multiple abis.

So what I'm suggesting is that instead of torturing the struct/function-call abstraction by adding weird rules and syntax for contracty stuff, we instead abandon it and use an new abstraction: that of sending a message and (maybe) receiving a response. Of course, this isn't much of an abstraction at all, because this is basically what an evm call is, and thus it's an abstraction that fits quite well.

I don't think the result will necessarily feel like an using actor library or language; it should feel a lot like Fe today, but with natural syntactic solutions to the problems I listed above.

// basic call
ctx.call(some_contract, MyMsg::Transfer { from: me, to: you, value: 10 })
// other call ops are easy
ctx.static_call(some_contract, MyMsg::Foo)
// sending eth, setting gas is easy (don't know what this fn should be called)
ctx.call_with_value(some_contract, gas: 10000, value: Wei(100), MyMsg::X)

// ctx.call doesn't assume a particular abi; it can accept anything that
// implements the Encode trait
ctx.call(some_contract, u8(10))

// to define a contract that uses a different or multiple abis,
// manually impl the Encode trait or us a magical macro
#derive(Encode(abi: "cool_abi"))
enum MyCoolMsg { .. }

impl Encode for SomeOtherMsg {
  // TBD what this would look like
  fn encode(self, mut out: Tbd) {
    match self {
      SomeOtherMsg::TransferSol { from: address, to: address, val: u256 } => {
        out.write(std::abi::sol::signature("transfer(address,address,val)"))
        out.write(std::abi::sol::encode(from))
        ...
      }
      SomeOtherMsg::TransferSlim { from: address, to: address, val: u32 } => {
        out.write(u8(0)) // ultra efficient dispatch, or something
        ...
      }
    }
  }
}

// "custom dispatch" isn't a weird special case; every contract just directly
// defines its dispatch

contract GuestBook {
  storage {
    visitor_count: u256
    book: Map<address, String<100>>
  }

  recv(msg: GuestMsg) {
    match msg {
      // most common message at the top
      GuestMsg::Read { addr } with storage {
        return storage.book[addr]
      }
      GuestMsg::Sign { addr, note } with mut storage {
        storage.visitor_count += 1
        storage.book[addr] = node
      }
    }
  }
}

Of course, there are issues to work out with this idea too, and the syntax above is just a first draft. Maybe it'll turn out to be a bad direction, but we haven't spent any time trying to refine it yet. I'm always open to having my mind changed, btw, so alternatives ideas are most welcome.

brockelmore commented 1 year ago

I like the call semantics decently well (type safety seems annoying here though), but not so much the recv semantics. What if we invert the paradigm here, and do something like:

 MyMsg::Transfer { from: me, to: you, value: 10 }.send_to(impl Recv<MyMsg>)

this feels better than using ctx.call semantics imo. You could have a trait SendTo<T: Recv<Self>>, and an abi would automatically implement Recv<_>

Then you could eventually do:

 if let Ok(resp: MyMsg::TransferResponse) = MyMsg::Transfer { from: me, to: you, value: 10 }.send_to(impl Recv<MyMsg>) {
     // handle ok response
 } else {
     // handle failed, or use `?` to revert with a bubble revert
}

A full example:

#[derive(Msg)]
contract A {
    pub fn a(mut self, a: u256, b: u256) -> u256 {
        return a + b
    }
}

// if we dont have the contract:
#[derive(Msg)]
abi A {
    fn a(...)
}

contract B {
    pub fn b(mut self, a: u256, b: u256) -> u256 {
        // A is `impl Recv<A::Msg>`
        let a_contract: A = address(0)
        // A::Msg is `impl SendTo<T: Recv<Self>>`
        let resp: u256 = A::Msg::a { a: a, b: b }.send_to(a_contract)?
        return resp
    }
}

// in std lib
pub trait Recv<T> {}

impl Recv<Msg> for A {}

// in std lib
pub trait Encodable {
    fn encode(self) -> Vec<u8>;
}

impl Encodable for A::Msg { // snip }

// in std lib
pub trait SendTo: Encodable {
    type Output;
    fn send_to<T: Recv<Self>>(&self, ctx: Context, contract: T) -> Result<Self::Output, ()>;
}

impl SendTo for A::Msg {
    type Output = A::MsgOutput;
    fn send_to<T: Recv<Self>>(&self, ctx: Context, contract: T) -> Result<Self::Output, ()> {
        ctx.call(self, contract)
    }
}

impl Context {
    pub fn call<V: Encodable, T: Recv<V>, Ret>(&self, data: V, contract: T) -> Result<Ret, ()>{
        let bytes = data.encode();
        // __call(contract, bytes);
    }
    // snip
}