Add support for libraries/macros

rkalis commented 1 year ago

In certain cases, contracts (or collections of contracts) reuse pieces of code. For readability and reusability of complex operations it would be good to allow defining libraries/macros that live outside of a contract and can be used by different contracts.

Under the hood, the compiler would then replace the library function call statement with its bytecode.

One option would be to use macros (aka string replacements). E.g.:

macro MULDIV(x, y, z) { x * y / z }

macro DO_STUFF() {
  int a = 5;
}

The benefit of this is that it gives the developer the most control over performance / bytecode size. But it is also very easy for this to hurt readability, e.g. using DO_STUFF() would cause confusion:

contract Test() {
  function spend() {
    DO_STUFF()
    require(a == 5); // Whoa, where did 'a' come from?
  }
}
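To make the hygiene problem concrete, here is a minimal sketch of what "string replacement" macros could look like on the compiler side. It is written in TypeScript only for illustration; the macro table, the `expandMacros` name and the restriction to simple arguments are assumptions of this sketch, not an existing CashScript feature.

```typescript
// Hypothetical macro table for this sketch: name -> parameters and body text.
interface Macro {
  params: string[];
  body: string;
}

const macros = new Map<string, Macro>([
  ['MULDIV', { params: ['x', 'y', 'z'], body: 'x * y / z' }],
  ['DO_STUFF', { params: [], body: 'int a = 5;' }],
]);

// Textually replace each NAME(args) with the macro body.
// Assumption: arguments contain no nested parentheses or commas.
function expandMacros(source: string): string {
  return source.replace(/\b([A-Z_]+)\(([^()]*)\)/g, (call: string, name: string, argList: string): string => {
    const macro = macros.get(name);
    if (macro === undefined) return call; // not a macro, leave untouched
    const args = argList.split(',').map((arg) => arg.trim()).filter((arg) => arg.length > 0);
    if (args.length !== macro.params.length) {
      throw new Error('Wrong number of arguments for macro ' + name);
    }
    if (macro.params.length === 0) return macro.body;
    // Substitute all parameters in a single pass so that an argument that is
    // named like another parameter is not replaced a second time.
    const pattern = new RegExp('\\b(' + macro.params.join('|') + ')\\b', 'g');
    return macro.body.replace(pattern, (param: string): string => args[macro.params.indexOf(param)]);
  });
}
```

Calling `expandMacros('DO_STUFF() require(a == 5);')` yields `int a = 5; require(a == 5);`, so the variable `a` enters the caller's scope without any declaration being visible at the call site.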

So it likely makes more sense to go with a properly typed library system, even at the cost of slightly larger / less efficient contracts (note: size can probably be brought down by optimisations in the future if needed). We also think it's likely that the opcode limit will be increased at some point in the future.

So what does a library look like? For example:

library Math {
  function muldiv(int x, int y, int z) returns (int) {
    return x * y / z;
  }
  function divmul(int x, int y, int z) returns (int) {
    return x / y * z;
  }
}

A library is a collection of functions that are each compiled individually. Each function takes parameters and can return at most one value. This can be extended to multiple values in the future (by "upgrading" the tuple type to allow for larger tuples).

From the consuming contract's perspective: When a library function is called in a contract, the compiler treats it as a built-in function call (e.g. abs()). In other words, it puts the args on top of the stack and replaces the call with the function's bytecode, the same way abs() is replaced with OP_ABS, except that a library function's bytecode will be (much) larger, with more than one opcode, e.g. OP_SWAP OP_MUL OP_DIV. The function-to-bytecode mapping is retrieved from the compiled "library artifact" (see below).
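A rough sketch of that lookup, in TypeScript. The `FunctionSymbol` shape, the `symbolTable` contents and `compileCall` are hypothetical names for this illustration and do not reflect the actual compiler internals. The `muldiv` opcode sequence is copied from the example above purely as a placeholder; the real sequence depends on the order in which arguments are pushed.

```typescript
// Hypothetical symbol table entry: the bytecode that replaces a call.
interface FunctionSymbol {
  parameterCount: number;
  bytecode: string[];
}

const symbolTable = new Map<string, FunctionSymbol>([
  // Built-in function: a single opcode.
  ['abs', { parameterCount: 1, bytecode: ['OP_ABS'] }],
  // Library function: several opcodes (placeholder sequence from the discussion).
  ['muldiv', { parameterCount: 3, bytecode: ['OP_SWAP', 'OP_MUL', 'OP_DIV'] }],
]);

// compiledArgs holds the already compiled opcodes of each argument expression.
// The call is emitted as: all arguments in order, then the function's bytecode.
function compileCall(name: string, compiledArgs: string[][]): string[] {
  const symbol = symbolTable.get(name);
  if (symbol === undefined) throw new Error('Unknown function ' + name);
  if (compiledArgs.length !== symbol.parameterCount) {
    throw new Error('Wrong number of arguments for ' + name);
  }
  const output: string[] = [];
  compiledArgs.forEach((argOpcodes) => argOpcodes.forEach((op) => output.push(op)));
  symbol.bytecode.forEach((op) => output.push(op));
  return output;
}
```

With this shape, the consuming contract does not need to know whether a function is built in or comes from a library; both are just entries with a bytecode array.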

From the library's perspective: Every function in a library is compiled independently. Compiling a library function is similar to compiling a contract with a single function, but there are a few notable differences:

We do not remove the final OP_VERIFY, because that only needs to happen at the end of a contract execution, not at the end of a function call.
We need to add a return statement that preserves the top stack value and cleans up the rest of the stack (the items known to the function).
We need to create a library artifact interface / generation process to store the inputs, outputs and bytecode per function.
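The return statement from the second point could be sketched like this, in TypeScript. It assumes that the function knows how many of its own items sit below the result when it returns, and it uses OP_NIP (which removes the second-to-top stack item) once per item. `emitReturnCleanup` and the tiny `execute` stack model are hypothetical helpers for this illustration only.

```typescript
// Emit one OP_NIP per item the function owns below its return value.
function emitReturnCleanup(itemsBelowResult: number): string[] {
  const ops: string[] = [];
  for (let i = 0; i < itemsBelowResult; i += 1) ops.push('OP_NIP');
  return ops;
}

// Tiny stack model that only understands OP_NIP, used to check the idea.
// The last array element is the top of the stack.
function execute(ops: string[], stack: string[]): string[] {
  const result = stack.slice();
  ops.forEach((op) => {
    if (op !== 'OP_NIP') throw new Error('Unsupported opcode in this sketch: ' + op);
    if (result.length < 2) throw new Error('OP_NIP needs two stack items');
    result.splice(result.length - 2, 1); // drop the item just below the top
  });
  return result;
}
```

For a stack of `caller, x, y, z, result` where the function owns `x`, `y` and `z`, running `emitReturnCleanup(3)` leaves `caller, result`: the caller's own items are untouched and the return value stays on top.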

We also need to add import functionality, allowing importing libraries into contracts or into other libraries. For simplicity we can stick to 1 library/contract per file.

import "./Math.cash";

mr-zwets commented 8 months ago

> We also need to add import functionality, allowing importing libraries into contracts or into other libraries. For simplicity we can stick to 1 library/contract per file.
>
> import "./Math.cash";

I'm thinking about how this would work in practice with syntax checking plugins. Would we call the function on the library like this?

int result = Math.muldiv(x, y, z);

If we want to use the function name directly, it might be a good idea to do explicit imports to allow for highlighting.

import { muldiv } from "Math.cash";

contract Example() {
    function test(int x, int y, int z) {
        int result = muldiv(x, y, z);
    }
}

However, code completion would work better with the reversed order:

from "Math.cash" import { muldiv };

rkalis commented 8 months ago

@mr-zwets and I just had a call about this, some of the main points:

We won't generate "library artifacts" for now; instead we'll compile everything on the fly when compiling the main contract.
- We may want to extend support for separate library compilation to share artifacts instead of source code on NPM.
We'll allow for separate functions/libraries in a file together with contracts (or in separate files).
At the start of compilation, we'll "compile" libraries/functions by themselves and add these to the global symbol table.
- From there, the compiler will treat these exactly the same as the builtin "global functions" (e.g. abs(x)).
- We'll just need to extend the symbol table with "bytecode" for functions.
If you import from a separate file, the compiler has to match the pragma statements from all files. So a contract with pragma version ^0.9.0 and a library with pragma version ^0.8.0 won't compile together.
For import syntax we probably want to support both import "X.cash"; and import { y } from "X.cash";.
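The pragma point can be illustrated with a small TypeScript sketch. It hand-rolls only caret (`^`) constraints with the usual npm semver meaning (the leftmost non-zero component may not change); a real implementation would more likely use an existing semver library. All function names here are hypothetical.

```typescript
type Version = [number, number, number];

function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (match === null) throw new Error('Invalid version: ' + text);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i += 1) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Exclusive upper bound of a caret range: bump the leftmost non-zero component.
function caretUpperBound(lower: Version): Version {
  if (lower[0] > 0) return [lower[0] + 1, 0, 0];
  if (lower[1] > 0) return [0, lower[1] + 1, 0];
  return [0, 0, lower[2] + 1];
}

function satisfiesCaret(version: string, constraint: string): boolean {
  if (constraint.charAt(0) !== '^') throw new Error('Only caret constraints are handled in this sketch');
  const lower = parseVersion(constraint.slice(1));
  const candidate = parseVersion(version);
  return compareVersions(candidate, lower) >= 0
    && compareVersions(candidate, caretUpperBound(lower)) < 0;
}

// The compiler version must satisfy the pragma of every file involved.
function satisfiesAllPragmas(compilerVersion: string, pragmas: string[]): boolean {
  return pragmas.every((pragma) => satisfiesCaret(compilerVersion, pragma));
}
```

Under these rules no single compiler version can satisfy both `^0.9.0` and `^0.8.0`, which is why the combination from the example above would be rejected.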

Steps to get there / checklist:

[ ] Update the symbol table to include compiled "bytecode" and update the current "global functions" accordingly.
[ ] Allow for standalone functions in a .cash file.
[ ] At the start of compilation, all standalone functions should get compiled and added to the GLOBAL_SYMBOL_TABLE.
- [ ] Make a compiler distinction between functions that are part of a contract or standalone.
- [ ] Add "return" statement for standalone functions to return something.
[ ] Add import functionality.
- [ ] See how we can re-use Node.js' module resolution from inside the CashScript compiler.
- [ ] Initially only allow import "X.cash"; syntax, where import means copy-paste the entire file contents, and then treat the file as a single .cash file for compilation purposes.
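The "import means copy-paste" step from the checklist above could look roughly like the following TypeScript sketch. It works on an in-memory map of file contents instead of the real file system, treats import paths as plain map keys (no Node.js module resolution), and pastes each file at most once. The `inlineImports` name and the `Files` type are assumptions of this sketch.

```typescript
// Hypothetical in-memory file system: path -> source text.
type Files = Map<string, string>;

// Matches a whole-file import such as: import "Math.cash";
const IMPORT_PATTERN = /^import\s+"([^"]+)"\s*;$/;

// Replace every import line with the (recursively expanded) file contents.
// A file that was already pasted is skipped, which also stops circular
// imports from recursing forever.
function inlineImports(path: string, files: Files, included: Set<string> = new Set<string>()): string {
  if (included.has(path)) return '';
  included.add(path);
  const source = files.get(path);
  if (source === undefined) throw new Error('File not found: ' + path);
  return source
    .split('\n')
    .map((line) => {
      const match = IMPORT_PATTERN.exec(line.trim());
      return match === null ? line : inlineImports(match[1], files, included);
    })
    .join('\n');
}
```

The result is a single source string that can be compiled as if it were one .cash file. Pragma lines of imported files are pasted along with everything else, so they would still need to be checked or deduplicated separately.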

Extensions:

[ ] Support import { y } from "X.cash"; syntax.
[ ] Support library X { function y() {} } syntax in addition to standalone functions.
[ ] Allow for independent library compilations / artifacts.
[ ] Update "tuple" type to allow for multiple return values from a standalone function.

rkalis commented 7 months ago

We need to consider how we resolve the dependency graph for contracts / libraries that have (semi-)complex dependency graphs. We'll need to think about this.
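One possible starting point is a depth-first topological sort over the import graph, so that every file is compiled after the files it depends on and circular imports are reported. The TypeScript sketch below assumes the graph has already been extracted as a map from file name to imported file names; `compilationOrder` is a hypothetical name and this is only one of several ways to approach it.

```typescript
// file -> files it imports
type DependencyGraph = Map<string, string[]>;

// Returns the files ordered so that dependencies come before dependents.
// Throws if the graph contains a circular import.
function compilationOrder(graph: DependencyGraph): string[] {
  const order: string[] = [];
  const state = new Map<string, 'visiting' | 'done'>();

  function visit(file: string, trail: string[]): void {
    const current = state.get(file);
    if (current === 'done') return;
    if (current === 'visiting') {
      throw new Error('Circular import: ' + trail.concat(file).join(' -> '));
    }
    state.set(file, 'visiting');
    const dependencies = graph.get(file) || [];
    dependencies.forEach((dependency) => visit(dependency, trail.concat(file)));
    state.set(file, 'done');
    order.push(file); // all dependencies are already in the list
  }

  graph.forEach((_dependencies, file) => visit(file, []));
  return order;
}
```

For a graph where Main.cash imports Math.cash and Utils.cash, and Utils.cash imports Math.cash, this gives the order Math.cash, Utils.cash, Main.cash.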

CashScript / cashscript

Add support for libraries/macros #153