ameliatastic / seahorse-lang

Write Anchor-compatible Solana programs in Python
Apache License 2.0
313 stars 43 forks source link

Expression context #77

Closed ameliatastic closed 1 year ago

ameliatastic commented 1 year ago

Putting some thoughts into a branch for how to tackle this problem. Context.

Basically, the compiler needs a way for context to be passed down to expressions. When the build stage is run and a Rust-like AST is generated, the entire tree is built from the leaves up. This means that expressions like a = f(x, y) will first build the syntax tree component for a, x, then y, then combine those to make f(x, y), then combine those to make a = f(x, y).

This introduces a small problem: the innermost expressions have no way of knowing the context that they're being created in. An example that's been resolved in a somewhat hacky way: a very simple assignment statement like

x.a = y.b

needs to be turned into something like

x.borrow_mut().a = y.borrow().b;

due to how mutability is handled. Even though syntactically, the (Python) expressions on the left and right are exactly the same, they need to generated different (Rust) expressions! Right now this is handled by just traversing the LHS a second time and making all of the borrows mutable borrows, but this could easily lead to unforeseen problems. It would be much cleaner to just generate the right code the first time.

However, getting there is a challenge, because this involves actually changing the compiler, and maybe changing it significantly. On this PR I'll be looking trying to figure out a good way of solving the problem. Any outside help would be greatly appreciated. Thanks!

Closes #81 Closes #72

ameliatastic commented 1 year ago

Explaining the initial commit: note that it doesn't compile or do anything, not even close.

But basically my idea is to add a context "stack" to the function that builds expressions (build_expression). The compiler is allowed to push new context elements onto the stack as build_expression calls itself recursively, which allows context to be passed down as we descend deeper down the tree of expressions.

Context can be added to the stack in two ways:

  1. Simple compiler rules can append things (an example is shown, ExprContext::LVal can be pushed when the compiler is building the LHS of an assignment)
  2. Transformations can store a context element which needs to be pushed before building its subexpressions

In order to use the context, the signature of the Transformation function is changed to include the context stack as an argument.

This lets us reuse all of the existing transformation logic, hopefully doing so in a minimally disruptive way. Transformations sort of get repurposed as context consumers and producers, so maybe a name change would be nice.

ameliatastic commented 1 year ago

...in testing I just learned that I never even needed to add this change in order to get strings to work in seed arrays, idk why I didn't realize that before lol (it introduces a small amount of unnecessary overhead though, so I'll keep going).

Also note that check is definitely failing but the code still works.

ameliatastic commented 1 year ago

lol what's going on it compiles differently on my computer

mcintyre94 commented 1 year ago

In case it's helpful:

I have a commit on my fork which I think fixes the build error, by unpacking the ExpressionObj::Move case in seeds: https://github.com/ameliatastic/seahorse-lang/commit/db348d7c5ccc28e1901028cc6d77f6c78e3659b1

That allows the seahorse compile to succeed, so that we can test the output

I was doing some testing with this and it looks like the to_string added to panic lines, eg panic!("This is not your calculator!".to_string()); is a compile error.

error: expected `,`, found `.`
  --> programs/calculator/src/dot/program.rs:73:46
   |
73 |         panic!("This is not your calculator!".to_string());
   |                                              ^ expected `,`

error: argument never used
  --> programs/calculator/src/dot/program.rs:73:47
   |
73 |         panic!("This is not your calculator!".to_string());
   |                ------------------------------ ^^^^^^^^^^^ argument never used
   |                |
   |                formatting specifier missing

There's also an error on the payer (this is introduced by my commit):

error: the payer specified does not exist.
   --> programs/calculator/src/lib.rs:184:13
    |
184 |         pub calculator: Box<Account<'info, dot::program::Calculator>>,
    |             ^^^^^^^^^^

Changing payer = owner . clone () back to payer = owner fixes that

Could we extend the context introduced here to handle these cases maybe?

mcintyre94 commented 1 year ago

Did a bit more poking and this commit gets the examples compiling: https://github.com/ameliatastic/seahorse-lang/commit/e6b075e0aaf3d57f30ab6fcdd56c4dd6984348dd

ameliatastic commented 1 year ago

Cool! The approach that I've been looking at has been slightly different - I think that I might just be adding the Moves and Mutables a bit too liberally to start, so I'm trying to prune them away a bit by not generating them in the first place.

ameliatastic commented 1 year ago

I added an Assert context, and use it on the msg for an assert. Assert renders without to_string (in the same if statement as eg Seed)

Nice. I think that's a good usage of the context system

mcintyre94 commented 1 year ago

Cool! The approach that I've been looking at has been slightly different - I think that I might just be adding the Moves and Mutables a bit too liberally to start, so I'm trying to prune them away a bit by not generating them in the first place.

Nice, definitely think the less unnecessary mutables/clones we have the better, makes the output a bit easier to follow too. I noticed on my fork there's a few places with .clone().clone() too which I don't think will cause any issues but definitely means we have more than we need!

ameliatastic commented 1 year ago

Okay, I didn't expect that to work 100% on the first try of course but still that's way too many clones LMAO

ameliatastic commented 1 year ago

I think that to continue this approach, either the Moves need to be context-aware so that they don't show up in a place where a method is being called (like .borrow(), .key(), etc.), or some other approach. I think that in general they don't need to show up unless they're at the end of an expression (so remove all intermediate clones?).

mcintyre94 commented 1 year ago

Was just having a look at the tests with this latest commit. I'm wondering are there any cases where we need clones within user functions? Looking at the current output of eg calculator, the clones are all either in generated code like load/store/impl blocks, or in our kinda instruction wrappers eg. do_operation_handler(owner.clone(), calculator.clone(), op, num);

I'm not sure if there's many places in python -> rust compilation where we need it? Looking at the tests the only other case is emitting events (which is also special cased in the generate stage rather than relying on anything generic to add clones). Might be something I'm not thinking of that isn't in the tests yet though.

ameliatastic commented 1 year ago

There definitely are, see #72. It's tough to find the right rule that captures this behavior though. Like you said, a .clone().clone() technically doesn't hurt but I think we can figure out a way around adding way too many clones, lol.

Clones are at least needed when putting objects into things (lists, tuples, even single variables to be complete) and passing to functions, since that's when a move usually occurs in Rust. So maybe that could be the rule? Here's what those cases look like in Python:

x = ... # Something mutable

a = x        # Moves x into a
b = (x, ...) # Moves x into the tuple
c = [x, ...] # Moves x into the list
d = f(x)     # Moves x into the function parameters

Probably pretty easy to handle cases like this actually, so I'll try it.

ameliatastic commented 1 year ago

Oh wow, looking at the diff that's WAY better than before. Still need to test the approach with a more rigorous test, though

mcintyre94 commented 1 year ago

Awesome, that looks great! Small commit on my fork to remove the to_string from panic and get the examples compiling: https://github.com/ameliatastic/seahorse-lang/commit/f5ad27b6e3669c80f0a0ee6ec266b1df55f1909d

I'll do a bit more testing but looks good so far :)

BTW just because I'm not sure where else to document this, there's a new script in playground that I've been using to test this

Then if you run the client it'll use the custom seahorse compiler :) I find playground test UI is the easiest way to test the actual instructions if you want to go beyond just testing they compile with the custom build

Also tested this example based on #72

from seahorse.prelude import *

# This is your program's public key and it will update
# automatically when you build the project.
declare_id('FBw24RKSGqxLawd5Q5mWztJGZ3HVdgRDsRUyHEyroKRK')

@instruction
def ix(sig: Signer):
    a = array(1, 2, 3)
    steal(a)
    print(a[0])

def steal(arr: Array[i32, 3]):
    pass

def myfunction(myarray: Array[f64,3]) -> f64:
  return myarray[0] + myarray[1] + myarray[2]

def mytest(myarray: Array[f64,3]) -> f64:
  b=myfunction(myarray)
  for j in [0,1,2]:
    a=myarray[j]
    print(f'A number is {a}')
  return b

@instruction
def test(signer: Signer, number: f64):
    print(f'Your number is {number}.')
    sum = mytest(array(number, number + 1.0, number + 2.0))
    print(f'the sum is {sum}')
ameliatastic commented 1 year ago

Nice! I'll just incorporate the change here. Need to review the other original changes you made, are there any other changes you had made that aren't reflected here?

mcintyre94 commented 1 year ago

Nope I think your last commit fixes what my other changes were hacking around, they were to do with ExpressionObj::Move objects introduced by earlier commits

ameliatastic commented 1 year ago

Attempting to clean up the code a bit by newtyping Vec<ExprContext>. Gives us the ability to define methods on top of it which might be nice, but I'm not sure if this is better.

ameliatastic commented 1 year ago

Also Closes #81 Closes #72

ameliatastic commented 1 year ago

(did that do anything)

mcintyre94 commented 1 year ago

I think ExprContextStack makes sense here because we probably don't want to just treat it like a Vec. Specialising it a bit seems nice/makes things a bit clearer IMO. Not a strong view either way but I think it's worth keeping

And I think you need to add those closes to the original description to get them to auto-close, don't think PR comments work for it. I always mess it up though lol

ameliatastic commented 1 year ago

oh that's lame, i'll try updating the original lol

update: yeah that worked nice