PistonDevelopers / piston

A modular game engine written in Rust
https://www.piston.rs
MIT License
4.64k stars 235 forks source link

Idea about a new scripting language #672

Closed bvssvni closed 10 years ago

bvssvni commented 10 years ago

I have not thought of syntax of operator precedence, but to avoid cluttering I'll assume there is some way to solve this. The syntax shown here is not final, this is just for demonstration of the ideas.

Type system

A type system does 2 things:

  1. It looks up functions
  2. It reports unreachable or ambiguous function search

In this language, everything is run time checked, but the syntax is designed to be understandable for optimization and theorem proving. The language is unsafe at low level, because it is intended as a scripting language where the back-end controls the actual operations. The semantics is very different from other languages, because all unary and binary operators are sugar for functions search. The function search is written in the scripting language, giving the programmer power to write arbitrary type systems.

For example:

a + b

This is sugar for:

+(a, b)(a, b)

The + operator is just a normal function that returns a function:

fn +(a, b) -> {
    if a is i32 && b is i32 { intrinsics::add_i32_i32 }
    else if a is i64 && b is i64 { intrinsics::add_i64_i64 }
    else { error format("Could not find method for {} + {}", typedesc a, typedesc b) }
}

All concrete functions are unsafe, but since type checking is performed before calling these functions, we can build safe abstractions around the intrinsics.

You might notice that error and typedesc has no paranthesis. Function calls without parenthesis means it first computes a function and then apply the arguments to it. This is because in this script language, there is no distinction between function names and operators. For example - a is the same as -(a)(a) where - is declared fn -(a) -> { ... }. This makes it easy to extend the syntax and build new features around intrinsics.

Module system

The module system will be similar to Rust with use and mod.

For example, if you want to extend the + operator for new types, you can do

fn +(a, b) -> {
    if a is [] i32 && b is i32 { intrinsics::add_vec_i32_i32 }
    else if a is [] i32 && b is [] i32 
        && a.len() == b.len() { instrinsics::add_vec_i32_vec_i32 }
    else { math::+(a, b) }
}

The new + operator belongs to the current module and can be imported with use mymodule::{ + }; elsewhere. Notice the check that the length is the same. If the theorem prover is capable of proving the variables are of the same size, then it will optimize out this check.

Functions

The first argument in a function can be used similar to the self keyword in Rust.

fn foo(a, b) { ... }

a.foo(b); // same as foo(a, b);

The function foo will accept values of any type. What if you want to restrict the type? Well, for function arguments, you can use any expression of a single variable. This means we can define a function for the : operator.

fn :(a, b) -> {
    if a is typeof b { id_left }
    else { error format("Expected {}", b.name()) }
}

fn id_left(a, b) -> { a } // Just return the left argument.

Now we can write

fn foo(a : i32, b : f64) { ... }

Because you can use any expression of a single variable, you can emulate sum types:

fn eat(food a) { ... }`

fn food(a) -> {
    if a is Cake { intrinsics::id }
    else if b is Bread { intrinsics::id }
    else { error "I can't eat that!" }
}

Rust integration

The stack consists of two stacks, one for data Vec<u8> and another one for type information and "pointers". Pointers tell where values are stored in the data stack and are relative to the beginning. Before calling an intrinsic function, a fat pointer is stored on the Rust stack. Converting to fat pointers is required to support string slices. The idea is to make it easy to extend the language with new Rust types and features.

An intrinsic function can be implemented like this:

pub unsafe fn add_u8_u8(st: &mut Stack, a: Arg, b: Arg) {
    st.push_u8(a.to::<u8>() + b.to::<u8>());
}

When a value is popped off the stack, the length of the data stack is set unsafely to the position before the pointer, meaning you can't push until you are done computing, because you will then overwrite the memory of the arguments. This is done for performance reasons.

Here is an overview:

  1. A pointer is popped off the pointer stack.
  2. A fat pointer is computed that represents a real Rust pointer
  3. The length of the data stack is set to the pointer offset
  4. Repeat 1-3 for all arguments.
  5. The intrinsic function is called
  6. The intrinsic function computes the value on the Rust stack
  7. The value is copied from the Rust stack to the data stack.
  8. A new pointer is pushed onto the pointer stack.

While this means the unoptimized version of the scripting language is slow compared to Rust, it is easily extensible with new Rust types and functions. These functions will then appear unsafe in the language, but you can write safe abstractions around them by using reflection.

The type checking we don't want to deal with through script can be written in Rust. This requires run time reflection, but since this runs at "compile time" for the scripting language it is sufficient.

dobkeratops commented 10 years ago

I see you want to go the traditional Lua route which is very wise.

I don't know how long I'll continue with this but, inspired by jonathan blow I've got going on my own language. Its' running in basic form via LLVM at the minute. I realise rolling a new language is 99.9% probably a futile exercise but I have some itches to scratch.

its' got some similarity to ideas above; it copies rusts' basic syntax but diverges:

you can write fn foo(a,b) and thats' equivalent to fn foo<A,B>(a,b) .. something 'typeless' is just an unbounded generic. it uses C++ style adhoc overloading, so as soon as you add types, it'll chose the best function for any call based on those. This is close to how 'julia' does things.

I have several goals:-

[1] direct 1:1 mapping of C++ functions via overloading - I like Rusts' clean expression-based syntax, but don't want to ditch C++. I want something which could eventually just be a different syntax for a C++ compiler , see the "SPECS" proposal... just with additional features like safe functions, ADTs and immutable default, better type-inference. 1:1 translation of some subset should be possible both ways. when C++ eventually gets modules, i think it should be theoretically possible to have a seamless mixed language project much like JVM can use java/scala/clojure, or in the apple-verse they have designed Swift to drop in directly alongside objC

[2] try out something that equally suits being compiled or interpreted. The language should have convenient JSON like inline literal data, its' a nice property of scripting languages

[3] try out an idea for the expression problem where open adhoc function overloads can be collected into ADTs or Vtables equally, so you don't decide upfront if you 'sort by method' or 'sort by type'. e.g. if you implement foo(a:A){}, foo(a:B){}, bar(a:A){}, bar (a:B){} , you can just declare an interface IFooBar {fn foo(self),fn bar(self)} and it will roll a vtable holding the appropriate foo,bar's for A,B; and vica versa (duck typed, if the methods exist the interface is good to go;), if you declare a tagged union type AorB = A|B, it could automatically roll functions foo(x:A|B), bar(x:A|B) which will dispatch to the same foo/bar implementations via switch statements. None of that works yet :) i just have it about the level of C but with adhoc overloading & instantiating '100% generics' (typeless functions with no management of type-params).

seems like an obvious idea so perhaps i'll run into impracticalities or the complexity will explode and I'll give up

[4] autodelegation for composing types -maybe just for tuples containing structs - I think this will be good for 'data-oriented design' where you need to change the same code between different data layouts. You might start out with one struct, but as it grows, you find you need to cut it into 2 parts... so imagine if you can just tuple references to the 2 parts together and the same function bodies still work e.g. struct Foo{a,b,c,d} fn foo(f:Foo){.. uses a,b,c,d} <--refactor--> fn foo(f:(Foo1*,Foo2*)){... same fn body uses a,b,c,d...} struct Foo1{a,b} Foo2{b,c}

[5] no macro system, just a few more sugars, a direct form of compile time reflection , default args etc