DexScript - for Better Developer EXperience
http://dexscript.com
Apache License 2.0
DexScript Introduction #20

Open taowen opened 6 years ago

Local Variable

let a = 1
var b = 2 // same as b := 2
var c tuple{int32, float64} // initialized like var c = tuple{int32(0), float64(0.0)}

Value Type

value is allocated on the stack, and passed by copy

tuple/array/struct allocate their elements contiguously. You can think of tuple/array/struct as three generic types, with {} arguments that construct a concrete type from the generic type definition.

let array_val1 = array{int64, 3}{} // empty array of 3 elements
array my_array{int64, 3} // define my_array type
let array_val2 = my_array{1,2,3}
let array_val3 = array{32, 44} // 2-element array with values 32 and 44
array_val3[0] = array_val3[1] // get/set by []
let tuple_val1 = tuple{int32, int32} {1, 2} // two elements tuple
tuple two_ints{int32, int32} // define tuple type
let tuple_val2 = two_ints{1, 2} // use defined tuple type
let tuple_val3 = tuple{1, 2} // same as above
let tuple_val4 = tuple{1.1, 2.2, 3} // three elements tuple
tuple_val4[0] = tuple_val4[1] // get/set by []
let struct_val1 = struct {
  field1: int32,
  field2: int32
} {
  field1: 100,
  field2: 200
}
struct two_ints{field1: int32, field2: int32}
let struct_val2 = two_ints{1, 2}
let struct_val3 = struct{
  field1: 100,
  field2: 200
}
let struct_val4 = struct{
  field1: 1.1,
  field2: 2.2,
  field3: 3
}
struct_val4.field1 = struct_val4.field2 // get/set by field name

alias can be created for any type

alias byte uint8 // byte is not uint8, but can be converted from uint8

A string literal is an object of type string_literal that wraps a value of type array{byte, n}. It is written like 'my string literal'. A multi-line literal is written like

`
multi
line
string
literal
`

Because a string is an object, it is not a value type. Assignment shares the string instead of copying it. Objects are introduced later.

Any Type

any is a special type. It is like interface{} in golang, but its data layout is like zval in PHP7.

func add(a any, b any) any {
  return a + b
}

Operations can be applied to any without static type checking. Type conversion works just like in golang

func add(a any, b any) int64 {
  c := a + b
  return c.(int64)
}

All dexscript code can be executed by an interpreter, which essentially treats every variable as type any. Having any allows parts of the code to be written in a dynamic-language style.
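
As a rough analogue only (Python, not DexScript), the two add variants above can be sketched as follows. The isinstance check stands in for the c.(int64) assertion; the name add_int is an assumption made for this illustration.

```python
def add(a, b):
    # No static type check: `+` is resolved on the runtime values.
    return a + b


def add_int(a, b) -> int:
    c = add(a, b)
    # Runtime counterpart of c.(int64): fail loudly if the dynamic type is wrong.
    if not isinstance(c, int):
        raise TypeError(f"expected int, got {type(c).__name__}")
    return c


print(add_int(1, 2))   # 3
print(add("a", "b"))   # ab
```

Calling add_int("a", "b") raises TypeError, which is the moment a failed type assertion would surface.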

Control Structure

same as golang

String Template

use "hello {{ world }}" to denote a string template, where world is a variable in scope

let str = "hello {{ world }}" // evaluate at runtime
"print(str)" // evaluate at compile time

string template can be used to do meta-programming to generate function body

Function

func add(a int64, b int64) int64 {
  return a + b
}

a and b are copied onto the stack of add, and they cannot be modified

a function can be called with positional arguments, add(1, 2), or with named arguments, add(a:1, b:2)

func sum(input array{int64}) int64 {
  var sum int64
  for _, elem := range input {
    sum += elem
  }
  return sum
}

array{int64} is a constraint on the input type: it must be an array, and the array must have int64 elements.

Function as Value

a function is a value. It can be passed around as an argument or saved in a local variable.

func reduce(input array{int64}, op func(int64, int64) int64) int64 {
  var reduced int64
  for _, elem := range input {
    reduced = op(reduced, elem)
  }
  return reduced
}
func sum(input array{int64}) int64 {
  return reduce(input, add) // pass add function as value
}

function value can also be defined inline

func sum(input array{int64}) int64 {
  return reduce(input, func (a int64, b int64) int64 {
    return a + b
  })
}

Error Handling

errors come in two kinds, thrown errors and panics, both covered below.

defer will be executed no matter what happens

func do_something() {
  var f = file{name: 'some.tmp'}
  defer { f.remove() }
  // do something
}

handle will be executed when a throw or panic happens

func do_something() {
  var f = file{name: 'some.tmp'}
  handle(err any) {
    print(err) // can get the error value
    f.remove()
  }
  // do something
}

the error is propagated to the function's caller by default; use recover to stop propagation.

func do_something() {
  var f = file{name: 'some.tmp'}
  handle(err any) {
    recover() // will stop propagation
  }
  // do something
}

handle can be restricted to a specific type of err

func do_something() {
  var f = file{name: 'some.tmp'}
  handle(err string) { // only if the error value is of type string
    print(err)
  }
  // do something
}

to throw an error value

func do_something() { throw('shit happens') }

to panic with an error value

func do_something() { panic('shit happens') }

from the point of view of defer and handle, there is no difference between throw and panic. However, if the error is thrown, the caller can check it

func do_something() {
  f, err := check open_something() // check will capture the thrown error as return value
  if err != nil { 
    print(err)
    return 
  }
}
func open_something() file {
  // open the file
}

the actual signature of open_something is func open_something() tuple{file, any}. If you do not want to pay the cost of returning a tuple, throwing can be disabled with @nothrow.

@nothrow
func add(a int64, b int64) int64 { return a + b }

If a nothrow function calls another function and that function throws an error value, the error is turned into a panic.
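
The relationship between throw, check and @nothrow can be approximated in Python. This is a sketch under assumptions, not DexScript semantics: Thrown, check and nothrow are hypothetical helpers, and a panic is modeled as RuntimeError.

```python
class Thrown(Exception):
    """Hypothetical stand-in for a value raised by throw."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def check(fn, *args):
    # Capture a thrown error as a second return value,
    # like `f, err := check open_something()`.
    try:
        return fn(*args), None
    except Thrown as exc:
        return None, exc.value


def open_something(name):
    if not name:
        raise Thrown("empty file name")
    return f"<file {name}>"


def nothrow(fn):
    # Sketch of @nothrow: a Thrown escaping fn becomes a panic
    # (RuntimeError here), which check() does not capture.
    def wrapper(*args):
        try:
            return fn(*args)
        except Thrown as exc:
            raise RuntimeError(f"panic: {exc.value}") from exc
    return wrapper


print(check(open_something, "some.tmp"))  # ('<file some.tmp>', None)
print(check(open_something, ""))          # (None, 'empty file name')
```

Wrapping open_something with nothrow and calling it with an empty name raises RuntimeError straight through check, mirroring how a throw inside a nothrow function escalates to a panic.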

Call Stack Context

TBD

Separation of Concerns (AOP)

Logging and statistics are done off-site via AOP. Error handling at the call site only needs to manage resources and handle business logic.

Coroutine and Object

A coroutine is a function that can be suspended and resumed. A coroutine is also an object. Here is a coroutine that generates the fibonacci sequence:

object fibonacci() {
a, b := 1, 1
await {
  @mut
  message next() int64 {
    reply a
    a, b = b, a + b
  }
}}
func print_fib() {
  fib := fibonacci{} // new object by {}
  print(fib.next())
  print(fib.next())
  print(fib.next())
  print(fib.next())
}

it will print

1
1
2
3

fibonacci implements the iterator interface, so it can be used in range

func print_fib() {
  fib := fibonacci{}
  i := 0
  for val := range fib {
    print(val)
    if i += 1; i == 5 {
      break
    }
  }
}
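
For readers who know Python, a generator is a close analogue of this coroutine object: yield plays the role of reply, and the generator is both resumable and iterable. This is only an analogy, not how DexScript is implemented.

```python
from itertools import islice


def fibonacci():
    a, b = 1, 1
    while True:
        yield a            # like `reply a`: hand back a value, then suspend
        a, b = b, a + b


fib = fibonacci()
print(next(fib), next(fib), next(fib), next(fib))  # 1 1 2 3
print(list(islice(fibonacci(), 5)))                # [1, 1, 2, 3, 5]
```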

we can hard-code the sequence as well

object fibonacci() {
await { message next() int64 {
    reply 1; break
}}
await { message next() int64 {
    reply 1; break
}}
await { message next() int64 {
    reply 2; break
}}
await { message next() int64 {
    reply 3; break
}}}

the usage is all the same

func print_fib() {
  fib := fibonacci{}
  print(fib.next())
  print(fib.next())
  print(fib.next())
  print(fib.next())
}

It is essentially a state machine. Every time you send the message next(), the state changes. What if we call next() a 5th time? The state machine has ended and no longer accepts the message next(), so it will panic.

We can label the state of the state machine:

object fibonacci() {
state_a:
await { message next() int64 {
    reply 1; break
}}
state_b:
await { message next() int64 {
    reply 1; break
}}
state_c:
await { message next() int64 {
    reply 2; break
}}
state_d:
await { message next() int64 {
    reply 3; break
}}}

the state of the state machine can be inspected using switch case:

func inspect_fib(fib fibonacci) {
  switch fib.(state) {
    case state_a: 
      print('a')
    case state_b:
      print('b')
    case state_c:
      print('c')
    case state_d:
      print('d')
    case ended:
      print('ended')
  }
}

Serializable Coroutine

the state machine can be serialized if marked as @serializable

@serializable
object fibonacci() {
state_a:
await { message next() int64 {
    reply 1; break
}}
state_b:
await { message next() int64 {
    reply 1; break
}}
state_c:
await { message next() int64 {
    reply 2; break
}}
state_d:
await { message next() int64 {
    reply 3; break
}}}
func print_fib() {
  fib := fibonacci{}
  print(fib.next())
  print(fib.next())
  let fib_json json = convert(fib) // serialize to json
  fib2 := convert(fib_json) // de-serialize back
  print(fib2.next())
  print(fib2.next())
  print(fib.next())
  print(fib.next())
}

this will print

1
1
2
3
2
3

notice that fib and fib2 advance independently.
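
One way to picture why a labeled state machine is serializable: its whole state is the current label. The Python sketch below makes that explicit. The class layout and the to_json/from_json names are assumptions for illustration; DexScript's convert is not specified beyond the example above.

```python
import json


class Fibonacci:
    """Hard-coded four-state machine; state names mirror the labels above."""

    STATES = ["state_a", "state_b", "state_c", "state_d", "ended"]
    REPLIES = {"state_a": 1, "state_b": 1, "state_c": 2, "state_d": 3}

    def __init__(self, state="state_a"):
        self.state = state

    def next(self):
        if self.state == "ended":
            raise RuntimeError("panic: next() sent to an ended state machine")
        value = self.REPLIES[self.state]
        self.state = self.STATES[self.STATES.index(self.state) + 1]
        return value

    def to_json(self):
        return json.dumps({"state": self.state})

    @classmethod
    def from_json(cls, text):
        return cls(json.loads(text)["state"])


fib = Fibonacci()
out = [fib.next(), fib.next()]
fib2 = Fibonacci.from_json(fib.to_json())   # independent copy at state_c
out += [fib2.next(), fib2.next(), fib.next(), fib.next()]
print(out)  # [1, 1, 2, 3, 2, 3]
```

Inspecting fib.state afterwards gives "ended", and a further next() raises, matching the panic described for the fifth call.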

Object Composition

an object cannot inherit from another object. Instead, we compose them.

object content_filter() {
await {
  message filter_by_content(content string) bool {
    // impl
  }
}}
object url_filter(cfilter content_filter) {
await {
  message filter_by_url(url string) bool {
    // impl
  }
  proxy cfilter // filter_by_content will be proxied to content_filter
}}

We can compose multiple objects together

object big_filter(cfilter content_filter, ufilter url_filter) {
await {
  proxy cfilter // filter_by_content will be proxied to content_filter
  proxy ufilter // filter_by_url will be proxied to url_filter
}}
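
The proxy keyword resembles attribute delegation in Python. The sketch below forwards unknown messages to the wrapped object through __getattr__; the filter rules are placeholders invented for the example.

```python
class ContentFilter:
    def filter_by_content(self, content: str) -> bool:
        return "spam" in content            # placeholder rule, an assumption


class UrlFilter:
    def __init__(self, cfilter: ContentFilter):
        self._cfilter = cfilter

    def filter_by_url(self, url: str) -> bool:
        return url.startswith("http://")    # placeholder rule, an assumption

    def __getattr__(self, name):
        # Called only when normal lookup fails: forward unknown messages,
        # which is roughly what `proxy cfilter` does.
        return getattr(self._cfilter, name)


ufilter = UrlFilter(ContentFilter())
print(ufilter.filter_by_url("http://example.com"))  # True
print(ufilter.filter_by_content("buy spam now"))    # True
```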

Actor

an actor works like a goroutine in the Go programming language. It is an object with its own executor.

actor add(a int64, b int64) int64 {
  return a+b
}

this defines an actor that can execute independently

actor main() {
  result := async add(1, 2)
  print(await result)
}

Notice that we use async to create a new actor, and await to get its result back. Actors communicate via messages

actor add(a int64, b int64) int64 {
  sum := a+b
await {
  @mut message add_another(c int64) int64 {
    reply sum
    sum += c
  }
  @mut message done() {
    return sum
  }
}}

actor main() {
  result := async add(1, 2)
  print(result->add_another(3)) // use -> to send message across actor boundary
  print(result->add_another(4))
  result->done()
  print(await result)
}

this will print

3
6
10

Although an actor is written like an object, we deliberately use -> to send messages. Because the message must be copied across the actor boundary, -> makes the copying explicit.
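
A minimal Python sketch of the same actor, assuming a thread as the executor and a queue as the inbox. The send method copies its argument before enqueueing it, which is the part that -> makes explicit. AddActor, send and await_result are names chosen for this illustration.

```python
import copy
import queue
import threading


class AddActor:
    """Runs on its own thread; callers talk to it only through the inbox."""

    def __init__(self, a, b):
        self._sum = a + b
        self._inbox = queue.Queue()
        self._result = queue.Queue(maxsize=1)
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            name, arg, reply = self._inbox.get()
            if name == "add_another":
                reply.put(self._sum)      # reply first, then mutate
                self._sum += arg
            elif name == "done":
                reply.put(None)
                self._result.put(self._sum)
                return

    def send(self, name, arg=None):
        # Counterpart of `->`: the argument is copied across the boundary.
        reply = queue.Queue(maxsize=1)
        self._inbox.put((name, copy.deepcopy(arg), reply))
        return reply.get()

    def await_result(self):
        return self._result.get()


actor = AddActor(1, 2)
print(actor.send("add_another", 3))  # 3
print(actor.send("add_another", 4))  # 6
actor.send("done")
print(actor.await_result())          # 10
```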

Generics

func swap(@mut input tuple{T1, T2}, T1 type, T2 type) tuple{T2, T1} {
  input[0], input[1] = T1(input[1]), T2(input[0])
  return tuple{T2(input[0]), T1(input[1])}
}

The return type tuple{T2, T1} is inferred from the input type. If the type information is incomplete, the return type is inferred from the assignment of the return value.

The @mut annotation makes the argument mutable.

func convert(input T1, T1 type, T2 type) T2 {
  // ...
}
func use_convert() {
  var converted int64
  converted = convert(float64(1.21)) // return type inferred from the assignment
  this_will_not_work := convert(float64(1.21)) // do not know return type this way
}

Overload by Contract

We can share the same implementation across different types using generics. We can also choose a different implementation for each type using overload.

func add(a int, b int) {
  // impl
}
func add(a int, b string) {
  // impl
}

Overload can also dispatch on a runtime value through an extra contract, instead of on the compile-time type

func some_behavior(p product, u user) 
  require is_localized_product(p, u) {
  // impl
}
func some_behavior(p product, u user) 
  require !is_localized_product(p, u) {
  // impl
}

If require only references type arguments, it can be evaluated at compile time as well. The pre-condition of a contract is specified by require, the post-condition by ensure

func abs(val int64) (ret int64) 
  ensure ret >= 0 {
  // impl
}
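
To make the dispatch rule concrete, here is a Python sketch in which each overload is registered with a require predicate and the first matching one is called. The overload decorator, the dictionary-shaped product and user, and abs_checked are all assumptions for illustration. The post-condition is modeled with a plain assert.

```python
_overloads = {}


def overload(require=lambda *args: True):
    """Register an implementation guarded by a `require` predicate."""
    def register(fn):
        _overloads.setdefault(fn.__name__, []).append((require, fn))

        def dispatch(*args):
            for pred, impl in _overloads[fn.__name__]:
                if pred(*args):
                    return impl(*args)
            raise TypeError(f"no overload of {fn.__name__} accepts {args!r}")
        return dispatch
    return register


def is_localized_product(product, user):
    return product["region"] == user["region"]


@overload(require=is_localized_product)
def some_behavior(product, user):
    return "localized"


@overload(require=lambda p, u: not is_localized_product(p, u))
def some_behavior(product, user):
    return "generic"


def abs_checked(val: int) -> int:
    ret = -val if val < 0 else val
    assert ret >= 0, "ensure ret >= 0 violated"   # post-condition
    return ret


print(some_behavior({"region": "cn"}, {"region": "cn"}))  # localized
print(some_behavior({"region": "cn"}, {"region": "us"}))  # generic
```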

ensure can also be used to specify an invariant of an object.

object account() {
  var amount float64
  ensure amount >= 0 
  // it is more powerful than comment
  // after handling every message, invariant will be ensured
await {
  // messages
}}

Interface

Every type contract can be specified using require. But some contracts carry enough meaning that they should be named; we call such a contract an interface. It is just a reusable type constraint checker.

interface iterator(T type) {
  next() T
}

We can also add require to an interface

func is_number(T type) bool {
  // impl
}
interface number_iterator(T type) 
  require is_number(T) {
  next() T
}

Now, we can use the interface iterator as a type

func print_iterator(iter iterator) {
  for elem := range iter {
    print(elem)
  }
}

this longer signature is equivalent, but conveys the meaning more clearly

func print_iterator(iter T, T type{iterator})

T is a type argument that is required to implement the iterator interface.

OOP

DexScript does not support OOP, as it does not support inheritance. We favor composition over inheritance. But the properties of OOP can be obtained through other means

Essentially, the dexscript programming model is function based: different functions are dispatched based on the contract. If dispatch happens statically, it is like C++ templates. If dispatch happens dynamically, it is like Java polymorphism.

Function Call Syntax Sugar

But function overloading does not look as nice at the call site

func is_empty(self list) bool {
  return self.count() == 0
}

is_empty(trim_null(my_list)) // this does not compose well

The syntax looks better as my_list.trim_null().is_empty(). So uniform function call syntax, similar to C# extension methods, is supported.

operator overloading is another kind of function call sugar

func operator_multiply(m1 matrix, m2 matrix) matrix {
  // impl
}

var m1, m2 matrix
m1 * m2 // will call function operator_multiply

Summary of Function/Object/Actor

object is a resumable func. actor is a combination of func and object with its own executor. The syntax is very similar, with two tiny differences:

The rationale for the distinction is to make the call site explicit about what is being invoked, because the behavior of the three is remarkably different. However, we still keep argument passing, type inference, generics and overloading semantics all the same. In some sense, these operations are just different forms of invocation

Reference Value

A value type is always copied. If we want to share a value between function calls, we can wrap the value inside an object.

object two_int(a int64, b int64) {
  var value tuple{int64, int64} = tuple{a, b} // same as value := tuple{a, b}
await {
  message get_a() int64 { reply value[0] }
  @mut message set_a(v int64) { value[0] = v }
  message get_b() int64 { reply value[1] }
  @mut message set_b(v int64) { value[1] = v }
}}

the tuple{int64, int64} wrapped inside two_int can be shared between function calls

func add_them(@mut them two_int) {
  them.set_a(them.get_a() + them.get_b())
}

func try_add_them() {
  v := two_int{1,1}
  add_them(v)
  add_them(v)
  print(v.get_a()) // will print 3
}

Manage Resource (RAII)

an object can be used to wrap a value. It can also be used to wrap an expensive resource.

object file(filename string) {
  handle := syscall_open_file(filename)
  defer syscall_close_file(handle)
await {
  // operations ...
}}

When we instantiate the file object, it opens the file. If the object goes out of scope, the file will be closed.

func print_abc_txt() {
  f := file{'abc.txt'}
  print(f.read_all())
}

The file is automatically closed before the function returns. defer works like the destructor of an object in C++. This pattern is called RAII.
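
Python expresses the same acquire-on-entry, release-on-exit pattern with context managers. The sketch below is an analogy only; opened and read_all are hypothetical names, and the finally clause plays the role of defer.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def opened(path):
    handle = open(path, "r", encoding="utf-8")   # acquire on entry
    try:
        yield handle
    finally:
        handle.close()                            # like `defer syscall_close_file(handle)`


def read_all(path):
    with opened(path) as f:
        return f.read()
    # f is closed once the with block exits, whether read() returned or raised


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "abc.txt")
    with open(path, "w", encoding="utf-8") as out:
        out.write("abc")
    print(read_all(path))  # abc
```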

Lifetime

RAII is a simplified view of lifetime. Not all objects have a simple lifetime. There are two cases we need to avoid:

given we have an object

@serializable
object two_int(a int64, b int64) {
  var value tuple{int64, int64} = tuple{a, b} // same as value := tuple{a, b}
await {
  message get_a() int64 { reply value[0] }
  @mut message set_a(v int64) { value[0] = v }
  message get_b() int64 { reply value[1] }
  @mut message set_b(v int64) { value[1] = v }
}}

if we send the object across an actor boundary, the object is copied (just like serializing and de-serializing)

actor separate_worker() {
await {
  message add(@mut obj two_int) {
    obj.set_a(obj.get_a() + obj.get_b())  // will not modify the obj on main actor
  }
}}
actor main() {
  obj := two_int{1, 2}
  worker := async separate_worker()
  worker->add(obj) // -> will copy arguments by default, obj must be @serializable
  worker->add(obj)
}

we can avoid the copy by move

actor main() {
  obj := two_int{1, 2}
  worker := async separate_worker()
  worker->add(move obj) // the obj is moved to another actor
  print(obj) // obj is null here
}

within the same actor, we need to prevent an object from leaking into a bigger scope.

func try_mess_up_lifetime() {
  parent_ref := construct_and_return()
  print(parent_ref.get_a()) // child is deleted already
}

func construct_and_return() two_int {
    child_ref := two_int{1, 2}
    return child_ref
    // delete parent will happen here, with its children
}

The problem is that two_int{1, 2} is allocated in the scope of construct_and_return. How do we prevent this from happening? We make the owner part of the type. two_int is a generic type, which is instantiated to a concrete type at the call site.

func try_mess_up_lifetime() {
  parent_ref := construct_and_return() // call-site instantiate generic type to concrete type
  print(parent_ref.get_a()) 
}

func construct_and_return() two_int { // two_int{owner_type:try_mess_up_lifetime}
    child_ref := two_int{1, 2} // owner default to itself
    // same as two_int{1, 2, owner: this}
    return child_ref // it is actual type two_int{owner_type:construct_and_return}, type is incompatible
}

By comparing two_int{owner_type:try_mess_up_lifetime} with two_int{owner_type:construct_and_return}, the compiler will complain and prevent the disaster from happening. One way to fix the compilation error is to specify an alternate owner:

func try_mess_up_lifetime() {
  parent_ref := construct_and_return()
  print(parent_ref.get_a()) 
}

func construct_and_return() two_int { // two_int{owner_type:try_mess_up_lifetime}
    child_ref := two_int{1, 2, owner: owner} // owner changed to try_mess_up_lifetime
    // child_ref is just a reference on the object allocated in parent scope
    return child_ref // it is actual type two_int{owner_type:try_mess_up_lifetime}, type is compatible
}

By specifying owner, we change who allocates the local variable of construct_and_return. The allocator is essentially an object pool. Instead of managing all objects in one huge pool and running an expensive GC, DexScript divides the heap into small hierarchical pools.

(figure: tree of hierarchical object pools; red lines show ownership, blue lines show references)

The red lines mean ownership of objects. Each object owns other objects. The blue lines mean references. Every variable is a reference. If every object is directly owned by the heap root node, we get the traditional heap management used by Java and golang. This model is a hybrid of the global GC model and the C++ scope-based RAII model. The root of the object pool tree is the actor. Actors do not share objects with each other.

A reference count is tracked for every object. When a reference goes out of scope, the count is decremented by one. When a scope ends, all parent object pools prune the dead objects that no one references. If there is a cycle between objects, the object pool will run GC before it allocates new memory from the parent pool. When an object pool goes out of scope, all objects owned by it are deleted.
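
The ownership tree can be sketched as nested pools. The Pool class below is a hypothetical model written for this explanation: it only tracks which pool owns which object and ignores reference counting and cycle collection.

```python
class Pool:
    """Hypothetical object pool: owns objects and child pools."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.objects = []
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def allocate(self, value, owner=None):
        # The default owner is the current scope, like two_int{1, 2, owner: this}.
        target = owner if owner is not None else self
        target.objects.append(value)
        return value

    def close(self):
        # Ending a scope deletes everything it owns, child pools first.
        for child in list(self.children):
            child.close()
        self.objects.clear()
        if self.parent is not None and self in self.parent.children:
            self.parent.children.remove(self)


caller = Pool("try_mess_up_lifetime")
callee = Pool("construct_and_return", parent=caller)
callee.allocate({"a": 1, "b": 2})                 # owned by the callee
callee.allocate({"a": 1, "b": 2}, owner=caller)   # owned by the caller
callee.close()
print(len(caller.objects))  # 1: only the object given to the caller survives
```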

There is no way to actually change the owner of an object. To give an object to another owner, we can copy it

func construct_and_return() two_int {
    child_ref := two_int{1, 2}
    return copy child_ref // the new copy will be owned by its caller
}

There is also a cheaper way to copy. It decrements the reference count; if no one else references the object, we consider it copied and reuse it for the new owner:

func construct_and_return() two_int {
    child_ref := two_int{1, 2}
    // decrement reference counter
    // if reference count is 0
    // reuse this object, otherwise still copy it
    // move is just a short cut of this sequence
    return move child_ref 
}

Box and Reference

box is a holder of a single value. reference is an object. It points into a box and keeps the box alive.

func some_action() {
  arr := box{array{1, 2, 3}}
  var ref reference{int64} = &arr[1] // points to the value 2
  print(ref.get()) // print 2
}

weak_reference does not hold a strong reference, so it does not keep the box alive

func some_action() {
  arr := box{array{1, 2, 3}}
  var ref weak_reference{int64} = &arr[1] // points to the value 2
  print(ref.get()) // print 2
  print(ref.expired()) // print false
}
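
Python's standard weakref module shows the strong/weak distinction in a runnable form. It is an analogy with one difference worth stating: Python cannot point at a single array element, so the weak reference here targets the whole box. Box is a hypothetical wrapper class.

```python
import weakref


class Box:
    """Holder of a single value (plain lists cannot be weakly referenced)."""

    def __init__(self, value):
        self.value = value


box = Box([1, 2, 3])
strong = box                  # a normal reference keeps the box alive
weak = weakref.ref(box)       # a weak reference does not

print(weak().value[1])        # 2
print(weak() is None)         # False: not expired

del box, strong               # drop every strong reference
print(weak() is None)         # True on CPython, which frees the box immediately
```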

Span and Vector and String

func some_action() {
  arr := box{array{byte}{1, 2, 3}}
  var v1 vector{byte} = vector{arr[1:2]} // view of {2, 3}
  v1->push(4) // will be {2, 3, 4}
  print_str(v1) // vector as string
  var v2 span{byte} = arr[1:2] // view of {2,3}
  print_str(v2) // span as string
  print_str('hello world') // string_literal as string
}

func print_str(str string) {
  print(str)
}

SPMD

Using owner, we separate memory into hierarchical regions. CUDA has __global__ and __shared__ memory, which can be represented as

var shared_vec vector{int32, owner:__shared__}
var global_vec vector{int32, owner:__global__}

And the kernel is just an actor.

__global__ void parallel_shared_reduce_kernel(float *d_out, float* d_in){
    int myID = threadIdx.x + blockIdx.x * blockDim.x;
    int tid = threadIdx.x;
    extern __shared__ float sdata[];
    sdata[tid] = d_in[myID];
    __syncthreads();
    //divide threads into two parts according to threadID, and add the right part to the left one, 
    //lead to reducing half elements, called an iteration; iterate until left only one element
    for(unsigned int s = blockDim.x / 2 ; s>0; s>>=1){
        if(tid<s){
            sdata[tid] += sdata[tid + s];
        }
        __syncthreads(); //ensure all adds at one iteration are done
    }
    if (tid == 0){
        d_out[blockIdx.x] = sdata[0]; // after the loop the block's sum is in sdata[0]
    }
}

can be translated to

actor parallel_shared_reduce_kernel(d_out span{float64}, d_in span{float64}, 
  tid thread_idx_x, bid block_idx_x, bdim block_dim_x, 
  sdata span{float64, owner:__shared__}) {
  let my_id = tid + bid * bdim
  sdata[tid] = d_in[my_id]
  __syncthreads()
  // divide the threads into two halves by thread id, and add the right half onto the left half;
  // each iteration halves the number of elements, iterate until only one element is left
  for s := bdim / 2; s > 0; s >>= 1 {
      if tid < s {
          sdata[tid] += sdata[tid + s]
      }
      __syncthreads() // ensure all adds of one iteration are done
  }
  if tid == 0 {
      d_out[bid] = sdata[0] // after the loop the block's sum is in sdata[0]
  }
}

Launch the kernel by async parallel_shared_reduce_kernel<<<100, 512>>>(d_out, d_in).
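
The reduction performed by each block can be checked with a sequential Python simulation. This is not GPU code: the inner loop stands in for the threads of one block, and finishing it stands in for __syncthreads(). The function name block_reduce and the power-of-two restriction are assumptions of this sketch.

```python
def block_reduce(d_in, block_dim):
    """Simulate one thread block per block_dim-sized chunk of d_in."""
    assert block_dim > 0 and block_dim & (block_dim - 1) == 0, "power of two"
    assert len(d_in) % block_dim == 0
    d_out = []
    for bid in range(len(d_in) // block_dim):
        # sdata plays the role of __shared__ memory for this block
        sdata = list(d_in[bid * block_dim:(bid + 1) * block_dim])
        s = block_dim // 2
        while s > 0:
            # every "thread" with tid < s adds the right half onto the left half
            for tid in range(s):
                sdata[tid] += sdata[tid + s]
            s >>= 1
        d_out.append(sdata[0])   # the block's sum ends up in slot 0
    return d_out


print(block_reduce([1, 2, 3, 4, 5, 6, 7, 8], 4))  # [10, 26]
```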

LINQ

LINQ is a nice-to-have.

evens := from num in numbers where num % 2 == 0 select num // lazy list
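
The "lazy list" behavior is comparable to a Python generator expression, which filters elements only when they are consumed. The helper name evens_of is an assumption for this sketch.

```python
def evens_of(numbers):
    # Builds the query; no element is examined until the result is iterated.
    return (num for num in numbers if num % 2 == 0)


numbers = [1, 2, 3, 4, 5, 6]
evens = evens_of(numbers)
numbers.append(8)        # added after the query was built, still visible to it
print(list(evens))       # [2, 4, 6, 8]
```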

Package and Import

just like golang, the imported symbol does not need to be prefixed with the package name

Dependency Management

just like golang, use semantic versioning

Compilation Speed

The language depends heavily on type calculation and meta-programming. Compilation will be slow because of the kinds of things that can be done at compile time. The strategy to avoid this head

Channel

a line of execution has three forms

we have to convert an actor to a continuation

channel network_read_chan(fd int) array{byte}

actor some_worker(network_read network_read_chan) {
  // ...
  bytes := network_read(fd)  // put myself into this channel, and wait to be waken up
  // ...
}

some_worker can suspend itself, morph into continuation form, and put itself into the channel.

actor ioloop(the_chan network_read_chan) {
  for {
     // fd, cont := <- the_chan
     for fd, cont := range the_chan {
        // continuation is a function, func(bytes array{byte})
        cont(bytes) 
     }
  }
}

the continuation can be retrieved from the channel and registered as a callback for later execution.

select {
case bytes := network_read(fd): 
  // xxx
case sleep(10s):
  // xxx
}

select can be used to wait on multiple channels; whichever comes first will wake the actor up

actor some_worker() {
  msg_name, msg_args, cont := <- inbox
}

every actor has an implicit inbox. await just takes a continuation from this inbox
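
The idea of a worker parking itself in a channel as a continuation can be sketched with Python generators. Here the generator's bound send method serves as the continuation, a deque serves as the channel, and fake_network is an invented stand-in for real I/O. This illustrates the control flow only.

```python
from collections import deque


def some_worker(fd, log):
    # Yielding a request suspends the worker; the value sent back resumes it.
    data = yield ("network_read", fd)
    log.append((fd, data))


def ioloop(workers, fake_network):
    chan = deque()                       # holds (fd, continuation) pairs
    for worker in workers:
        _, fd = next(worker)             # run until the worker parks itself
        chan.append((fd, worker.send))   # the bound send method is the continuation
    while chan:
        fd, cont = chan.popleft()
        try:
            cont(fake_network[fd])       # wake the worker with its bytes
        except StopIteration:
            pass                         # the worker finished after resuming


log = []
ioloop([some_worker(3, log), some_worker(4, log)], {3: b"abc", 4: b"xyz"})
print(log)  # [(3, b'abc'), (4, b'xyz')]
```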