AndreVanDelft / scala

The SubScript extension to the Scala programming language
http://www.subscript-lang.org/

Support for Script result values #9

Open AndreVanDelft opened 10 years ago

AndreVanDelft commented 10 years ago

Changed the terminology: result values instead of return values

anatoliykmetyuk commented 10 years ago

The work needs to be done here:

  1. Specify the requirements for the return values - briefly, what exactly should they do
  2. Develop the implementation strategy

Can you please elaborate more precisely on point (1), that is, the exact requirements for the return values? Then I'll try to come up with implementation ideas.

AndreVanDelft commented 10 years ago

Requirements

Result type specification

The result type is a parameter to the Script type. E.g.,

type MyScriptType = Int => Script[Int]
script..
  a:Int = {0}

Result assignment

A script result value is assigned using the ^ postfix operator attached to either a code fragment or script/method call; alternatively one may use the variable named "$".

If there is only one operand in the script body (presumably a code fragment or a script/method call), then the ^ postfix operator may be left out. Maybe this feature is not wise: in this case there is no way to say that the defined script will not yield any result at all.

E.g.,

script..
  a:Int = {0}     // becomes 0
  b:String = {0}; {""}^ // becomes ""
  c:String = {0}; {$=""} // becomes ""; no result type inference

The first example means to express that

Result capturing

It is possible to capture the result of a call to a script or method in a variable named $number or $name, using the "^" operator. If no such operator is specified, then the result of the call to the script or method with a given name (as appearing in the text) becomes available as $name. E.g.,

script..
  a = aCall; print($aCall)
  b = aCall^A; print(s"\$A=${$A}")
  c = aCall^7; print(s"\$7=${$7}")
  d = {1}^7; print($7)

Note: there should be no space between "^" and its right-hand side. A special case is when no such right-hand side exists; the result of the call then goes into "$", i.e. the result variable for the calling script. The "$" symbol was taken because it also appears in YACC. This new special meaning of "$" clashes with the meaning of "$" in interpolated strings. Maybe it is better to drop "$" in favour of "^".

Result type inference

E.g.,

script..
  a = {0}     // inferred: Int
  b = {0}; {""}^ // inferred:  String
  c = {0}; {""} // nothing (Unit)
  d = {0}^; {""}^ // inferred: Any

(Work in progress)
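The "inferred: Any" case above resembles what plain Scala does when values of unrelated types flow into one place: the compiler picks a common supertype. A minimal sketch in ordinary Scala (not SubScript), assuming Scala 2.x inference rules and assuming that result type inference would treat all ^-marked operands this way; the helper `lastResult` is made up for illustration:

```scala
object ResultInferenceSketch {
  // Hypothetical helper: models a script whose result may come from either operand.
  def lastResult[R](first: R, second: R): R = second

  // Both operands are Int, so R is Int.
  val onlyInt: Int = lastResult(0, 1)

  // Int and String only share Any as a common supertype under Scala 2.x rules,
  // so the result can be used as Any but not as Int or String.
  val mixed: Any = lastResult(0, "")
}
```

Ascribing `mixed` as `String` instead would be rejected by the compiler, which is the behaviour one would expect for case d as well.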

Implementation

Type Script

Currently there is a type Script, which is returned by the method DSL._script. This should get a type parameter, that specifies the script result type. For scripts without a result, the type parameter should be Unit. It would be nice if Script = Script[Unit] but FTTB there are no default type parameters in Scala. We could bypass the problem by letting the compiler append "[Unit]" to "Script" in case it appears without a type parameter.

'$' usage

Variable names starting with a $ are now possible; however these must be defined by an enclosing script. Maybe the Scanner and the Parser need to be changed for this.

Storage and transfer

Scripts store their $ variables locally, as if they are local variables. The result value is transferred to the caller for capturing whenever they have success, i.e. at the same moments that ?parameters would be transferred.

anatoliykmetyuk commented 10 years ago

Wouldn't it be more natural if we just use local variables to store result values? For example:

def script..
  foo: Unit = var x: Int = bar ; {println(x)}
  bar: Int   = {0}^ & {println("returned 0")}

In the case of ordinary functions, for example, it would be rather inconvenient to work with them if they automatically assigned their return values to specifically named variables:

def currentTime: Long = System.currentTimeMillis()

def foo = {
   currentTime()^1
   currentTime()^2
   println($2 - $1)
}

It is unnatural and inconvenient; a user would prefer something like:

def foo = {
  val t1 = currentTime()
  val t2 = currentTime()
  println(t2 - t1)
}

That is, all the assignments should be treated uniformly. If a result of a function is assigned in a "a = f(x)" manner, then script results also should be assigned in a similar way, because otherwise it'll make the code less readable, will make learning SubScript more difficult and generally add some complexity to it.

AndreVanDelft commented 10 years ago

Very good to think about such aspects and to discuss these before implementing. However, I slightly disagree, for several reasons:

x^1 & y^2; print($1+$2)
var v1=0; var v2=0; (v1=x)&(v2=y); print(v1+v2)

OTOH, the "var" variant allows for specifying a type explicitly. Maybe we should support something like that: "bar^x:Int". So I really want the "^" result capturing. However, if the "var" syntax would be easy to implement (I think so) then that would be nice to have as well; may the fittest survive.

Please note I have updated my previous comment, in the section "Result capturing".

anatoliykmetyuk commented 10 years ago

Speculations

Node result values rather than script result values

At runtime, we have no such notion as a script; everything is represented by the call graph and the template tree. Even script calls are nodes in the graph. While convenient in high-level discussions, it's inconvenient to talk in terms of script result values during actual implementation. Instead, we should use the term "node result values" in such discussions. What precisely should we understand by it? Well, in your examples even an atomic normal code node could have a result value:

{0}^foo

therefore, we should assume that every node should be capable of having some result value.

Bottom line: every node can have result value.

Nodes are different

There are nodes for which it is relatively easy to determine the result value, for example most atomic nodes, such as a normal code node:

{
  println("Hello world")
  3
}

Here, we can just apply standard Scala rules to determine the value of the code block and assume this to be the result value of the normal code node. However, there are more tricky nodes, like n-ary operators:

a & b & c
a = {3}
b = {"foo", 10}
c = {12, "bar"}

Here, we can't determine result of "&" without some thinking. In general, different flavours of n-ary nodes may exist, and they may require completely different strategies to determine the result value. Therefore, the most efficient way of capturing this undefined nature of strategy may be simply to define this strategy as an abstract method in a CallGraphNodeTrait:

def result: T

and let children nodes care about the implementation.

Bottom line: result computation strategy should be left as an abstract method and needs to be implemented for different nodes individually.
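To make the idea concrete, here is a small sketch in plain Scala of an abstract `result` with node-specific strategies. The names `Node`, `CodeNode` and `ParallelNode` are simplified stand-ins rather than the actual SubScript classes, and the "collect everything" strategy for "&" is only one conceivable choice, not something settled in this discussion:

```scala
object NodeResultSketch {
  // Every node exposes a result; how it is computed is left to the concrete node.
  trait Node[+T] { def result: T }

  // An atom (code fragment): the result is simply the value of its Scala block.
  final class CodeNode[T](block: () => T) extends Node[T] {
    lazy val result: T = block()
  }

  // One conceivable strategy for "&" (an assumption made for this sketch):
  // collect the results of all operands, in operand order.
  final class ParallelNode(children: List[Node[Any]]) extends Node[List[Any]] {
    def result: List[Any] = children.map(_.result)
  }
}
```

A different n-ary operator would supply its own `result` without touching the trait.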

Escalation of children results

The majority of n-ary nodes, however, won't require too tricky logic to evaluate their result. Most of them would in fact reuse the results of their children to determine their own result. Examples include if-else, ";", "+" and others. Therefore, an efficient though unobtrusive way of making this process easier should be defined. A nice way of doing this may involve the capability to mark children nodes with flags - escalation flags - that would be used by their parents during their result computation to "navigate" through their children in a most primitive way. For example, here's how ";" may determine its result value, based on the presumption that only one of its children is marked with the "escalate" flag:

trait CallGraphNodeTrait[T] {
  ...
  def escalate: Boolean
  def result: T
}

class N_ary_op[T] {
  def result =
    children.filter(_.hasSuccess)   // Only successful nodes
            .filter(_.escalate)     // Only marked for escalation
            .head.result            // Take head, or fail with an exception if there's no such node
}

Bottom line: nodes can be marked with escalation flag for ancestor n-ary nodes to be able to navigate through them.
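A self-contained variant of the escalation idea, written as plain Scala so it can be tried outside the VM. `Leaf` and `Seq_op` are hypothetical, heavily simplified stand-ins for the real call graph nodes; the sketch only assumes the three members `hasSuccess`, `escalate` and `result` discussed above:

```scala
object EscalationSketch {
  trait Node[+T] {
    def hasSuccess: Boolean
    def escalate: Boolean
    def result: T
  }

  // A leaf with a fixed value; `escalate` marks it as the one whose result is passed upwards.
  final case class Leaf[T](result: T, hasSuccess: Boolean = true, escalate: Boolean = false)
      extends Node[T]

  // Sequence-like operator: its result is that of the first successful child
  // that is flagged for escalation.
  final case class Seq_op[T](children: List[Node[T]], escalate: Boolean = false)
      extends Node[T] {
    def hasSuccess: Boolean = children.forall(_.hasSuccess)
    def result: T =
      children.find(c => c.hasSuccess && c.escalate)
        .map(_.result)
        .getOrElse(throw new NoSuchElementException("no escalated child with success"))
  }
}
```

With `Seq_op[Any](List(Leaf(0), Leaf("", escalate = true)))` the result is the empty string, mirroring the earlier `b = {0}; {""}^` example.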

Implementation

CallGraphNodeTrait

def result: T

First, the trait should express that a node can have a result, while leaving the strategy for computing it open. Next, it should reflect that this result can potentially be bound to some local variable (or not, which is why we should use Option). Finally, the escalation flag should be defined in the most trivial and straightforward fashion.

trait CallGraphNodeTrait[T] {
  def result: T
  var resultVariable: Option[LocalVariable[T]] = None
  var escalate: Boolean = false
}

Other nodes

For other nodes, we'll have to implement the result computation strategy. For atoms this should be trivial:

var result: T

and then in CodeExecutor set this var to a concrete value once the atom's code has been executed (namely, to the result of that execution). For other nodes this may not be so trivial. For example, in the case of ";" escalation flags should be used:

def result: T = children.filter {x => x.hasSuccess && x.escalate}.head.result

In case of "&", "||" or others other strategies might be desired.

ScriptExecutor

Activation

On node activation, we should check whether its resultVariable option is defined - in this case, we assume this node's result should be bound to a variable. If it's not defined, we assume the opposite. We take this variable, get its name and define corresponding variable in the nearest n-ary ancestor:

resultVariable match {
  case Some(LocalVariable(name)) => node.n_ary_op_ancestor.initLocalVariable(name, node.pass, ???) // notice: null can't be used instead of '???', since upper bound is Any. Further thinking is needed to come up with appropriate initialization strategy.
  case None =>
}

Success

On success, we assume that the node that succeeded is ready to present its result. So we check whether we need it (to bind it to the local variable) and, if we do, we compute it and use it.

resultVariable match {
  case Some(v) => v.at(node).value = node.result
  case None =>
}

Parser

Due to escalation flag introduction, I propose to change the syntax the following way:

AndreVanDelft commented 10 years ago

Very interesting; this way we would get more and more towards a language that manipulates data on a high level...but I am not entirely convinced yet.

A nice way of doing this may involve capability to mark children nodes with flags - escalation flags - that would be used by their parents during their result computation to "navigate" through their children in a most primitive way.

What should be done with the results of child nodes that have already deactivated? How should that be implemented? As a rule of thumb: if you can define some simple rules that are easy to implement, then it is often explainable and useful.

Parser node^name^ - escalate, bind result to local variable named name Implement that like everything in the parser so far.

Do you have a use case for this? If not, I think we should not yet support this, FTTB.

I miss support for setting the result value of the current script. So why not add the following: node^^ - bind result value to the local value for the result of the enclosing script

Bottom line: every node can have result value.

I am not sure how to define this for parallel operators, but even if it would not be nicely possible, then we may define some rules that still would be worthwhile for most other node types.

Before this would be implemented though, I would like to see quite a few use cases:

footnote(??n: Int): String =      if (fnFormat==              NUMBER_DOT )             (??n ".")
                             else if (fnFormat==PARENTHESIZED_NUMBER_DASH) (footnoteRef,??n "-")
                           ; line^^
                           ; .. (line ==> {: $ += _.trim :})

anatoliykmetyuk commented 10 years ago

You're right, deactivated nodes will be out of reach. I think, we can just make a callback, onChildSuccess in the n-ary node and call it each time some child has success. Child nodes' results will be accumulated using this method. For instance, for ";" we can use something as follows:

var childrenResults: List[(Boolean, T)] = List()  // (escalate, result)
def onChildSuccess(c: CallGraphNodeTrait) = childrenResults ::= (c.escalate, c.result)
def result = childrenResults.filter {case (escalate, _) => escalate}.map {case (_, result) => result}.head
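The callback approach can be tried in isolation with the following plain Scala sketch. `Child` and `SeqNode` are made-up minimal stand-ins for the call graph classes, and returning an `Option` instead of calling `head` is a choice made here so that the "no escalated child yet" case does not throw:

```scala
object ChildSuccessSketch {
  // Minimal stand-in for a child node; only the two members used below.
  final case class Child[T](escalate: Boolean, result: T)

  final class SeqNode[T] {
    // Results are recorded at success time, so they survive the child's deactivation.
    // Entries are (escalate, result) pairs, newest first.
    private var childrenResults: List[(Boolean, T)] = Nil

    def onChildSuccess(c: Child[T]): Unit =
      childrenResults = (c.escalate, c.result) :: childrenResults

    // Most recently recorded escalated result, or None if there is none yet.
    def result: Option[T] =
      childrenResults.collectFirst { case (true, r) => r }
  }
}
```

Because new entries are prepended, a child that succeeds more than once (for example inside a loop) contributes its latest value.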

Use case for node^name^:

(if (expression) {0} else {1})^result^ ; {println("LOG: returned " + $result + " from ; node")}

node^^ - bind result value to the local value for the result of the enclosing script

I don't quite understand. Do you mean to bind result to $node?

anatoliykmetyuk commented 10 years ago

On the question of the Scala Workshop examples: can you please be more specific? As far as I have seen, nothing there involved result values. Have I missed something?

AndreVanDelft commented 10 years ago

"I don't quite understand. Do you mean to bind result to $node?"

No; to $, i.e. the result of the script that this code appears in.

The Scala Workshop paper is "Dataflow Constructs for a Language Extension Based on the Algebra of Communicating Processes". The result values are mentioned in the abstract and elaborated on page 6.

AndreVanDelft commented 10 years ago

This calculator built with parser combinators would be a good use case.

AndreVanDelft commented 10 years ago

To add result and failure values to scripts we could easily generate some additional code for scripts. E.g., for a script

test(i:Int) = print,"Hello" println,i

we currently generate code like this:

def _test(i:Int) =

  _script(this, 'test) {
    _seq({print("Hello ")}, {println(i) })
  }

And this could become:

def _test(i:Int) =
  var result: Int
  var failure: Throwable
  _script(this, 'test) {
    _seq({print("Hello ")}, {println(i) })
  }

There are two problems with this:

These may be solved by creating a class for scripts. I am thinking of

import scala.language.reflectiveCalls

abstract class Script[R](_owner: AnyRef, _name: Symbol) extends N_script {
  var result   : R = _
  var failure  : Throwable = null
}

Usage:

def test(i:Int)(_c: N_call) = new Script[Int](_owner=this, _name = 'test) {
  val template = _seq({print("Hello ")}, {result=i; println(i) })
} 

_name, _template and _owner start with underscores so that these clearly belong to the enclosing script.

AndreVanDelft commented 10 years ago

In the latter example we can access result between the braces because we are subclassing. In the currently generated code we cannot do that, because the braces form a parameter to the function _script, so result is not brought into scope. But I would like not to touch the current code generator much for two reasons: it would take time, and the current solution is clear and simple.

I was wondering: can we turn _script into a macro, so that it effectively would transform itself in the new Script[] { ... } code that brings result into scope?

In that case we would only need to add the type parameters to the generated code.

AndreVanDelft commented 10 years ago

I experimented a bit with the macros but I did not manage to bring result and failure in context this way. The macro call requires that the actual parameters are well typed expressions before the macro is called. There would also be another problem with the Script class approach (without macros): this and its features will point to the current script rather than the current object, in contrast to the programmer's expectation.

It may be possible to rewrite "this" using a macro: http://meta.plasm.us/posts/2013/08/31/feeding-our-vampires/ but I think this will be quite complicated.
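The `this` problem is ordinary Scala behaviour and can be reproduced without SubScript. In the sketch below, `Script` is a stripped-down stand-in for the class proposed above and `Owner` plays the role of the object that declares the script; both names are only illustrative:

```scala
object ThisScopeSketch {
  abstract class Script[R] {
    var result: Option[R] = None
    def run(): AnyRef
  }

  class Owner {
    // Inside the anonymous subclass, `this` is the Script instance, not the Owner.
    def viaSubclass: Script[Int] = new Script[Int] {
      def run(): AnyRef = this
    }

    // Qualifying it as Owner.this reaches the enclosing object again.
    def viaQualifiedThis: Script[Int] = new Script[Int] {
      def run(): AnyRef = Owner.this
    }
  }
}
```

So generated code in the subclassing style would presumably have to rewrite every `this` written by the programmer into the qualified form, which is the complication mentioned above.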

I will therefore add the result and failure fields in the compiled code.

anatoliykmetyuk commented 10 years ago

Why do we need to put result and failure there? Wouldn't it be more intuitive to put these variables into the Call Graph nodes classes? What is the advantage?

Also, wouldn't it be more rational to use Try[T] instead of result and failure, so that we can represent the result of computation with either Success, or Failure. Or null, if it didn't terminate at all.
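A minimal sketch of that representation, using only `scala.util.Try` from the standard library. Wrapping the value in an `Option` rather than using null for "not terminated" is a choice made for this sketch, and `describe` is a made-up helper:

```scala
import scala.util.{Try, Success, Failure}

object TryResultSketch {
  // None: the script has not ended yet.
  // Some(Success(v)) or Some(Failure(e)): it ended with a result or with a failure.
  def describe[R](outcome: Option[Try[R]]): String = outcome match {
    case None             => "not finished"
    case Some(Success(v)) => s"result: $v"
    case Some(Failure(e)) => s"failure: ${e.getMessage}"
  }
}
```

One field then covers all three situations, and the compiler can check that the failure case is handled.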

AndreVanDelft commented 10 years ago

Script Result & Failure support

Features, Design and Implementation

Inside scripts several new features become available, accessible from Scala code, e.g. in code fragments and if-conditions:

The here of a code fragment will also have a private result/failure value. It may be accessed as here.$. Note that there is no big need to get easier access, since the result of a code fragment becomes just the value of its executed Scala block. The failure may be set using a call to a method here.fail(failureDescription:String). Reading it would be no use.

The new fields do not require compiler changes; they will be supported by

script has type Script[R] for some R. This means R must be known in the context. Probably all concrete node classes for the template tree and for the call graph will therefore need such a parameter, and DSL methods need to have those too. This may be the biggest challenge of the result/failure support operation.

Note: Since results are now in a Try, numeric results are not initialized to 0. This has to be done manually.

Trait ScriptResultHolder

The $ feature for results and failures is available from this new trait:

trait ScriptResultHolder[R] {var $:Try[R] = null}

Class Script

The script field is, like here, a node in the call graph. It has a special class: Script. It has a 1 to 1 relationship to its template.

case class Script[R](template: TemplateNode.Child, p: FormalParameter[_]*) extends N_script[R] with ScriptResultHolder[R] {
  def script = this
}

Trait N_code_fragment

N_atomic_action is renamed to N_code_fragment, since {!!} does not mark an atomic action. N_code_fragment gets a type parameter:

trait N_code_fragment[Node,R] extends CallGraphLeafNode with ScriptResultHolder[R] {

DSL

The _script method in DSL returns a new instance of a Script node, with an equally newly generated template:

def _script[R](owner:AnyRef, name:String, childTemplate: TemplateNode.Child, p: FormalParameter[_]*) = {
  val template = T_script(owner, "script", name, childTemplate)
  new Script[R](template, p:_*)
}

subscript.Predef

subscript.Predef allows for convenient access to the script result variable:

  def $         [R]: Try[R]     (implicit s: Script[R]) = s.$
  def $result   [R]       (implicit s: Script[R]) = s.$.asInstanceOf[Success[R]]
  def $failure  [R]       (implicit s: Script[R]) = s.$.asInstanceOf[Failure]
  def $_=       [R](v:Int, implicit s: Script[R]) = s.$=v
  def $result_= [R](v:Int, implicit s: Script[R]) = s.$=Success(v)
  def $failure_=[R](v:Int, implicit s: Script[R]) = s.$=Failure(v)

Other changes

T_call gets type parameter R.

anatoliykmetyuk commented 10 years ago

I like this idea of the script variable: it makes result determination more flexible and developer-friendly, without hard-coding any logic into the VM core classes. One can just write script.$ = Success(n) from his/her code to set the result, very nice. How does SubScript compiler know the difference between "." as a break operand and "." as an object-oriented path separator in case of script intensive usage?

(Answers by AvD are prefixed with "AvD:".) AvD: If there is white space before the ".", or if it cannot be a path separator at all, then it is a break operand. A similar rule holds for parentheses.

Also, maybe a better way of doing things is just to make here accessible at places where script is supposed to be accessible? script points to the Script node, here points to the current node, but from the Scala code context rather than from a script, so there will be no naming collision. An advantage of doing things this way is that this is more intuitive for the user. It is not very convenient to remember a whole bunch of new keywords to use SubScript.

AvD: "here" is only (or very mainly) available in code fragments, script calls, if conditions, while conditions, annotations. The latter has also a "there" value. These are the only fixed "keywords"; the rest is rather flexibly defined in Predef.

Also, a "script" (not a variable, but a "script") is a rather artificial notion on the graph level: we declare a certain region of a graph (I believe, all the nodes located under a Script node) to be a "script" and give it some special properties. But a reasonable question arises: if this region has "scriptic" properties (I believe, the only such property is access to the nearest Script ancestor on demand and setting its result), why can't some other bunch of nodes (or an individual node) have these properties (access its direct parent (not Script) and set its result)? If Script can have result values, why can no other node have such? If done this way, in some time we can start thinking about "anonymous scripts" to make ";" or "&" have a result value (for some reason), because normally it can't have such because it doesn't have its own Script as a direct ancestor. Adding new concepts without a reasonable need is not a good thing.

AvD: Script lambdas become scripts, so they have their own result values. If there would appear a use to give a region an option for its own result value, then we might create a lambda for it by enclosing it in brackets "[".... "]".

In my opinion, a good way of doing things would be to mix the ResultHolder trait into the CallGraphNode trait, so that every node can have a result, not only scripts - this way, on the graph level there are no "privileged" bunches of nodes - "scripts" - and every node has the same result-capable properties.

AvD: FTTB we must get something simple working soon; when a first implementation is ready we can experiment with use cases and see if we want something more general like what you describe here.

And, I don't think it is a good thing to expose the graph and all other internal machinery to the end user. Graph is just an implementation of the idea, it is not a good thing to expose it to the end-user API.

AvD: This is not an urgent issue. We can make features package-private later.

Can you please clarify the need for the Script class? In your architecture, the only difference between it and the N_call class is that the N_call node knows only the name of a function that will yield its template, as opposed to the Script class which accepts its template at construction time. But I can't see how it will help in the case of script result values. Though, the Script class has an advantage compared to the N_call class. A class that has a ready template is always more intuitive than a class that only has a symbol of a template. Maybe it is a good idea to replace the N_call class with the Script class?

AvD: N_call is a caller, not a callee, so the result value does not belong there. Besides the solution should also work for communicating scripts: a,b={}. There are multiple callers and a single callee. The latter should carry the unique result. Maybe the current design is far from perfect, but it is important to get something working soon with the intention to enhance later. Otherwise we will suffer analysis paralysis.

Also, I suppose, $ should be Try[R], not R, because you say at the very beginning that it is a Try.

AvD: Yes, thanks, I saw that too, when I started to code.

anatoliykmetyuk commented 10 years ago

I'm finally getting convinced. So we make a special node, Script, that will be responsible for result values of a certain graph region. We manipulate the result of this Script node from its script body. Yes, sounds nice. I think, we can do that and see how it behaves in various use cases.

However, an issue of type inference arises. If the programmer decides by himself what the result would be, then we have to teach the compiler to find the script.$ = Success(foo) constructs and infer the script type from foo's type.

Reply: That will not be an issue. We do not do that kind of inference; only for implicit and explicit ^ occurrences.

AndreVanDelft commented 10 years ago

Next Try: Script Result & Failure support

My previous big posting is already outdated; during implementation some things came up that required changes. The main change is that the script value in scripts is now implemented by supplying to the DSL._script method a parameter childTemplateAt: Script[R] => TemplateNode.Child. The actual parameter there (generated by the SubScript compiler) is a function whose parameter is named script; that parameter gets the value of a new Script instance, and the function then produces a child template for that Script. This mechanism is partly comparable with the way here and there are brought into scope in code fragments, annotations etc, but maybe even more complicated.

The 'old' idea of my previous posting, that gave class Script a script value member, does not work, since accessing it using Predef features from the various nodes would imply that all these nodes would need an extra type parameter, for the Script's result type.

Features, Design and Implementation

Inside scripts several new features become available, accessible from Scala code, e.g. in code fragments and if-conditions:

The here of a code fragment will also have a private result/failure value. It may be accessed as here.$. Note that there is no big need to get easier access to that value, since the result of a code fragment becomes just the value of its executed Scala block. The failure of here may be set using a call to a method here.fail(failureDescription:String). Reading it would be quite useless.

The new fields require compiler changes; apart from the already discussed script parameter, all node types that can produce values of various types need type parameters for those: code fragments, script calls, maybe later also if-else-expressions and do-expressions. Making the compiler provide appropriate type parameters is tedious, so FTTB we may implement this just rudimentarily: code fragments get type Any; for script calls we can hopefully do better.

Also there will be support by

script has type Script[R] for some R. For the time being, it is the type of the script if that has explicitly been provided in the declaration; else it is just Any. Later we can hopefully infer the result type from ^ result specifiers.

Note: Since results are now encapsulated in a Try, numeric script results are not initialized to 0. This has to be done manually.

Trait ScriptResultHolder

The $ feature for results and failures is available from this new trait:

trait ScriptResultHolder[R] {var $:Try[R] = null}
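The note about manual initialization can be illustrated with a small plain Scala sketch built around the trait above. The `Counter` class and its `increment` method are invented for the example; only `ScriptResultHolder` comes from this posting:

```scala
import scala.util.{Try, Success}

object ResultHolderSketch {
  trait ScriptResultHolder[R] { var $ : Try[R] = null }

  final class Counter extends ScriptResultHolder[Int] {
    // `$` starts out as null, not as Success(0), so an accumulating script
    // has to set the start value itself before updating it.
    def increment(): Unit = {
      if ($ == null) $ = Success(0)   // manual initialization
      $ = $.map(_ + 1)
    }
  }
}
```

Without the null check, the first `$.map` would throw a NullPointerException; this is the situation a script like the earlier `$ += _.trim` example would have to guard against.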

Class Script

Instances of class Script have a 1-to-1 relation with their respective templates.

case class Script[R](var template: T_script, p: FormalParameter[_]*)
  extends CallGraphTreeNode with ScriptResultHolder[R]
  {type T = T_script}

Traits T_code_fragment and N_code_fragment

T_atomic_action and N_atomic_action are renamed to T_code_fragment and N_code_fragment, since {!!} does not mark an atomic action. N_code_fragment gets a type parameter:

trait TemplateCodeHolder[R,N] extends TemplateNode {val code: N => R}

trait T_code_fragment[R,N<:N_code_fragment[R]] extends T_0_ary with TemplateCodeHolder[R,N]

trait N_code_fragment[R] extends CallGraphLeafNode with ScriptResultHolder[R] {
  type T <: T_code_fragment[R,_]
 ...
}

DSL

The _script method in DSL returns a new instance of a Script node. This node is brought under the name of script into the scope of its template code. For this purpose the DSL._script method accepts a parameter childTemplateAt: Script[S]=>TemplateNode.Child.

First a preliminary template is created for the Script without the child template yet. Then the Script is created using that template. Then the child template is created using the passed childTemplateAt method and the created Script. Then this child template is connected to the script template.

  def _script[S](owner:AnyRef, name:Symbol, p: FormalParameter[_]*)(childTemplateAt: Script[S]=>TemplateNode.Child): Script[S] = {
    val template = T_script(owner, "script", name, child0=null)
    val result = new Script[S](template, p:_*)
    val childTemplate = childTemplateAt(result)
    template.setChild(childTemplate)
    result
  }
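The construction order used by _script can be studied with simplified stand-ins. In the sketch below, `Template`, `Child`, `Script` and `makeScript` are hypothetical miniature versions of T_script, TemplateNode.Child, Script and DSL._script; only the order of the steps is taken from the code above:

```scala
import scala.util.{Try, Success}

object TwoPhaseScriptSketch {
  // Simplified stand-ins; the real classes carry much more state.
  final class Child(val code: () => Unit)
  final class Template(val name: String) { var child: Option[Child] = None }
  final class Script[R](val template: Template) { var $ : Try[R] = null }

  def makeScript[R](name: String)(childTemplateAt: Script[R] => Child): Script[R] = {
    val template = new Template(name)               // 1. template, still without a child
    val result   = new Script[R](template)          // 2. Script node for that template
    val child    = childTemplateAt(result)          // 3. child template, with the Script in scope
    template.child = Some(child)                    // 4. connect child and script template
    result
  }

  // The function literal receives the new Script under the name `script`,
  // so the code inside the child can set the script's result.
  def demo(): Try[Int] = {
    val s = makeScript[Int]("test") { script => new Child(() => script.$ = Success(7)) }
    s.template.child.foreach(_.code())              // "execute" the child
    s.$
  }
}
```

Passing a function instead of a ready-made child is what lets the child's code refer to a Script instance that does not exist yet when the call is written.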

subscript.Predef

subscript.Predef allows for convenient access to the script result variable:

  def $         [R]               (implicit s: Script[R]): Try[R]    = s.$
  def $result   [R]               (implicit s: Script[R]): R         = s.$.asInstanceOf[Success[R]].value
  def $failure  [R]               (implicit s: Script[R]): Throwable = {val f=s.$.asInstanceOf[Failure[R]]
                                                                        if(f==null)null else f.exception}
  def $_=       [R] (v: Try[R]   )(implicit s: Script[R])            = s.$=v
  def $result_= [R] (v: R        )(implicit s: Script[R])            = s.$=Success(v)
  def $failure_=[R] (v: Throwable)(implicit s: Script[R])            = s.$=Failure(v)  
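How the implicit parameter finds the enclosing script can be seen in the following reduced sketch. It uses plain method names (`outcome`, `setResult`, `setFailure`) as stand-ins for the $-prefixed accessors above, and a bare `Script` class with nothing but the `$` field; these simplifications are assumptions of the sketch, not the Predef code itself:

```scala
import scala.util.{Try, Success, Failure}

object PredefSketch {
  final class Script[R] { var $ : Try[R] = null }

  // Stand-ins for $, $result_= and $failure_=.
  def outcome[R](implicit s: Script[R]): Try[R] = s.$
  def setResult[R](v: R)(implicit s: Script[R]): Unit = s.$ = Success(v)
  def setFailure[R](e: Throwable)(implicit s: Script[R]): Unit = s.$ = Failure(e)

  def demo(): Try[Int] = {
    // Plays the role of the `script` value that the compiler brings into scope.
    implicit val script: Script[Int] = new Script[Int]
    setResult(42)   // R is inferred as Int; the Script is supplied implicitly
    outcome[Int]
  }
}
```

A call such as `setResult("text")` in the same scope would not compile, because no implicit `Script[String]` is available there.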

Other changes

Many classes for template nodes, call graph nodes, script executors and code executors get a type parameter R for script results and node results.

AndreVanDelft commented 10 years ago

To complete the previous comment, a typical call to DSL._script is

def _times(n:Int) = {_script(this,'times) {(script:Script[Unit]) => _while{implicit here=>pass<n}}}

this would be equivalent to

def script times(n:Int) = while(here.pass<n)