AndreVanDelft / scala

The SubScript extension to the Scala programming language
http://www.subscript-lang.org/
12 stars 1 forks source link

Scala Days Submission #58

Open AndreVanDelft opened 9 years ago

AndreVanDelft commented 9 years ago

SubScript

The power of Math for concurrent and reactive systems

There is an ever increasing need for concurrent and reactive applications. Programming languages and framworks get therefore more and more support for this, but the result looks more like an explosion of engineering solutions rather than a coherent approach that applies available theory. In particular fundamental idioms for non-determinism and parallelism have been neglected. We aim to improve the situation using SubScript, an extension to Scala based on a mathematical theory named the Algebra of Communicating Processes.

History

Traditionally most programming langues have elementary support for sequential composition, often using syntax elements such as semicolons and line breaks. In comparison the support for choice and parallelism, which are conceptually similar, is apparently much on a low level. Existing formalisms that offer elementary support for choice and parallelism, have been neglected in most programming language research. Since the early days of computing there have been formalisms with high level support for non-deterministic choice. In the 1950s, Stephen Kleene researched the behaviour description of nerve nets. To describe their event handling he invented regular expressions. About the same time linguist Noam Chomsky started working on formal grammars. In 1960 Backus Naur Form (BNF) was born, a grammar notation for computer languages. In that year also the first compiler-compiler was created, later followed by many others such as YACC. In the seventies the term Path Expression was coined for specifications with concise constructs such as sequential and concurrent compositions, to describe process synchronisation. This inspired Jan van den Bos around 1980 to build the Input Tool Model: an extension to Pascal and Modula-2 with a layer of regular expressions for the specification of input patterns in user machine interaction. By that time three algebraic process specification formalisms emerged: CSP, CCS, ACP. Unlike most earlier formalisms these allowed for mixing of input and output actions; they also offered a more complete set of compositions: sequence, non-deterministic choice, parallelism, iterations and others. This largely widened the application area. In the end of the 1980s the Input Tool Model branched off a sequence of ACP extensions to programming languages: Pascal, C, C++, Java and Scala. The latter extension is named SubScript.

ACP

ACP is a solid mathematical aproach. Processes are notated as expressions, with operations such as addition, meaning choice, and multiplication, meaning sequence. At the bottom level there is a 0 (“deadlock”, a blocking process), a 1 (“empty process”, which does not really do anything but which immediately terminates succesfully) and so-called atomic actions. Axioms may be as rewrite rules so that we can reason precisely about semantics. E.g., (x+1)y = xy + 1y = xy + y In other words: adding 1 to a process makes it optional. Other kinds of composition, such as parallelism and disruption, may be defined in ACP using additional axioms.

SubScript

SubScript essentially extends Scala with concepts from ACP. It lives up to this quote of YACC creator Stephen Johnson:

" The ideas and techniques underlying YACC are fundamental and have application in many areas of computer science and engineering. One application I think is promising is using compiler-design techniques to design GUIs - I think GUI designers are still writing GUIs in the equivalent of assembly language, and interfaces have become too complicated for that to work any more."

SubScript adds a new element to Scala syntax, called script. A script resembles an ordinary method, but is defined by an ACP expression rather than ordinary Scala statements. For example, the following GUI script expresses that a user invokes “ok” by either clicking an “ok” button, or by pressing the Enter key:

script okCommand = clicked(okButton) + key(Enter)

By analogy with Scala’s implicit method feature we can make the called scripts clicked and key implicit, so that we can write even more concisely:

script okCommand  = okButton + Enter

Instead of adding two processes, it seems we are now adding a button and a key code. Thus the expressions are not limited to processes, but they hold any kind of items for which implicit conversions exist.

ACP is good for specifying and verifying internal process behaviour and synchronous process communication, but less for named processes and queued asynchronous communication. The latter are are well supported by the Actor model. Apparently ACP and the actor model may complement one another well. Internal actor behavior is relatively cumbersome to specify in Akka, the actor library for Scala. Support for finite state machines, futures and async has been some improvement using futures and the async construct, but at times obfuscate the control flow. Using SubScript for internal actor behaviour, message receptions and subsequent responses are not any more done in receive methods, but rather in nestable message handlers, enclosed in << and >> brackets. There are much like the partial functions in receive methods. The case keyword may be omitted if there is only 1 variation. Moreover a long arrow ==> may follow in a message handler, which is a special sequential composition: since it is still in the <<…>> bracket pair, earlier declared parameters and local values and variables are available. Moreover the message sender is saved under the hood in a local value, so that it may safely be used later. For example, consider

script live = << s:String => var t=0; w1! s; w2! s 
                         ==> <<i:Int=>t+=i>> & <<s:String=>t+=s.length>>
                             {sender!t}
              >>

This accepts a string as incoming message, and forwards it to 2 worker actors. These perform a computation; one returns a number and the other a string; the order of arrival is irrelevant thanks to the parallel composition operator &. The sum of the received number and the length of the string is then returned to the sender of the original message: this is a scala code fragment, which must be enclosed within braces; it corresponds with an atomic action in the ACP sense. Then the live script ends so that the actor terminates.

Current Implementation

SubScript is currently mature enough for GUI controllers and text parsing. SubScript actors are mature for experimentation, but probably not for production use. Currently its bare message handling has a throughput of several tens of thousands messages per second, which is in the order of 10 to 100 times slower than plain Akka.

Challenges

For more see www.subscript-lang.org