biosal is a distributed BIOlogical Sequence Actor Library.

biosal applications are written in the form of actors which send each other messages, change their state, spawn other actors and eventually die of old age. These actors run on top of biosal runtime system called "The Thorium Engine" using the API. Thorium is a distributed actor machine. Thorium uses script-wise symmetric actor placement and is targeted for high-performance computing. Thorium is a general purpose actor model implementation.

Exciting actor applications for genomics

Name	Description
argonnite	k-mer counter
gc	Guanine-cytosine counter
spate	Exact, convenient, and scalable metagenome assembly and genome isolation for everyone (with emphasis on low-abundance species too)

Technologies

Key	Value
Name	biosal
Computation model	actor model (Hewitt & Baker 1977)
Programming language	C99 (ISO/IEC 9899:1999)
Message passing	MPI 2.2
Threads	Pthreads
License	The BSD 2-Clause License

Try it out

git clone https://github.com/GeneAssembly/biosal.git
cd biosal
make tests # run tests
make examples # run examples

see compilation options
see examples and tests

Branches

Branch name	Person (alphabetical order)	Clone URL
master	Sébastien Boisvert	HTTPS
energy	Sébastien Boisvert	HTTPS
pami	Huy Bui	HTTPS
granularity	George K. Thiruvathukal	HTTPS
entropy	Fangfang Xia	HTTPS

There is also a read-only mirror.
The Integration Manager Workflow is used.
Nightly build of master branch:

Community

Issues: https://github.com/GeneAssembly/biosal/issues?state=open
Website: https://github.com/GeneAssembly/biosal
Mailing list: biosal@lists.cels.anl.gov https://lists.cels.anl.gov/mailman/listinfo/biosal
Jenkins Continuous Integration: http://jenkins.biosal.anl.boisvert.info:8080/ (Hosted in Magellan OpenStack cloud at Argonne)
- Alternate address: http://jenkins-biosal.mcs.anl.gov:8080/ (does not seems to work everywhere)

Actor model

The actor model has actors and messages, mostly.

When an actor receives a message, it can (Agha 1986, p. 12, 2.1.3):

send a finite number of messages to other actors (thorium_actor_send);
create a finite number of new actors (thorium_actor_spawn, ACTION_SPAWN);
designate the behavior to be used for the next message it receives (thorium_actor_concrete_actor, ACTION_STOP).

Other names for the actor model: actors, virtual processors, activation frames, streams (Hewitt, Bishop, Steiger 1973).

Also, in the actor model, the arrival order of messages is both arbitrary and unknown (Agha 1986, p. 22, 2.4).

One of the most important requirements of actors is that of acquaintances. An actor can only send message to one of its acquaintances. Acquaintance vectors were introduced in (Hewitt and Baker 1977, p. 7, section III.3). Any actor using an acquaintance vector is migratable by an actor machine. An actor machine can distribute and balance actors according to some arbitrary rules wwhen all the actors in an actor system use acquaintance vectors.

Important actor model papers

Actor model links

Languages using actors

Design

Key concepts

Concept	Description	Structure
Message	Information with a source and a destination	struct thorium_message
Actor	Something that receives messages and behaves according to a script	struct thorium_actor
Actor mailbox	Messages for an actor are buffered there	struct core_fast_ring
Script	Describes the behavior of an actor (Hewitt, Bishop, Steiger 1973)	struct thorium_script
Node	A runtime system that can be connected to other nodes (see Erlang's definition)	struct thorium_node
Worker pool	A set of available workers inside a node	struct thorium_worker_pool
Worker	A instance that has a actor scheduling queue	struct thorium_worker
Scheduler	Each worker has a actor scheduling queue and an outbound message queue	struct thorium_scheduler
Transport	Each node has a transport subsystem for moving messages between nodes	struct thorium_transport

Implementation of the runtime system

The code has to be formulated in term of actors. Actors are executes inside a controlled environment managed by the Thorium engine, which is a runtime system. An actor has a name, and does something when it receives a message. A message is however first received by a node. The node gives it to a worker_pool. The worker pool assigns the actor to a worker. Finally, worker eventually calls the corresponding receive function using the actor and message presented.

When an actor sends a message, the destination is either an actor on the same node or an actor on another node. The runtime sends messages for actors on other nodes with MPI. Otherwise, the message is prepared and given to a actor directly on the same node.

Runtime

Each biosal node is managed by an instance of the Thorium engine. These Thorium instances speak to each other. The number of nodes is set by mpiexec -n @number_of_thorium_nodes ./a.out. The number of threads on each node is set with -threads-per-node.

The following command starts 256 nodes (there is 1 MPI rank per node) and 64 threads per node for a total of 256 * 64 = 16384 threads.

mpiexec -n 256 ./a.out -threads-per-node 64

Because the whole thing is event-driven by inbound and outbound messages, a single node can run much more actors than the number of threads it has.

The runtime also supports asymmetric numbers of threads, but most platforms have identical compute nodes (examples: IBM Blue Gene/Q, Cray XC30) so this is not very useful. Example:

# launch 32 nodes with 32 threads, 244 threads, 32 threads, 244 threads, and so on
mpiexec -n 32 ./a.out -threads-per-node 32,244

Design links

Linux workqueue

Other possible names

name := biosal
bio := Biological or Biology
s := Sequence or Scalable or Salable or Salubrious or Satisfactory
a := Analysis or Actor or Actors
l := Library or Lift

Alternative name: BIOlogy Scalable Actor Library

Tested platforms

Cray XE6
IBM Blue Gene/Q
XPS 13 with Fedora
Xeon E7-4830 with CentOS 6.5
Apple Mavericks

Product team

see CREDITS.md

GeneAssembly / biosal

readme