brenthuisman commented 3 years ago

AEP: Have Arbor Recipe File Format

Goal

Have a file format to store and load Arbor Recipes to/from a file. Like the ACC format.

Rationale

With the Arbor Cable Cell (ACC) format, a major part of one of Arbor's major promises is realized: letting users store the neuroscientific description of their studies independently of tool/program/API/workflow-specific details. The Python API and now the GUI provide two intuitive ways for users to design their cells, and with ACC they can be stored in a portable format and re-used for as long as Arbor supports it, which is indefinitely.

The Arbor recipe is the other major component that constitutes the neuroscientific description, hence this proposal.

From a user point of view having an Arbor Recipe Format makes sense, because

I can store it, and it's a better way to store a recipe than a Python script.
I can re-use it (like ACC), so a potential Arbor-GUI for network design would target the other half of the scientific description
Now we're almost there for transporting a simulation from my laptop to HPC: I simply feed Arbor on HPC the Arbor Recipe file, one or more .acc files, and a brief description of the hardware context/load balancing.
Co-sim applications will probably be able to take advantage of this as well
This also makes recipes trivially pickleable, which for certain corners of the Python ecosystem is a boon.

Scope

Three questions:

A recipe is not a simulation, although quite close. Would it make sense to extend this format to cover domain decompositions, hardware descriptions? For a user, it would be extremely useful if they can design the simulation on their machine, copy it over to their HPC and with minimal/no extra info run over all the hardware that they assign the job.
A major extension of the Arbor Recipe could, and I think should, be rolled up in this process to combat exploding explicit connection lists: a native format for connectivity matrices, see also ticket https://github.com/arbor-sim/arbor/issues/418. We'd need a higher-level way to describe cell populations and the connections between them. @thorstenhater donated the following comment to this issue:

The algebra approach would be a declarative (haha!) DSL something (and I am making something up here)
(connect
  (choose 100 (has-GJ type="A"))
  (choose 100 (has-GJ type="B")))
basically telling Arbor to sample 100 cells with GJ of types A and B each and wiring them. This could be materialised as a connection list like [(id, id)]
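
A minimal Python sketch of how such a term could be materialised into a connection list. Everything here is hypothetical: the population `cells`, the helpers `has_gj`, `choose` and `connect`, and the one-to-one pairing rule are assumptions for illustration, since the DSL above does not say how the two samples are wired.

```python
import random

# Hypothetical population: gid -> gap-junction type carried by that cell.
cells = {gid: ("A" if gid % 2 == 0 else "B") for gid in range(1000)}

def has_gj(cells, gj_type):
    """Return the gids of cells that carry a gap junction of the given type."""
    return [gid for gid, t in cells.items() if t == gj_type]

def choose(n, gids, rng):
    """Sample n distinct gids from the candidates."""
    return rng.sample(gids, n)

def connect(sources, targets):
    """Pair sources and targets one-to-one (an assumed wiring rule)."""
    return list(zip(sources, targets))

rng = random.Random(42)  # fixed seed so the materialised list is reproducible
connections = connect(choose(100, has_gj(cells, "A"), rng),
                      choose(100, has_gj(cells, "B"), rng))
```

The result has the `[(id, id)]` shape mentioned above, so a declarative term and an explicit list could share the same downstream code path.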

Composability of recipes. Let's take a concrete example to illustrate: a recipe could describe the inferior olive nucleus, together with a number of Arbor Cable Cells. Another recipe could describe (part of) the cerebellum, also with its cells. A third a piece of spinal cord. If these recipes exists on disk, it is a matter of time before someone comes up with the idea of joining them up. So, we'd need to have a way to join the recipes, and optionally (?) have a way for users to describe an extra connectivity matrix describing the connections between the (components of) the recipes.

Implementation

Discuss!
Tackle https://github.com/arbor-sim/arbor/issues/418
...
profit.

thorstenhater commented 3 years ago

General

First up: let's call this ARF. As said before: I like this, it removes -- as you say above -- the need for coding skills (almost) completely on the user's side while not removing the option to do so where needed. It also isolates us from having to support questionable installations.

Parts

This should go in

A specification of cell instances (= type + parameters), their counts, and some kind of label to address them
The description of the individual cell types
The mechanism catalogues used
Connection specification potentially using the labels above

About the labels: Currently we use simple integers, but we also have been moving away from plain numbers and for good reason. If we want to compose recipes, it's no longer clear what gid=42 means; it must be at least qualified by a recipe id rid=23/gid=42.
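
One way to picture the qualified label is a small value type. This is only a sketch: the class name `QualifiedId` and its string form are assumptions that mirror the `rid=23/gid=42` notation above, not an existing Arbor type.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QualifiedId:
    rid: int  # id of the recipe the cell belongs to
    gid: int  # cell id, only meaningful inside that recipe

    def __str__(self):
        return f"rid={self.rid}/gid={self.gid}"

a = QualifiedId(rid=23, gid=42)
b = QualifiedId(rid=7, gid=42)  # same gid, different recipe: a different cell
```

Because the dataclass is frozen, such ids are hashable and can be used as dictionary keys when composing recipes.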

This should not

Simulation parameters
Probes (also not part of ACC)
Hardware descriptions

Reasoning here is that these are specific to a concrete simulation or experiment and neither the cells nor the network.

Composition

As said in the discussion you mention here: Composition is the central feature here. It allows users to design and share building blocks and connect them as needed into larger blocks. This requires some care to design a structure-preserving composition, i.e.

Given A, B recipes then connect(A, B, [...]) is a recipe

but it's nothing hard. In fact we could write this today using C++ recipes only, I believe.

Sketch

Most callbacks would just return the natural extension and map into the sub-recipes' gid ranges
The connection-type callback will in addition return the connections between sub-recipes
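
The two points above can be sketched in plain Python. This is a toy model under stated assumptions, not Arbor's recipe class: `ToyRecipe` and `Connect` are hypothetical, connections are bare `(source, target)` gid pairs rather than Arbor connection objects, and the second recipe's gids are assumed to be shifted past the first recipe's range.

```python
class ToyRecipe:
    """Stand-in for a recipe: a cell count plus (source, target) gid pairs."""
    def __init__(self, n, connections):
        self.n = n
        self.connections = connections

    def num_cells(self):
        return self.n

    def connections_on(self, gid):
        return [(s, t) for s, t in self.connections if t == gid]


class Connect:
    """connect(A, B, cross): offers the same callbacks, so it is a recipe too."""
    def __init__(self, a, b, cross):
        self.a, self.b = a, b
        self.offset = a.num_cells()  # B's gids are shifted past A's range
        self.cross = cross           # pairs expressed in composite gids

    def num_cells(self):
        return self.a.num_cells() + self.b.num_cells()

    def connections_on(self, gid):
        if gid < self.offset:
            local = self.a.connections_on(gid)
        else:
            local = [(s + self.offset, t + self.offset)
                     for s, t in self.b.connections_on(gid - self.offset)]
        # In addition, return the connections between the sub-recipes.
        return local + [(s, t) for s, t in self.cross if t == gid]


a = ToyRecipe(3, [(0, 1)])
b = ToyRecipe(2, [(1, 0)])
ab = Connect(a, b, cross=[(2, 3)])
abc = Connect(ab, ToyRecipe(1, []), cross=[(4, 5)])  # composites nest
```

Since `Connect` exposes the same callbacks as its inputs, the result can be fed back into `Connect`, which is the structure-preserving property asked for above.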

Implementation Timeline / Building Blocks

After consideration: We can do without the network DSL for now, but be ready to plug it in. That means in ARF, we could have a term like

(connections-on (explicit (0 23) (42 1)))

where explicit would later become part of the network DSL. That's mainly because the network spec will likely be a more involved feature.
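
To illustrate how small the explicit form is to consume, here is a sketch of a reader for that one term. It is not Arbor's s-expression parser; `tokenize`, `parse` and `connections_from` are hypothetical helpers, and the meaning of each number pair is left open, as it is in the term above.

```python
import re

def tokenize(text):
    """Split into parentheses and atoms."""
    return re.findall(r"[()]|[^\s()]+", text)

def parse(tokens):
    """Parse one s-expression from the front of the token list."""
    tok = tokens.pop(0)
    if tok != "(":
        return int(tok) if tok.lstrip("-").isdigit() else tok
    items = []
    while tokens[0] != ")":
        items.append(parse(tokens))
    tokens.pop(0)  # drop the closing ")"
    return items

def connections_from(term):
    """Turn (connections-on (explicit (a b) ...)) into a list of pairs."""
    head, body = term[0], term[1]
    if head != "connections-on" or body[0] != "explicit":
        raise ValueError("only (connections-on (explicit ...)) is handled here")
    return [tuple(pair) for pair in body[1:]]

term = parse(tokenize("(connections-on (explicit (0 23) (42 1)))"))
pairs = connections_from(term)
```

A later network DSL could add further heads next to `explicit` in `connections_from` without changing the outer term.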

So, I think work could be started on this without further ado. All the building blocks are there, but some of them might be amended in the future.

brenthuisman commented 3 years ago

The benefit isn't even so much that users don't need to code; although ARF would be ~human readable xml~ a text format like ACC and could be created and edited by hand, I think that, like ACC, the way people would create these files is by building one in Python and using recipe.save("myrecipe.arf"), or, eventually, by having a GUI to create the connectivity matrix and define probes and whatnot.

Probes? Yes! They are part of recipes, so I would say they belong here. Partially storing recipes seems surprising and not ideal to me. Do you foresee a technical reason for leaving them out?

thorstenhater commented 3 years ago

I was thinking about graphical tools here, not writing out s-exp by hand.

Probes are part of the recipe although samplers are not, so probes yay, sampler nay?! They are more of a feature of the experiment, so I am tending towards saying they should be added after composing a simulation-ready recipe. The risk is that users pay for something they never use.

Helveg commented 3 years ago

About pickleability, if there's a recipe constructor that takes the ARF string you can register a dispatch function on arbor module load to transparently store recipe pickles as their ARF string to pass to __init__. I volunteer to write it, I love these things 🤤

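import pickle

import arbor

class MyRecipe(arbor.recipe):
  # ... (recipe callbacks elided)
  ...

# With the dispatch registered, the recipe would be stored as its ARF string.
with open("ello", "wb") as f:
  pickle.dump(MyRecipe(), f)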
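
The dispatch idea can be sketched with the standard library alone. The `Recipe` class below is a stand-in, and both its constructor taking an ARF string and its `to_arf` method are assumptions, since no such API exists in Arbor yet; only `copyreg.pickle` and `pickle` are real calls.

```python
import copyreg
import pickle

class Recipe:
    """Stand-in for a recipe that can be built from its ARF string."""
    def __init__(self, arf):
        self.arf = arf

    def to_arf(self):
        return self.arf

def reduce_recipe(recipe):
    # Tell pickle to rebuild the object by calling its class on the ARF string.
    return type(recipe), (recipe.to_arf(),)

# In the proposal this registration would run once at module load.
copyreg.pickle(Recipe, reduce_recipe)

original = Recipe("(connections-on (explicit (0 23) (42 1)))")
restored = pickle.loads(pickle.dumps(original))
```

One caveat: the `copyreg` table is looked up by exact type, so user subclasses such as `MyRecipe` would not be covered by a registration on the base class alone; defining `__reduce__` on the base class is one possible alternative.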
