UC-Davis-molecular-computing / ppsim

Python package for simulating population protocols
MIT License
10 stars 5 forks source link

feature request: simulate large-state-count protocols #4

Open dave-doty opened 3 years ago

dave-doty commented 3 years ago

The batching algorithm is only feasible when state counts are small enough that every producible state can be enumerated at the start of the protocol, which rules it out (in the naive implementation) for protocols with very large numbers of states (e.g., counts in the tens of thousands or more).

However, the data visualization abilities of the library are impressive and useful even when using the slower sequential algorithm that picks two agents at a time to interact. Furthermore, the Cython implementation means that protocols can be specified in Cython, which is easier than specifying in C/C++ or some other fast language, and yet simulated with performance that we hope is close to that of C/C++.

So it would be great to be able to specify such protocols in this library and simulate them (though of course one would not expect to be able to go much beyond population sizes of 107 or so in a reasonable amount of time).

EricESeverson commented 3 years ago

Unfortunately, integrating this will require some significant refactoring, so isn't as simple as merely adding a new Simulator class.

As I see it, there are two next steps to take here to get ready for this integration.

The first is refactoring the Simulation class. The issue is that a lot of the data structures here are reliant on the array of counts data format (Simulation.configs, Simulation.history, the Plotter Snapshot object). To be more general purpose, the representation at this level should probably just remain as Python dicts (mapping states to counts, the same representation used for the input parameter init_config). The logic that exists right now in Simulation which prepares these arrays of counts, in order to pass them to the SimulatorMultiBatch cython object, should probably get pushed down into the simulator.pyx cython code. The other hard part is that a lot of the plotting / history features are also tied to this array of counts of enumerated states. Once the representations in Simulation are more general purpose, it becomes possible to run a simulation without having enumerated all the states beforehand. Then we can try to implement the naive simulation algorithm down in the cython code as another Simulator object.

The other tricky issue to figure out is how to pass in a user-defined cython transition function, such that the Simulator code will efficiently run this function. The naive implementation of this would have the function be a python object, and pay a lot of python overhead whenever the cython code wanted to call the transition function.