PR #226 introduced a new framework of batch evaluators to implement an (often requested) finer level of parallelism in pagmo.
The core new feature is a class, called bfe (short for "batch fitness evaluator"), that other pagmo classes (e.g., population, algorithms, etc.) can use when the need arises to evaluate a large batch of individuals, possibly in a parallelised and/or vectorised fashion. For instance, the bfe class is now used by the population class to implement the parallel initialisation of the fitnesses of its individuals.
The bfe class, similarly to other pagmo classes such as problem, algorithm, etc., is a type-erased container that internally stores what we call a UDBFE (user-defined batch fitness evaluator), which actually implements the evaluation of the fitnesses of a batch of input decision vectors. The thread_bfe UDBFE, for instance, parallelises the evaluation of a batch of decision vectors using the Intel TBB library.
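To make the role of a UDBFE more concrete, here is a minimal, library-free Python sketch of what a thread-based batch evaluator does conceptually. It is not pagmo's implementation: the `Sphere` class, the `threaded_batch_eval()` helper and the `dim` attribute are hypothetical names invented for this illustration, and the sketch assumes the batch is passed as a single flat list of concatenated decision vectors, with the fitness vectors returned concatenated in the same order.

```python
from concurrent.futures import ThreadPoolExecutor


class Sphere:
    """Hypothetical stand-in for a user-defined problem (not a pagmo class)."""

    dim = 3  # length of a single decision vector (assumed attribute)

    def fitness(self, dv):
        # Single-objective fitness: sum of squares of the decision vector.
        return [sum(x * x for x in dv)]


def threaded_batch_eval(prob, dvs, n_workers=4):
    """Evaluate a flat batch of decision vectors by calling fitness() in threads.

    `dvs` is assumed to hold the decision vectors back to back; the result
    holds the fitness vectors back to back, in the same order.
    """
    n = prob.dim
    if len(dvs) % n != 0:
        raise ValueError("batch length is not a multiple of the problem dimension")
    # Split the flat batch into individual decision vectors.
    chunks = [dvs[i:i + n] for i in range(0, len(dvs), n)]
    # The problem is treated as a black box: only fitness() is invoked.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(prob.fitness, chunks))  # map() preserves order
    # Concatenate the per-individual fitness vectors into one flat list.
    return [f for fv in results for f in fv]


fitnesses = threaded_batch_eval(Sphere(), [1, 2, 3, 0, 0, 0, 1, 1, 1])
```

Note that in CPython, threads will not speed up a pure-Python fitness() because of the global interpreter lock; the sketch only illustrates the split/evaluate/concatenate structure.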
In addition to the bfe class, pagmo problems can now implement an additional, optional method, called batch_fitness(), which implements the batch fitness evaluation functionality on a problem-by-problem basis. That is, whereas a thread_bfe is capable of accelerating the evaluation of a generic pagmo problem via the multi-threaded invocation of the problem's fitness() method, the batch_fitness() method is usable only by those problems that actually implement it. The batch_fitness() method essentially requires implementing the fitness function of a problem in two places, fitness() and batch_fitness(); the payoff is that batch_fitness() makes it possible, for instance, to parallelise the evaluation of a group of individuals using specialised hardware (e.g., GPUs, SIMD instructions, etc.), which would not otherwise be possible with the bfe class (which only "sees" the fitness() method of the problem as a black box).
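The trade-off between fitness() and batch_fitness() can be sketched as follows, again in plain Python without pagmo. All names here (`Sphere`, `BatchSphere`, `evaluate_batch()`, the `dim` attribute) are hypothetical, and the `hasattr` check is only an assumed stand-in for how a caller might detect the optional method; the flat, concatenated data layout is likewise an assumption of this sketch.

```python
class Sphere:
    """Hypothetical problem exposing only fitness()."""

    dim = 2  # length of a single decision vector (assumed attribute)

    def fitness(self, dv):
        return [sum(x * x for x in dv)]


class BatchSphere(Sphere):
    """Same problem, with the fitness logic written a second time for batches."""

    def batch_fitness(self, dvs):
        # The whole batch is visible at once, so this is the place where a
        # real problem could hand the data to a GPU kernel or SIMD routine.
        n = self.dim
        return [sum(x * x for x in dvs[i:i + n]) for i in range(0, len(dvs), n)]


def evaluate_batch(prob, dvs):
    """Use batch_fitness() when the problem provides it, else loop over fitness()."""
    if hasattr(prob, "batch_fitness"):
        return prob.batch_fitness(dvs)
    n = prob.dim
    out = []
    for i in range(0, len(dvs), n):
        out.extend(prob.fitness(dvs[i:i + n]))
    return out


batch = [1, 2, 3, 4]
generic = evaluate_batch(Sphere(), batch)         # falls back to fitness()
specialised = evaluate_batch(BatchSphere(), batch)  # uses batch_fitness()
```

Both paths must return the same values, which is the cost mentioned above: the fitness logic lives in two places and has to be kept consistent.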
PR #226 contains a complete, tested and API-documented batch fitness evaluation API for C++. The Python exposition, however, is incomplete, and user docs and tutorials also need to be written, given the non-trivial nature of this new feature. This PR tracks the progress of the missing pieces.
Python
[ ] expose the bfe machinery to the associated packages API (so that it is possible to implement and expose new bfes in APs)
[x] test the exposed bfe class (using also the _test_bfe exposed from C++)
[x] test the exposed UDBFEs
[x] test the new batch_fitness()/has_batch_fitness() methods for UDPs
[x] test the new pop/island/archi constructors which take advantage of the bfe
[x] test the decorability of the batch_fitness() method in the decorator meta-problem
[x] implement python-based UDBFEs (multiprocessing, ipyparallel), override the default UDBFE choice when working in Python
C++
[x] start taking advantage of the bfe in algos (gaco + bfe implemented in #290, NSGA2 done in #308, others on their way)
[ ] implement proper batch_fitness() support for the decompose and unconstrain metaproblems. translate already supports batch_fitness() (done in #226). Not conceptually difficult, but it may require some refactoring of the multiobjective utilities to accommodate the batch_fitness() data layout. @darioizzo would you like to take care of this eventually?
[ ] for symmetry with the island class, it would probably make sense to have a fork_bfe down the line, but not super high priority
Docs
[ ] write C++ and Python tutorials on how to use the bfe and implement new bfes
[ ] perhaps show a few cases in which the bfe improves performance for heavy problems (e.g., pop init, algorithm which takes advantage of bfe - gaco perhaps?)
[ ] show some examples of implementations of batch_fitness() taking advantage of SIMD/GPUs (perhaps rely on pyopencl/numba on the Python side?)
[x] in user docs heading, we should use something more descriptive than "bfe", "batch evaluators" is probably a good compromise in terms of readability