bluescarni commented 5 years ago

PR #226 introduced a new framework of batch evaluators to implement an (often requested) finer level of parallelism in pagmo.

The core new feature is a class, called bfe (short for "batch fitness evaluator"), that other pagmo classes (e.g., population, algorithms, etc.) can use when the need arises to evaluate a large batch of individuals, possibly in a parallelised and/or vectorised fashion. For instance, the bfe class is now used by the population class to implement the parallel initialisation of the fitnesses of its individuals.
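To illustrate the caller side of this idea, here is a minimal pure-Python sketch (not pagmo code) of a population initialisation that hands all its decision vectors to an evaluator in a single call. The names `sphere_fitness`, `serial_evaluator` and `init_population` are hypothetical and exist only for this example.

```python
import random


def sphere_fitness(dv):
    """Hypothetical single-objective fitness: sum of squares."""
    return [sum(x * x for x in dv)]


def serial_evaluator(fitness, dvs):
    """Hypothetical stand-in for a batch evaluator: evaluates one by one.

    A parallel evaluator with the same signature could be swapped in
    without touching init_population().
    """
    return [fitness(dv) for dv in dvs]


def init_population(fitness, bounds, size, evaluator, seed=0):
    """Draw `size` random decision vectors, then evaluate them as one batch."""
    rng = random.Random(seed)
    dvs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(size)]
    # One batch call instead of `size` separate fitness() calls.
    fvs = evaluator(fitness, dvs)
    return dvs, fvs


bounds = [(-1.0, 1.0)] * 3
dvs, fvs = init_population(sphere_fitness, bounds, 5, serial_evaluator)
```

The point of the design is that the caller only decides *what* to evaluate; *how* the batch is evaluated (serially, in threads, on other hardware) is left to the evaluator.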

The bfe class, similarly to other pagmo classes such as problem, algorithm, etc., is a type-erased container that internally stores what we call a UDBFE (user-defined batch fitness evaluator), which actually implements the evaluation of the fitnesses of a batch of input decision vectors. The thread_bfe UDBFE, for instance, parallelises the evaluation of a batch of decision vectors using the Intel TBB library.
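The container/UDBFE split can be sketched in plain Python as follows. This is only an analogy under stated assumptions: `BatchEvaluator`, `ThreadBatchEvaluator` and `SphereProblem` are hypothetical names, and the standard library's thread pool stands in for Intel TBB.

```python
from concurrent.futures import ThreadPoolExecutor


class SphereProblem:
    """Hypothetical toy problem: one objective, sum of squares."""

    def fitness(self, dv):
        return [sum(x * x for x in dv)]


class ThreadBatchEvaluator:
    """Hypothetical user-defined evaluator: calls fitness() from a thread pool."""

    def __init__(self, max_workers=4):
        self.max_workers = max_workers

    def __call__(self, prob, dvs):
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            # map() preserves input order, so results line up with dvs.
            return list(pool.map(prob.fitness, dvs))


class BatchEvaluator:
    """Hypothetical wrapper storing any callable with signature (prob, dvs)."""

    def __init__(self, udbfe):
        if not callable(udbfe):
            raise TypeError("the wrapped evaluator must be callable")
        self._udbfe = udbfe

    def __call__(self, prob, dvs):
        fvs = self._udbfe(prob, dvs)
        # Basic sanity check done once in the wrapper, for every evaluator.
        if len(fvs) != len(dvs):
            raise ValueError("wrong number of fitness vectors returned")
        return fvs


prob = SphereProblem()
evaluator = BatchEvaluator(ThreadBatchEvaluator(max_workers=2))
fitnesses = evaluator(prob, [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]])
```

Note that in CPython, threads will not speed up a pure-Python fitness() because of the global interpreter lock; the sketch only shows the structure, not the performance characteristics of the C++ implementation.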

In addition to the bfe class, pagmo problems can now implement an optional method, called batch_fitness(), which implements the batch fitness evaluation functionality on a problem-by-problem basis. That is, whereas a thread_bfe is capable of accelerating the evaluation of a generic pagmo problem via the multi-threaded invocation of the problem's fitness() method, the batch_fitness() method is usable only by those problems that actually implement it. The batch_fitness() method essentially requires implementing the fitness function of a problem in two places, fitness() and batch_fitness(); the payoff is that batch_fitness() makes it possible, for instance, to parallelise the evaluation of a group of individuals using specialised hardware (e.g., GPUs, SIMD instructions, etc.), which would not otherwise be possible with the bfe class (which only "sees" the fitness() method of the problem as a black box).
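A minimal pure-Python sketch of the "two places" trade-off is shown below. It is not pagmo code: `SphereWithBatch`, `SphereNoBatch` and `evaluate_batch` are hypothetical, and the sketch assumes a flattened layout in which decision vectors are concatenated into one list and fitness vectors are returned concatenated in the same order.

```python
class SphereNoBatch:
    """Hypothetical problem exposing only fitness()."""

    def __init__(self, dim):
        self.dim = dim

    def fitness(self, dv):
        return [sum(x * x for x in dv)]


class SphereWithBatch(SphereNoBatch):
    """Hypothetical problem that also implements batch_fitness().

    The fitness logic is written a second time here; a real implementation
    could replace this loop with vectorised or GPU code.
    """

    def batch_fitness(self, dvs):
        n = self.dim
        if len(dvs) % n != 0:
            raise ValueError("batch length is not a multiple of the dimension")
        return [sum(x * x for x in dvs[i:i + n]) for i in range(0, len(dvs), n)]


def evaluate_batch(prob, dvs):
    """Use batch_fitness() when the problem provides it, else loop on fitness()."""
    if hasattr(prob, "batch_fitness"):
        return prob.batch_fitness(dvs)
    n = prob.dim
    out = []
    for i in range(0, len(dvs), n):
        # Black-box path: only fitness() is visible to the evaluator.
        out.extend(prob.fitness(dvs[i:i + n]))
    return out


flat_dvs = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]  # three 2-dimensional vectors
with_batch = evaluate_batch(SphereWithBatch(2), flat_dvs)
without_batch = evaluate_batch(SphereNoBatch(2), flat_dvs)
```

Both paths must return the same values, which is exactly the consistency burden that comes with implementing the fitness function twice.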

#226 contains a complete, tested and API-documented batch fitness evaluation API for C++. The Python exposition, however, is incomplete, and user docs and tutorials also need to be written, given the non-trivial nature of this new feature. This PR tracks the progress of the missing pieces.

Python

[ ] expose the bfe machinery to the associated packages API (so that it is possible to implement and expose new bfes in APs)
[x] test the exposed bfe class (using also the _test_bfe exposed from C++)
[x] test the exposed UDBFEs
[x] test the new batch_fitness()/has_batch_fitness() methods for UDPs
[x] test the new pop/island/archi constructors which take advantage of the bfe
[x] test the decorability of the batch_fitness() method in the decorator meta-problem
[x] implement python-based UDBFEs (multiprocessing, ipyparallel), override the default UDBFE choice when working in Python

C++

[x] start taking advantage of the bfe in algos (gaco + bfe implemented in #290, NSGA2 done in #308, others on their way)
[ ] implement proper batch_fitness() support for the decompose and unconstrain meta-problems (translate already supports batch_fitness(), done in #226). Not conceptually difficult, but it may require some refactoring of the multiobjective utilities to accommodate the batch_fitness() data layout. @darioizzo would you like to take care of this eventually?
[ ] for symmetry with the island class, it would probably make sense to have a fork_bfe down the line, but not super high priority

Docs

[ ] write C++ and Python tutorials on how to use the bfe and implement new bfes
[ ] perhaps show a few cases in which the bfe improves performance for heavy problems (e.g., pop init, algorithm which takes advantage of bfe - gaco perhaps?)
[ ] show some examples of implementations of batch_fitness() taking advantage of SIMD/GPUs (perhaps rely on pyopencl/numba on the Python side?)
[x] in user docs heading, we should use something more descriptive than "bfe", "batch evaluators" is probably a good compromise in terms of readability

bluescarni commented 4 years ago

This is now being worked on in #380.

bluescarni commented 4 years ago

Most of the todo items were completed in #380; the rest is longer term. Will close this for now.

esa / pagmo2

Batch evaluators improvements #283
