fastscape-lem / fastscapelib

A C++/Python library of efficient algorithms for landscape evolution modeling
http://fastscapelib.readthedocs.io
GNU General Public License v3.0

Add flow kernels API and parallel execution #157

Closed adriendelsalle closed 4 months ago

adriendelsalle commented 5 months ago

Add the ability to write flow kernels in both C++ and Python and to apply them in parallel.

For Python flow kernels, rely on numba to JIT-compile the data and functions, and call the compiled code without any CPython call so that the GIL can be released and execution can benefit from multithreading.

Flow kernel C++ API

Flow kernel Python API

Add a Python NumbaFlowKernelFactory class in the fastscapelib.flow.numba_kernel submodule. It allows declaring a fastscapelib flow kernel from a flow_graph and user-defined data and parameters, relying on numba for just-in-time compilation of the kernel and data so that they can be used by the C++ code:

Add a convenience create_flow_kernel free function that calls NumbaFlowKernelFactory internally and returns the kernel and its associated data packed as a tuple.
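To illustrate the "factory plus convenience function returning a tuple" pattern, here is a minimal pure-Python sketch. It is not the fastscapelib API: KernelFactorySketch, create_kernel_sketch, the spec dictionary and the erode function are all hypothetical names chosen for the example, and no numba compilation is involved.

```python
from types import SimpleNamespace


class KernelFactorySketch:
    """Hypothetical stand-in for a kernel factory (not the fastscapelib class)."""

    def __init__(self, kernel_func, spec, n_threads=1):
        self.kernel_func = kernel_func
        self.spec = dict(spec)  # user-defined data and parameters
        self.n_threads = n_threads

    def build(self):
        # Bundle the per-node function with its execution parameters.
        kernel = SimpleNamespace(func=self.kernel_func, n_threads=self.n_threads)
        # Expose every entry of the spec as an attribute of the data object.
        data = SimpleNamespace(**self.spec)
        return kernel, data


def create_kernel_sketch(kernel_func, spec, n_threads=1):
    """Convenience wrapper: build the factory and return (kernel, data)."""
    return KernelFactorySketch(kernel_func, spec, n_threads).build()


def erode(node, data):
    # Hypothetical per-node update: erosion proportional to drainage area.
    data.erosion[node] = data.k_coef * data.area[node]


kernel, data = create_kernel_sketch(
    erode,
    {"erosion": [0.0, 0.0, 0.0], "area": [1.0, 2.0, 4.0], "k_coef": 0.5},
)
for node in range(3):
    kernel.func(node, data)
```

The caller only ever handles the returned pair, so the factory stays an implementation detail.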

Allow easier declaration of an eroder flow kernel from a NumbaEroderFlowKernel base class:

All of this is made available through a few intermediate classes that wrap the njitted functions or jitclasses generated by NumbaFlowKernelFactory, namely:

Parallel execution of flow kernels

The C++ flow_graph API is extended with an apply_kernel<FK, FKD> template method that takes two arguments: the flow kernel and its associated data. A private thread pool was also added for any parallel execution needs.

apply_kernel dispatches the call to either apply_kernel_seq or apply_kernel_par depending on the kernel parameters n_threads and min_level_size:
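As a rough illustration of this kind of dispatch, here is a pure-Python analog (not the C++ implementation). Several things are assumptions made for the sketch: nodes are grouped into levels whose members can be processed independently, the parallel path is taken when n_threads > 1, levels smaller than min_level_size are processed on the calling thread, and a standard ThreadPoolExecutor stands in for the custom thread pool. KernelSketch and accumulate are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class KernelSketch:
    # Hypothetical stand-in for a flow kernel: a per-node function plus the
    # two execution parameters named in the description.
    func: Callable[[int, dict], None]
    n_threads: int = 1
    min_level_size: int = 2


def apply_kernel_seq(kernel, data, levels):
    # Visit every node in order on the calling thread.
    for level in levels:
        for node in level:
            kernel.func(node, data)


def apply_kernel_par(kernel, data, levels, pool):
    for level in levels:
        if len(level) < kernel.min_level_size:
            # Assumption: small levels are not worth dispatching to the pool.
            for node in level:
                kernel.func(node, data)
        else:
            # list() waits for the whole level, acting as a barrier
            # before the next level starts.
            list(pool.map(lambda node: kernel.func(node, data), level))


def apply_kernel(kernel, data, levels, pool=None):
    # Assumption: the parallel path is taken only when n_threads > 1.
    if kernel.n_threads > 1 and pool is not None:
        apply_kernel_par(kernel, data, levels, pool)
    else:
        apply_kernel_seq(kernel, data, levels)


def accumulate(node, data):
    # Each node writes only its own entry, so nodes of one level do not race.
    data["acc"][node] = 1 + sum(data["acc"][d] for d in data["donors"][node])


def run_sketch(n_threads):
    donors = [[1, 2], [3, 4], [5], [], [], []]  # node 0 is the outlet
    levels = [[3, 4, 5], [1, 2], [0]]  # upstream levels first
    data = {"donors": donors, "acc": [0] * 6}
    kernel = KernelSketch(func=accumulate, n_threads=n_threads)
    if n_threads > 1:
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            apply_kernel(kernel, data, levels, pool)
    else:
        apply_kernel(kernel, data, levels)
    return data["acc"]
```

Calling run_sketch(1) and run_sketch(4) exercises the sequential and parallel paths on the same small tree and should yield the same accumulated values.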

Note: because apply_kernel is templated, any kernel and data types can be passed (duck typing). They only need to be consistent with each other, i.e. the kernel function must accept the FKD type.
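The consistency requirement can be sketched with a small Python analog. With C++ templates a mismatched kernel/data pair fails at compile time; in this sketch it only fails at run time. AreaData, ElevationData, double_area, apply_kernel_sketch and is_consistent are hypothetical names, not part of fastscapelib.

```python
from dataclasses import dataclass


@dataclass
class AreaData:
    area: list


@dataclass
class ElevationData:
    elevation: list


def double_area(node, data):
    # This kernel only works with data objects exposing an `area` field.
    data.area[node] *= 2.0


def apply_kernel_sketch(kernel, data, nodes):
    # No type check here: any (kernel, data) pair is accepted as long as
    # the kernel can operate on the data it is given.
    for node in nodes:
        kernel(node, data)


def is_consistent(kernel, data, nodes):
    # Report whether the kernel can be applied to this data object.
    try:
        apply_kernel_sketch(kernel, data, nodes)
    except AttributeError:
        return False
    return True


areas = AreaData([1.0, 2.0])
apply_kernel_sketch(double_area, areas, range(2))
```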

The thread pool is a custom lock-free, busy-waiting thread pool designed to maximize throughput and minimize latency at each apply_kernel_par call:

benbovy commented 4 months ago

All green!

benbovy commented 4 months ago

Let's merge this! Thank you @adriendelsalle for your great work!