As users start to extend routines on the Python side, and as we add more wrappers to the bindings, we should start profiling these sections to catch unexpected bottlenecks.
For that, we should wrap and expose the BL_PROFILE logic.
The BL_PROFILE macros are lightweight wrappers around C++ objects that start and stop profiling. We might want to wrap the underlying C++ objects directly first. Then, we might add some convenience helper functions/objects as well, if needed.
This could be implemented similarly to CuPy's profiler, with __enter__/__exit__ methods so that it can be used as a with: context manager (and thus does not rely on destructor calls for the time measurement).
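To make the context-manager idea concrete, here is a minimal pure-Python sketch. The class name `ProfileRegion` and its `start`/`stop` methods are hypothetical and not part of any existing binding; `time.perf_counter` merely stands in for the wrapped C++ profiler object, whose start/stop calls would go in the same two places.

```python
import time
from collections import defaultdict


class ProfileRegion:
    """Hypothetical stand-in for a wrapped profiler object (assumption)."""

    # Accumulated wall time and call count per region name.
    totals = defaultdict(float)
    calls = defaultdict(int)

    def __init__(self, name):
        self.name = name
        self._start = None

    def start(self):
        # A real binding would call into the wrapped C++ object here.
        self._start = time.perf_counter()

    def stop(self):
        if self._start is None:
            return  # stop() without a matching start() is a no-op
        elapsed = time.perf_counter() - self._start
        ProfileRegion.totals[self.name] += elapsed
        ProfileRegion.calls[self.name] += 1
        self._start = None

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs deterministically at the end of the with: block,
        # so timing does not depend on when the object is destroyed.
        self.stop()
        return False  # do not swallow exceptions


# Usage: time a Python-side routine.
with ProfileRegion("my_python_routine"):
    sum(i * i for i in range(1000))
```

Because `__exit__` is invoked as soon as the block ends, even when an exception is raised, the region is always closed at a well-defined point, which Python's garbage collector does not guarantee for destructor-based timers.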