Motivation
Async extraction speeds up the extraction flow. The idea is to have a worker goroutine busy-waiting and synchronizing with C code through a shared atomic value. This is cheaper than a C -> Go call, but the busy-wait can be quite expensive in terms of CPU time.
Extraction is one of the hottest paths in the whole framework, so optimizing it as much as possible is a high priority. The current implementation can still be improved in a few ways, and it's worth investigating potential optimizations.
Feature
Based on preliminary research, our current implementation can be improved in the following ways:
1. The async_extractor_wait function, which implements the busy-wait of the worker goroutine, is written in C. This means that a Go -> C call is required. Although this does not add a big overhead, we may consider implementing the atomic synchronization directly in Go using the atomic package. Please note that the memory ordering must be consistent between the C and the Go implementations to avoid data races.
2. The memory order used to implement the synchronization mechanism is currently memory_order_seq_cst, which is sub-optimal. We can try weaker memory orders, since our workflow should work well with acquire-release semantics.
3. If the framework requests field extraction in batch (this is supported but not used in libsinsp yet), the async_extractor_extract_field function is called in a loop, once per field. This triggers the synchronization mechanism for each of those fields. It would be far more efficient to support batch extraction in the async worker, so that the synchronization cost is paid once per batch.
Additional context
Optimizations 1 and 2 might be mutually exclusive. Go memory guarantees are not explicit yet, but they seem to rely on sequential consistency (see: https://groups.google.com/g/golang-dev/c/vVkH_9fl1D8/m/azJa10lkAwAJ), which could rule out weaker memory orders once the synchronization is done on the Go side. Further investigation is required on this. Also, it would be worthwhile to benchmark both options 1 and 2 to understand which optimization matters most.