Motivation

Following a discussion with @leogr, @FedeDP, and @ldegio, we recognize that the sdk could be further optimized to minimize the number of memory allocations and copies involved in the `next`/`next_batch` and extract flows. More specifically:
- In `WrapExtractFuncs()` and `RegisterAsyncExtractors()` we use both `C.GoBytes()` and `C.GoString()` to access C-allocated memory from Go. However, this causes a malloc and a copy under the hood, and can happen multiple times in a loop during batch extractions.
- During `next`/`next_batch` and in the `Events` wrapper we allocate multiple buffers at each iteration. First, we allocate a `PluginEvent` struct for each event. Second, we allocate a slice of `PluginEvent` to host the array of fetched events. Last, we allocate a C memory buffer to be used inside `C.fill_event`. This last buffer is then freed by scap, outside the control of the plugin.
As such, we see the following problems:
- Memory is mostly not reused, and the performance of high-throughput data sources can deteriorate due to the high number of mallocs/copies at each cycle.
- We often represent the event as a `byte` slice (in extractor functions, for instance), which allows sdk users to perform uncontrolled write operations on the data buffer, thus potentially breaking the sdk code/logic.
- The memory allocation governance is ambiguous. During the `next` lifecycle, the event buffers are allocated by the plugin and then freed by scap later in the process.
Proposal
1. Introduce an opaque interface that allows accessing C-allocated memory in a Go-friendly way, transparently to sdk users. Most of the code relying on malloc/copy simply needs to access C-allocated memory from Go, which is generally not safe and not supported in CGO.
2. Add protection measures to the data buffers passed to user-defined callbacks (such as `Next`), so that we can enforce read-only or write-only modes. This would make accessing C-allocated buffers safer, as we gain higher control over how they are accessed in the Go plugin.
3. Segregate memory allocation/free responsibilities to only one of the two sides, either the plugin or scap. Of the two, we opt for managing this on the plugin side, because it can be aware of the event size defined by users.
4. Use the results of all the points above to simplify the APIs of the `next`/`next_batch` flow, and hide the complexity related to the optimized memory management.
Alternatives
Leave things as they are.
Additional Context
Points 1 and 2 of our proposal can be attained by an abstraction like this one:
```go
// BytesReadWriter is an opaque wrapper for fixed-size memory buffers, that can safely be
// used in the plugin framework in a Go-friendly way. The purpose is to provide means
// for safe memory access through the read/write interface primitives, regardless of
// how the buffer is physically allocated under the hood. For instance, this can be used
// to wrap a C-allocated buffer, to hide the type conversion magic and avoid illegal
// memory operations. The io.ReadWriteSeeker interface is leveraged to implement the safe
// random memory access semantics. Note, read-only or write-only modes to the memory buffer
// can easily be accomplished by casting this to either an io.Reader or io.Writer.
type BytesReadWriter interface {
	io.ReadWriteSeeker
	//
	// Buffer returns an unsafe.Pointer that points to the underlying memory buffer.
	Buffer() unsafe.Pointer
	//
	// Size returns the physical size of the underlying memory buffer.
	Size() int64
	//
	// Offset returns the current cursor position relative to the underlying buffer.
	// The cursor position represents the index of the next byte in the buffer that will
	// be available for read/write operations. This value is altered through the usage of
	// Seek, Read, and Write. By definition, we have that 0 <= Offset() <= Size().
	Offset() int64
}

func NewBytesReadWriter(buf unsafe.Pointer, size int64) (BytesReadWriter, error) {
	// Inspired by: https://stackoverflow.com/a/66218124
	var bytes []byte
	(*reflect.SliceHeader)(unsafe.Pointer(&bytes)).Data = uintptr(buf)
	(*reflect.SliceHeader)(unsafe.Pointer(&bytes)).Len = int(size)
	(*reflect.SliceHeader)(unsafe.Pointer(&bytes)).Cap = int(size)
	return &bytesReadWriter{
		buffer:     buf,
		bytesAlias: bytes,
		offset:     0,
		size:       size,
	}, nil
}

type bytesReadWriter struct {
	offset     int64
	size       int64
	buffer     unsafe.Pointer
	bytesAlias []byte
}

// ... (intuitive from here) implement Read, Write, Seek, Size, Offset, Buffer
```