- Decouple Spark executor core count from VE core count
- Support asynchronous `malloc()`/`free()`
On top of consolidating these two efforts, this PR adds code and tests to harden the introduced features.
Details:
- Introduce support for multiple VEO asynchronous contexts inside the `VeProcess` abstraction. Async contexts are accessed under re-entrant locks.
- Force all async `VeProcess` methods to run on the first available VEO async context, with contexts tried in random order
- Update `VeProcess.createFromContext` to allow only 1 executor per VE for now
- Acquire the context lock within a `withVeoProc` block, so that nothing further is attempted if the process has already been closed
- Remove synchronization locks on `heapRecords`
- Collect metrics for sync-ish calls
- Implement asynchronous `malloc`/`free`. The idea is to move all AVEO calls (except for `veo_load_library`) off the main VEO thread. This also makes it possible to implement fancier variants of `malloc`/`free` if needed, including ones with custom logging and metrics.
- Update `VeProcess.load()` to always attempt to load `libcyclone.so` before loading the requested library
- Fix existing VE-based unit tests to work with the VE-executor decoupling and the asynchronous `malloc`/`free` features
- Update `WithVeProcess` to load `libcyclone.so` before each test case
- Limit `VeProcess` creation in unit test environments to no more than 2 VEO asynchronous contexts, since creating extra asynchronous contexts has a large overhead
- Add support for freeing multiple pointers with one asynchronous VE method call
- Remove heap allocation records before calling `free()` instead of after, to avoid a race condition where a subsequent allocation from the VE returns a new allocation with the same address but a different allocation size
- Make the number of VE cores used per executor configurable from `VeProcess.createFromContext()`
- Don't warn about 0-pointer allocations being already known, as they are expected to happen multiple times
- Free empty aggregation results in batches
- Explicitly start the VE process in `SparkCycloneExecutorPlugin.init()`
- Close `VeColVector` when it is `free()`'d externally
- Check for file paths that are too long for AVEO to accept when loading a library
- Add tests to ensure that a proper error is thrown if `libcyclone.so` is not yet loaded
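The async-context handling in the details above (several VEO async contexts, each guarded by a re-entrant lock, async methods taking the first free context in random order, and a closed-process check before any lock is taken) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the `VeProcess` implementation: `AsyncContext`, `AsyncContextPool`, and `withContext` are hypothetical names, and the contexts are plain Java objects rather than AVEO handles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical stand-in for one VEO async context; a real one would wrap an AVEO handle.
final class AsyncContext {
    final int id;
    final ReentrantLock lock = new ReentrantLock();

    AsyncContext(int id) {
        this.id = id;
    }
}

// Hypothetical pool of async contexts owned by a single VE process.
final class AsyncContextPool {
    private final List<AsyncContext> contexts = new ArrayList<>();
    private final AtomicBoolean open = new AtomicBoolean(true);

    AsyncContextPool(int count) {
        for (int i = 0; i < count; i++) {
            contexts.add(new AsyncContext(i));
        }
    }

    void close() {
        open.set(false);
    }

    // Runs `body` on the first context whose lock is free, trying contexts in random order.
    <T> T withContext(Function<AsyncContext, T> body) {
        while (true) {
            // Check the process first, so nothing is attempted after it has been closed.
            if (!open.get()) {
                throw new IllegalStateException("VE process has already been closed");
            }
            List<AsyncContext> order = new ArrayList<>(contexts);
            Collections.shuffle(order);
            for (AsyncContext ctx : order) {
                if (ctx.lock.tryLock()) {
                    try {
                        return body.apply(ctx);
                    } finally {
                        ctx.lock.unlock();
                    }
                }
            }
            Thread.yield(); // every context is busy: back off briefly, then retry
        }
    }
}
```

Because the locks are re-entrant, a nested `withContext` call made on a thread that already holds a context will not block on that context, and the `tryLock` scan means a caller never waits on a busy context while another one is free.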
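Moving `malloc` off the main VEO thread amounts to submitting the call to an async context and then waiting on its result. The sketch below models that pattern under loud assumptions: a single daemon thread stands in for one VEO async context, and a bump counter stands in for the VE allocator, so `AsyncAllocator`, `mallocAsync`, and the returned addresses are all hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical allocator whose calls never run on the caller's thread: each one is
// submitted to a worker standing in for a VEO async context.
final class AsyncAllocator {
    // One daemon thread models one async context (daemon so it never keeps the JVM alive).
    private final ExecutorService context = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "veo-async-context");
        t.setDaemon(true);
        return t;
    });

    // Fake VE address space: a bump counter, NOT a real allocator.
    private final AtomicLong nextAddress = new AtomicLong(0x1000);

    // Submits the allocation and returns immediately with a future address.
    CompletableFuture<Long> mallocAsync(long size) {
        return CompletableFuture.supplyAsync(() -> nextAddress.getAndAdd(size), context);
    }

    // Blocking variant: submit the call, then wait for its result.
    long malloc(long size) {
        return mallocAsync(size).join();
    }

    void shutdown() {
        context.shutdown();
    }
}
```

A caller that does not need the address right away can hold on to the `CompletableFuture` and overlap other work; `malloc` simply joins on it.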
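The ordering fix for `free()`, the batched free, and the 0-pointer warning rule can be illustrated with a small tracker. In this sketch, `HeapTracker`, `recordAllocation`, and `freeAll` are hypothetical, the `veFreeBatch` callback stands in for the single asynchronous VE call, and using a `ConcurrentHashMap` is an assumption about how `heapRecords` can work without explicit synchronization locks.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical tracker of VE heap allocations: address -> size in bytes.
final class HeapTracker {
    // Assumption: a concurrent map, so no explicit synchronization lock is needed.
    final Map<Long, Long> heapRecords = new ConcurrentHashMap<>();

    void recordAllocation(long address, long size) {
        Long previous = heapRecords.put(address, size);
        // Address 0 is expected to repeat, so only warn about other already-known addresses.
        if (previous != null && address != 0L) {
            System.err.println("Allocation at address " + address + " is already known");
        }
    }

    // Frees several pointers with one VE call (veFreeBatch stands in for that call).
    void freeAll(List<Long> addresses, Consumer<List<Long>> veFreeBatch) {
        if (addresses.isEmpty()) {
            return;
        }
        // Drop the records BEFORE the VE-side free: once the VE frees an address it may hand
        // it out again, and a record written for that new allocation must not be removed here.
        for (Long address : addresses) {
            heapRecords.remove(address);
        }
        veFreeBatch.accept(addresses);
    }
}
```

If the order were reversed, another thread could receive a just-freed address from the VE and record its new size between the `free()` and the record removal, and the removal would then delete that fresh record.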
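The library-loading rules listed above (always try `libcyclone.so` first, reject paths AVEO cannot accept, and raise a clear error when `libcyclone.so` is missing) might look something like this. `LibraryLoader` and `requireCyclone` are hypothetical, and `MAX_PATH_LENGTH` is a placeholder: the PR does not state AVEO's actual limit. The `loaded.add` call stands in for the native `veo_load_library` call.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical loader illustrating the rules; it does not call AVEO.
final class LibraryLoader {
    // PLACEHOLDER limit: the real maximum path length accepted by AVEO is not given in this PR.
    static final int MAX_PATH_LENGTH = 255;

    // Paths loaded so far, in load order.
    final Set<String> loaded = new LinkedHashSet<>();
    private final String cyclonePath;

    LibraryLoader(String cyclonePath) {
        this.cyclonePath = cyclonePath;
    }

    private void loadOne(String path) {
        if (path.length() > MAX_PATH_LENGTH) {
            throw new IllegalArgumentException("Library path is too long for AVEO: " + path);
        }
        loaded.add(path); // stand-in for the native veo_load_library call
    }

    // Always attempts to load libcyclone.so before the requested library.
    void load(String path) {
        if (!loaded.contains(cyclonePath)) {
            loadOne(cyclonePath);
        }
        loadOne(path);
    }

    // Raises a clear error when libcyclone.so has not been loaded yet.
    void requireCyclone() {
        if (!loaded.contains(cyclonePath)) {
            throw new IllegalStateException("libcyclone.so is not yet loaded");
        }
    }
}
```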
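For the sync-ish call metrics, one simple approach is to wrap each call and accumulate an invocation count and total elapsed time per call name. This is an illustrative sketch only: `CallMetrics` and `measure` are hypothetical, and the PR does not specify which metrics are actually collected.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical per-call metrics: invocation count and total wall-clock time per call name.
final class CallMetrics {
    final Map<String, LongAdder> counts = new ConcurrentHashMap<>();
    final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

    // Times `call` and records the sample even if the call throws.
    <T> T measure(String name, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            counts.computeIfAbsent(name, k -> new LongAdder()).increment();
            totalNanos.computeIfAbsent(name, k -> new LongAdder()).add(elapsed);
        }
    }
}
```

`LongAdder` keeps the counters cheap under contention, which matters if many tasks hit the same call name concurrently.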
This PR consolidates the executor/VE core decoupling and asynchronous `malloc()`/`free()` efforts, which were made over multiple branches (see https://github.com/XpressAI/SparkCyclone/pull/593, https://github.com/XpressAI/SparkCyclone/pull/595, https://github.com/XpressAI/SparkCyclone/compare/NS-56/2/remove-column-to-row-attempt-2?expand=1, and https://github.com/XpressAI/SparkCyclone/compare/NS-56-transfer?expand=1).