Spark Cyclone is an Apache Spark plugin that accelerates Spark by using the SX-Aurora TSUBASA "Vector Engine" (VE). With little or no effort from the user, the plugin accelerates existing Spark jobs by generating optimized C++ code and executing it on the VE.
Spark Cyclone currently offers three pathways to accelerate Spark on the VE:
map()
on the VE.

Integrating the Spark Cyclone plugin into an existing Spark job is straightforward. The following is the minimum set of flags that need to be added to an existing Spark job configuration:
# Flags, in order (comments are kept out of the command itself because
# text after a trailing backslash breaks shell line continuation):
#   --num-executors / --executor-cores: 1 executor per VE core
#   --jars: add the Spark Cyclone plugin JAR
#   spark.executor.extraClassPath: add Spark Cyclone libraries to the classpath
#   spark.plugins: the plugin's main class
#   spark.executor.resource.ve.amount: the number of VEs to use
#   spark.resources.discoveryPlugin: the class used to discover VE resources
#   spark.cyclone.kernel.directory: a directory where the plugin builds and caches C++ kernels
$ $SPARK_HOME/bin/spark-submit \
  --name YourSparkJobName \
  --master yarn \
  --deploy-mode cluster \
  --num-executors=8 --executor-cores=1 --executor-memory=8G \
  --jars /path/to/spark-cyclone-sql-plugin.jar \
  --conf spark.executor.extraClassPath=/path/to/spark-cyclone-sql-plugin.jar \
  --conf spark.plugins=io.sparkcyclone.plugin.AuroraSqlPlugin \
  --conf spark.executor.resource.ve.amount=1 \
  --conf spark.resources.discoveryPlugin=io.sparkcyclone.plugin.DiscoverVectorEnginesPlugin \
  --conf spark.cyclone.kernel.directory=/path/to/kernel/directory \
  YourSparkJob.py
Please refer to the Plugin Configuration guide for an overview of the configuration options available to Spark Cyclone.
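
When the same flags are reused across several jobs, it can help to assemble the command from a single table of settings rather than editing a long shell line by hand. The following is a minimal sketch of that idea using only the Python standard library. The helper name `build_submit_command`, the constants, and the fallback to a bare `spark-submit` when `SPARK_HOME` is unset are assumptions made for illustration and are not part of Spark Cyclone. The flag names and values are copied from the example above, and the paths are the same placeholders.

```python
import os
import shlex

# Placeholder paths copied from the example above; replace with real locations.
PLUGIN_JAR = "/path/to/spark-cyclone-sql-plugin.jar"
KERNEL_DIR = "/path/to/kernel/directory"

# The --conf entries from the example, gathered in one place for easy auditing.
CYCLONE_CONF = {
    "spark.executor.extraClassPath": PLUGIN_JAR,
    "spark.plugins": "io.sparkcyclone.plugin.AuroraSqlPlugin",
    "spark.executor.resource.ve.amount": "1",
    "spark.resources.discoveryPlugin": "io.sparkcyclone.plugin.DiscoverVectorEnginesPlugin",
    "spark.cyclone.kernel.directory": KERNEL_DIR,
}


def build_submit_command(job, name="YourSparkJobName", num_executors=8):
    """Return the spark-submit argument list for the example configuration.

    Hypothetical helper: it only builds the list and does not run anything.
    """
    spark_home = os.environ.get("SPARK_HOME")
    # Assumption: fall back to whatever spark-submit is on PATH if SPARK_HOME is unset.
    executable = (
        os.path.join(spark_home, "bin", "spark-submit") if spark_home else "spark-submit"
    )
    cmd = [
        executable,
        "--name", name,
        "--master", "yarn",
        "--deploy-mode", "cluster",
        f"--num-executors={num_executors}",
        "--executor-cores=1",
        "--executor-memory=8G",
        "--jars", PLUGIN_JAR,
    ]
    for key, value in CYCLONE_CONF.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(job)
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection; nothing is executed.
    print(shlex.join(build_submit_command("YourSparkJob.py")))
```

Keeping the settings in a dictionary means each `--conf` pair is generated the same way, and the resulting list can be passed to `subprocess.run` without a shell if the job should be launched from Python.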
While parts of the codebase can be developed on a standard x86 machine running Linux or macOS, building and testing the plugin requires a system that has VEs properly installed and set up. Please refer to the VE Documentation for more information.
The following guides contain all the necessary setup and installation steps:
In particular, the system should have the following software ready after setup:
The following pages cover all aspects of Spark Cyclone development:
Spark Cyclone is licensed under the Apache License, Version 2.0.
For additional information, please see the LICENSE and NOTICE files.