Fractal is a high-performance, high-productivity system for supporting distributed graph pattern mining (GPM) applications. Our current version is tested on Spark 3.5.0. Fractal features include:
Fractal is open source under the Apache 2.0 license.
Fractal currently takes as input an undirected labeled graph stored in a directory containing the following files:
graph/metadata: a single line containing the number of vertices (n) and the number of edges (m) in the graph, separated by a single space.
graph/adjlists: each line u = 0..n-1 holds the adjacency list of vertex u. Each item in this list is a pair (v,e) representing, respectively, the neighbor vertex v and the edge id e. Edge ids are also represented as indexes e = 0..m-1.
graph/vlabels: line i holds the label of vertex i.
graph/elabels: line i holds the label of edge i.
Example: the directory data/citeseer illustrates a valid formatting.
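As a quick sanity check of the layout described above, the following shell sketch verifies that the line counts and id ranges agree with the metadata file. It is not part of Fractal: the function name check_fractal_graph is hypothetical. It also rests on two assumptions the description does not state: that the four files sit directly inside the graph directory, and that adjacency items are written as v,e and separated by whitespace. Compare against data/citeseer before relying on it.

```shell
# Hypothetical helper (not part of Fractal): checks a graph directory against
# the layout described above. ASSUMPTION: adjacency items are written as "v,e"
# and separated by whitespace; the format description does not specify this.
check_fractal_graph() (
  dir=$1
  status=0
  count_lines() { awk 'END { print NR }' "$1"; }

  # metadata: "<n> <m>" on a single line
  n=$(awk 'NR == 1 { print $1 }' "$dir/metadata")
  m=$(awk 'NR == 1 { print $2 }' "$dir/metadata")
  for x in "$n" "$m"; do
    case $x in
      ''|*[!0-9]*) echo "metadata: expected two non-negative integers" >&2; exit 1 ;;
    esac
  done

  # adjlists: exactly n lines, every item "v,e" with v in 0..n-1 and e in 0..m-1
  awk -v n="$n" -v m="$m" '
    {
      for (i = 1; i <= NF; i++) {
        if ($i !~ /^[0-9]+,[0-9]+$/) {
          print "adjlists line " NR ": malformed item " $i; bad = 1; continue
        }
        split($i, p, ",")
        if (p[1] + 0 >= n + 0 || p[2] + 0 >= m + 0) {
          print "adjlists line " NR ": id out of range in " $i; bad = 1
        }
      }
    }
    END {
      if (NR != n + 0) { print "adjlists: " NR " lines, expected " n; bad = 1 }
      exit bad + 0
    }' "$dir/adjlists" >&2 || status=1

  # vlabels: one line per vertex; elabels: one line per edge
  [ "$(count_lines "$dir/vlabels")" -eq "$n" ] ||
    { echo "vlabels: expected $n lines" >&2; status=1; }
  [ "$(count_lines "$dir/elabels")" -eq "$m" ] ||
    { echo "elabels: expected $m lines" >&2; status=1; }

  exit "$status"
)
```

After sourcing the function, check_fractal_graph data/citeseer returns 0 and prints nothing if the directory is consistent under these assumptions. Otherwise it reports each mismatch on standard error.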
Run the following command to build a local Docker image that runs an Almond Scala/Spark Kernel Notebook with support for Fractal:
docker buildx build --output type=docker --tag fractalnb -f notebook/Dockerfile https://github.com/dccspeed/fractal.git
Run the container:
docker run -it --rm -p 8888:8888 fractalnb:latest
The local URL for accessing the notebook kernel should appear in the output.
Notebook examples are provided in notebook/.
We provide a Docker image for this project. Run the following command to build a local Docker image:
docker buildx build --output type=docker --tag fractal https://github.com/dccspeed/fractal.git
To list the built-in applications and their descriptions:
docker run fractal:latest
A data folder with input graphs can be mounted via Docker volumes (-v). Arguments to the Fractal runner are passed via Docker environment variables (-e). For example, the following command submits a pattern-oblivious motif counting application as a Docker container:
docker run -v ./data/:/data -e app=motifs_po -e steps=3 -e inputgraph=/data/citeseer fractal:latest
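Because every runner argument is passed as a -e key=value pair, the command above can be assembled from a list of arguments. The sketch below is a hypothetical wrapper: fractal_docker and the DRY_RUN variable are illustrative names, not part of Fractal. It mounts a data folder at /data and turns each remaining argument into a -e option. With DRY_RUN=1 it only prints the command instead of running it.

```shell
# Hypothetical wrapper (not part of Fractal): builds the docker run command
# from a data folder and a list of key=value runner arguments.
fractal_docker() (
  data_dir=$1
  shift

  # Rewrite "$@" from "k1=v1 k2=v2 ..." to "-e k1=v1 -e k2=v2 ..."
  n=$#
  while [ "$n" -gt 0 ]; do
    set -- "$@" -e "$1"
    shift
    n=$((n - 1))
  done

  # Same shape as the command shown above: volume, environment, image
  set -- docker run -v "$data_dir:/data" "$@" fractal:latest

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # print the command without executing it
  else
    "$@"
  fi
)
```

For instance, DRY_RUN=1 fractal_docker ./data/ app=motifs_po steps=3 inputgraph=/data/citeseer prints the same command as above. Dropping DRY_RUN=1 executes it.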
export JAVA_HOME=<openjdk-8-installation-folder>
wget https://archive.apache.org/dist/spark/spark-3.5.0/spark-3.5.0-bin-hadoop3-scala2.13.tgz
tar xf spark-3.5.0-bin-hadoop3-scala2.13.tgz
mv spark-3.5.0-bin-hadoop3-scala2.13 spark
cd spark
export SPARK_HOME=`pwd`
git clone https://github.com/dccspeed/fractal.git # or direct download
cd fractal
export FRACTAL_HOME=`pwd`
./gradlew jar # download dependencies and build the project
./gradlew test # run tests
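The steps above export three environment variables: JAVA_HOME, SPARK_HOME, and FRACTAL_HOME. A minimal sketch for confirming that all three are set and point to existing directories before running the scripts in bin/ could look as follows. The function name check_fractal_env is hypothetical and not provided by Fractal.

```shell
# Hypothetical helper (not part of Fractal): verifies that the variables
# exported in the installation steps are set and point to directories.
check_fractal_env() (
  status=0
  for name in JAVA_HOME SPARK_HOME FRACTAL_HOME; do
    # Indirect lookup of the variable whose name is stored in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      status=1
    elif [ ! -d "$value" ]; then
      echo "$name=$value is not a directory" >&2
      status=1
    fi
  done
  exit "$status"
)
```

Calling check_fractal_env in the same shell session returns 0 when the three variables are usable and reports each problem on standard error otherwise.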
To list the built-in applications and their descriptions:
./bin/fractal.sh
You can also implement your own application using the Fractal API. We provide the subproject "fractal-apps" to make this process easier. All you need to do is add your application class to fractal-apps/src/, re-compile the project with ./gradlew jar, and run your code with the bin/fractal-custom-app.sh script:
./bin/fractal-custom-app.sh
Please refer to fractal-apps/src/main/scala/br/ufmg/cs/systems/fractal/apps/ for an example.
Next, we re-compile the project with ./gradlew jar and run the application over the dataset data/citeseer:
args=data/citeseer app_class=br.ufmg.cs.systems.fractal.apps.MyMotifsApp ./bin/fractal-custom-app.sh
The following open-source projects are used in Fractal: