gBolt--a fast, memory efficient, and light-weight implementation for gSpan algorithm in data mining
gBolt is up to 100x faster (see detailed experiments) than Yan's original implementation with multi-threading on a single machine. gBolt also reduces more than 200 folds memory usage, running efficiently on personal computers.
gBolt is fast because it:
gBolt is memory efficient because it:
gBolt is light-weight because it:
gBolt is correct because:
extern/data/Compound_422
and extern/data/Chemical_340
with minimal support from 0.1 to 0.9. The results generated by gBolt are exactly the same as Yan's gSpan-64. Required:
sudo apt-get install cmake
or install from source.Optional:
Steps:
git clone --recursive https://github.com/Jokeren/gBolt.git
cd gBolt
mkdir build && cd build
cmake ..
make
Build Options:
-DGBOLT_SERIAL=ON
: serial execution without OpenMP-DGBOLT_PERFORMANCE=ON
: display simple performance information and use hash map-DJEMALLOC_DIR=/path/to/dir
: use jemalloc for memory management-DCMAKE_BUILD_TYPE=<Release> or <RelWithDebInfo>
: config build typeRun an example:
./build/gbolt -i extern/data/Chemical_340 -s 0.2
Arguments help:
./build/gbolt -h
Multi-threading config:
export OMP_NUM_THREADS=<hardware core num for recommendation>
Examples:
./extern/data
Format:
t # <graph-id>
v <vertex-id> <vertex-label>
...
e <vertex-id> <vertex-id> <edge-label>
...
<graph-id>
must be contiguous without gaps, which means gBolt only supports t # <id>
followed by t # <id + 1>
.
<vertex-id>
must be contiguous without gaps, which means gBolt only supports v # <id>
followed by v # <id + 1>
.
All the <id>
fields must be non-negative integers starting from 0.
All the <label>
fields must be non-negative integers.
Yan, Xifeng, and Jiawei Han. "gspan: Graph-based substructure pattern mining." Data Mining, 2002. ICDM 2003. Proceedings. 2002 IEEE International Conference on. IEEE, 2002.
@misc{zhou_2019, title={Jokeren/gBolt}, url={https://github.com/Jokeren/gBolt}, journal={GitHub}, author={Zhou, Keren}, year={2019}, month={March}}
gBolt is designed for efficiency, so we have not developed utilities for it. Your are welcome to implement Python
, C
, or graphical interfaces.