lilux618 opened 2 years ago
Hi, @lilux618. libgrape-lite is not ready for the Graph500 benchmark, and BFS admits some algorithm-specific optimizations that libgrape-lite does not apply. These optimizations are hard to generalize, so other algorithms would rarely benefit from them. For example, BFS is idempotent, so its race conditions are benign: top-down BFS can avoid atomic operations, and bottom-up BFS can terminate early. Other applications, however, cannot update values without atomic operations, and we do not want to keep a "special code path" for specific algorithms in a programmable framework. In addition, our example BFS code for libgrape-lite applies neither direction optimization nor a bitmap to avoid redundant memory accesses, which may explain why libgrape-lite on GPU is slower than other Graph500 entries.
libgrape-lite added GPU support only recently, and we are still working on it. It would be great if you are willing to contribute code to improve BFS performance: you could modify the GPU parallel_engine and implement a fully optimized version of BFS.
Thank you! I am new to libgrape-lite and am learning how to use and modify it. For example, the BFS code currently runs from a single source specified on the command line. How can I change this at the source-code level? Furthermore, how do I merge other code with this program?
How can I change this at the source-code level?
The BFS source is passed in via gflags; specifically, it is set here.
Furthermore, how do I merge other code with this program?
To integrate another project with libgrape-lite, you can refer to the CreateAndQuery functions.
FYI: libgrape-lite also supports multi-GPU execution. Its computation and communication follow the PIE (PEval, IncEval, Assemble) model.
I ran libgrape-lite on an A100 with the graph500-26 dataset using BFS; the results are as follows:
load graph: 1080.76 sec
load application: 0.341124 sec
run algorithm: 0.23278 sec
print output: 57.175 sec

It's not faster than on the CPU. What's wrong with it? Is this expected? According to the Graph500 benchmark list (https://graph500.org/?page_id=12), BFS on graph500-26 on a GPU can reach 319.061 GTEPS, while for libgrape-lite the result works out to $2^{26} \times 16 / 0.23 \approx 4.66$ GTEPS. Isn't that too low?