A header-only C++/CUDA library for GPU graph operations. C++ API documentation is available at pangolin-docs.netlify.com.
Most of pangolin only works with `nvcc`.
Pangolin's NUMA features require libnuma:

```
sudo apt install libnuma-dev
```
Include the pangolin headers in your code, and somewhere before using pangolin, call `pangolin::init()`:

```c++
#include "pangolin.hpp"
#include "pangolin.cuh"

int main(int argc, char **argv) {
  pangolin::init();
  // your code here.
}
```
```
mkdir -p build && cd build
cmake ..
make
make test
```
- `include/pangolin/algorithm/broadcast.cuh`
- `include/pangolin/algorithm/elementwise.cuh`
- `include/pangolin/algorithm/fill.cuh`
- `include/pangolin/algorithm/search.cuh`
  - `__device__` function: `serial_sorted_search_binary`
  - `__device__` function: `serial_sorted_search_linear`

Get started at `include/pangolin/algorithm`.
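As a rough illustration of what these provide, a serial sorted binary search on the device might look like the sketch below. The name and signature here are assumptions for illustration; the real interface is in `include/pangolin/algorithm/search.cuh` and may differ.

```c++
// Hypothetical sketch in the spirit of serial_sorted_search_binary.
// Returns whether `needle` appears in the sorted array `arr` of length `n`.
template <typename T>
__device__ bool sketch_sorted_search_binary(const T *arr, size_t n, const T &needle) {
  size_t lo = 0, hi = n; // search the half-open range [lo, hi)
  while (lo < hi) {
    const size_t mid = lo + (hi - lo) / 2;
    if (arr[mid] < needle) {
      lo = mid + 1; // needle can only be in the upper half
    } else if (needle < arr[mid]) {
      hi = mid; // needle can only be in the lower half
    } else {
      return true; // found
    }
  }
  return false; // not found
}
```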
A reference-counted `cudaStream_t`: automatically create, share, and destroy a single `cudaStream_t`, analogous to a `std::shared_ptr`. Get started at `include/pangolin/cuda_cxx/rc_stream.hpp`.
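The idea is the same one you would get from a `std::shared_ptr` with a custom deleter; a minimal concept sketch (not pangolin's API) follows:

```c++
#include <memory>
#include <cuda_runtime.h>

// Concept sketch only: share one cudaStream_t among several owners and
// destroy it when the last owner releases it. pangolin's RcStream is a
// first-class type with its own interface; see rc_stream.hpp.
inline std::shared_ptr<CUstream_st> make_shared_stream() {
  cudaStream_t raw = nullptr;
  cudaStreamCreate(&raw); // error handling omitted for brevity
  // cudaStream_t is a CUstream_st*, so shared_ptr can own it directly;
  // copies share the stream, and the last copy destroys it.
  return std::shared_ptr<CUstream_st>(raw, cudaStreamDestroy);
}
```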
C++ stdlib allocators for CUDA device memory, CUDA host memory, and CUDA managed memory. Get started at `include/pangolin/allocator`.
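To illustrate the shape of such an allocator, here is a minimal sketch of a stdlib-style allocator over CUDA managed memory. This is an illustration of the pattern, not pangolin's implementation; see `include/pangolin/allocator` for the real ones.

```c++
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Minimal C++ allocator backed by cudaMallocManaged (sketch only;
// error handling omitted for brevity).
template <typename T>
struct ManagedAllocator {
  using value_type = T;
  ManagedAllocator() = default;
  template <typename U>
  ManagedAllocator(const ManagedAllocator<U> &) noexcept {}
  T *allocate(std::size_t n) {
    void *p = nullptr;
    cudaMallocManaged(&p, n * sizeof(T));
    return static_cast<T *>(p);
  }
  void deallocate(T *p, std::size_t) noexcept { cudaFree(p); }
};
// Stateless allocators always compare equal.
template <typename T, typename U>
bool operator==(const ManagedAllocator<T> &, const ManagedAllocator<U> &) { return true; }
template <typename T, typename U>
bool operator!=(const ManagedAllocator<T> &, const ManagedAllocator<U> &) { return false; }

// Usage: the storage is reachable from both host and device code.
// std::vector<int, ManagedAllocator<int>> v(100);
```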
`Vector`s and `Buffer`s backed by C++ allocators. Get started at `include/pangolin/dense`.
CSR and CSR+COO sparse matrices backed by `pangolin::Vector`. Get started at `include/pangolin/sparse`.
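For orientation, CSR represents a graph with a row-pointer array and a column-index array; the tiny hand-written example below shows the layout (illustrative only; pangolin's exact member names may differ).

```c++
// CSR layout for the directed graph 0->1, 0->2, 1->2 (3 nodes, 3 edges).
// The neighbors of node i are colInd[rowPtr[i]] .. colInd[rowPtr[i+1] - 1].
const int rowPtr[] = {0, 2, 3, 3}; // length = numNodes + 1
const int colInd[] = {1, 2, 2};    // length = numEdges, sorted within each row
// e.g. node 0 has rowPtr[1] - rowPtr[0] = 2 neighbors: colInd[0..1] = {1, 2}
```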
- `double_buffer.hpp`: a fixed-size single-producer/single-consumer double buffer
- `bounded_buffer.hpp`: a fixed-size blocking multi-producer/multi-consumer ring buffer
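A concept sketch of the bounded-buffer idea, using a mutex and two condition variables (illustrative; pangolin's `bounded_buffer.hpp` interface may differ):

```c++
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

// Concept sketch of a fixed-size blocking multi-producer/multi-consumer
// buffer, not pangolin's implementation.
template <typename T>
class BoundedBufferSketch {
  std::queue<T> q_;
  const std::size_t cap_;
  std::mutex m_;
  std::condition_variable notFull_, notEmpty_;

public:
  explicit BoundedBufferSketch(std::size_t cap) : cap_(cap) {}

  void push(T v) { // blocks while the buffer is full
    std::unique_lock<std::mutex> lk(m_);
    notFull_.wait(lk, [&] { return q_.size() < cap_; });
    q_.push(std::move(v));
    notEmpty_.notify_one();
  }

  T pop() { // blocks while the buffer is empty
    std::unique_lock<std::mutex> lk(m_);
    notEmpty_.wait(lk, [&] { return !q_.empty(); });
    T v = std::move(q_.front());
    q_.pop();
    notFull_.notify_one();
    return v;
  }
};
```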
Built on top of `numa` and `nvidia-ml`, the topology utilities query the system topology to discover which GPUs, CPUs, and NUMA regions are associated with each other. Get started at `include/pangolin/topology`.
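As an example of the kind of query involved, the raw NVML C API can report which CPUs are near each GPU (a sketch of the underlying mechanism, not pangolin's API; link with `-lnvidia-ml`):

```c++
#include <cstdio>
#include <nvml.h>

int main() {
  nvmlInit();
  unsigned int count = 0;
  nvmlDeviceGetCount(&count);
  for (unsigned int i = 0; i < count; ++i) {
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(i, &dev);
    // Bitmask of CPUs with ideal affinity for this GPU (up to 128 CPUs here).
    unsigned long cpuSet[2] = {0, 0};
    nvmlDeviceGetCpuAffinity(dev, 2, cpuSet);
    std::printf("GPU %u: CPU affinity mask %016lx %016lx\n", i, cpuSet[1], cpuSet[0]);
  }
  nvmlShutdown();
  return 0;
}
```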
Set the log level with:

```c++
pangolin::logger::set_level(pangolin::logger::Level::ERR);
```

Allowed values are `TRACE`, `DEBUG`, `INFO`, `WARN`, `ERR`, and `CRITICAL`.
API documentation is available at pangolin-docs.netlify.com.
Dockerfile | CPU | CUDA | C++ | CMake | Builds |
---|---|---|---|---|---|
test_cuda92-ubuntu1804.Dockerfile | amd64 | 9.2 | g++ 7.3.0 | 3.11.0 | ☑ |
test_cuda100-ubuntu1804.Dockerfile | amd64 | 10.0 | g++ 7.3.0 | 3.11.0 | ☑ |
- | amd64 (Ubuntu 16.04) | 10.0.130 | g++ 5.4.0 | 3.14.3 | ☑ |
- | POWER9 | 9.2.148 | clang 5.0.0 | 3.12.0 | ☑ |
- | POWER9 | 9.2.148 | g++ ??? | 3.12.0 | ??? |
test_cuda80-ubuntu1404.Dockerfile | amd64 | 8.0.61 | g++ 4.8.4 | 3.11.0 | ☑ |
test_cuda80-ubuntu1404-clang38.Dockerfile | amd64 | 8.0.61 | clang 3.8.0 | 3.11.0 | (needs check) x: problem parsing Vector |
test_cuda80-ubuntu1604.Dockerfile | amd64 | 8.0.61 | g++ 5.4.0 | 3.11.0 | (needs check) x: problem parsing Vector |
test_cuda92_ubuntu1604-clang5.Dockerfile | amd64 | 9.2.148 | clang 5.0.0 | 3.11.0 | x: problem with SIMD intrinsics |
- | amd64 | 9.2.148 | g++ 5.4.1 | 3.13.3 | x: problem with std::to_string in catch2 |
Instructions for installing CUDA on supported systems may be obtained from Nvidia's website. On x86 Linux, CMake provides prebuilt binaries with a shell script. On POWER, you will need to build CMake from source; CMake will need to be built with support for SSL. You can check your CMake version with `cmake --version`.
Install doxygen and graphviz:

```
sudo apt install doxygen graphviz
```

If doxygen is installed, building pangolin will also create API documentation.
Pangolin is a header-only library, but you can still build the tests:

```
mkdir -p build
cd build
cmake ..
make
make test
```
See Pangolin_Example for an example.
Pangolin may be used with CMake `add_subdirectory()`, or installed and used with CMake `find_package(pangolin CONFIG)`. Pangolin exports the CMake `pangolin::pangolin` target.
Add pangolin as a submodule at the version you want:

```
git submodule add https://github.com/c3sr/pangolin.git thirdparty/pangolin
cd thirdparty/pangolin
git checkout <hash, branch, etc>
```

Then use `add_subdirectory(...)` in your CMakeLists:

```cmake
# CMakeLists.txt
add_subdirectory(thirdparty/pangolin)
...
target_link_libraries(... pangolin::pangolin)
```
Pangolin is a header-only library, so "installation" is a matter of copying pangolin's headers to a desired location. Pangolin also includes a CMake config file for easy integration with other CMake projects.
```
git clone https://github.com/c3sr/pangolin.git
mkdir pangolin/build && cd pangolin/build
cmake .. -DCMAKE_INSTALL_PREFIX=<something>
make install
```
Then point CMake at the installation with the `-DCMAKE_PREFIX_PATH` CMake option and use `find_package(pangolin CONFIG REQUIRED)` in your CMakeLists:

```cmake
# CMakeLists.txt
find_package(pangolin CONFIG REQUIRED)
...
target_link_libraries(... pangolin::pangolin)
```
Tests can be built and run with:

```
make
make test
```

Most tests require a GPU (those tests have the `gpu` label). Some tests require MPI (those tests have the `mpi` label).

```
ctest -LE "gpu" # run tests that do not require a GPU
ctest -L "mpi"  # run tests that require MPI
```

To run individual tests, you can do something like:

```
make
test/test_csr
```
We automatically build and test the following configurations.
CI Platform | CUDA | NUMA | MPI | Build | Test |
---|---|---|---|---|---|
Azure Pipelines | 10.1 | Yes | Yes | Yes | Yes |
Azure Pipelines | 10.1 | No | Yes | Yes | Yes |
Azure Pipelines | 10.1 | Yes | No | Yes | non-mpi |
Azure Pipelines | 10.1 | No | No | Yes | non-mpi |
Non-interactive profiling is done by creating two profiling files: a timeline file and a metrics file. These are created with two separate invocations of nvprof:

```
nvprof -o timeline.nvvp -f ./<exe> ...
nvprof -o metrics.nvvp -f --analysis-metrics ./<exe>
```

These files can be opened in nvvp: File >> Import >> Nvprof >> Single process, with "Timeline data file" set to timeline.nvvp and "Event/Metric data files" set to metrics.nvvp.
On GPUs with CC > 7.2, some version of Nsight needs to be used. Either open the NVIDIA Nsight Compute profiler and do it interactively, or generate a report and import it:

```
/usr/local/cuda/NsightCompute-1.0/nv-nsight-cu-cli -o profile -f ...
```

or

```
/usr/local/cuda/NsightCompute-1.0/nv-nsight-cu &
```

then File > Open File > profile.nsight-cuprof-report.
The Nvidia Nsight Eclipse Edition can generate a timeline. On Ubuntu 18.04, there may be a conflict with the installed Java runtime (usually openjdk-11); install a Java 8 runtime:

```
sudo apt install openjdk-8-jre
```

Then add the path to the Java 8 runtime to the top of /usr/local/cuda/libnsight/nsight.ini, like so:

```
-vm
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin
```
Generate a timeline with nvprof:

```
nvprof -o timeline.nvvp -f ./mybin
```

Then enable the nvprof option "Use fixed width segments for unified memory timeline".
Measuring divergence: "predicated-on thread instructions executed" should be 32x "instructions executed" when there is no divergence.

- "instructions executed": the number of times an instruction was executed by a warp.
- "predicated-on thread instructions executed": the number of times the source (instruction) was executed by any active, predicated-on thread. For instructions that are executed unconditionally (i.e. without a predicate), this is the number of active threads in the warp multiplied by the respective "instructions executed" value.
Generate a profile using something like:

```
/usr/local/cuda/NsightCompute-1.0/nv-nsight-cu-cli -o profile -f ./my-binary
```

then open the profile file you generated in the Nsight Compute GUI.
See the references below for related publications.
@INPROCEEDINGS{8916285,
author={M. {Almasri} and O. {Anjum} and C. {Pearson} and Z. {Qureshi} and V. S. {Mailthody} and R. {Nagi} and J. {Xiong} and W. {Hwu}},
booktitle={2019 IEEE High Performance Extreme Computing Conference (HPEC)},
title={Update on k-truss Decomposition on GPU},
year={2019},
volume={},
number={},
pages={1-7},
}
@INPROCEEDINGS{8916547,
author={C. {Pearson} and M. {Almasri} and O. {Anjum} and V. S. {Mailthody} and Z. {Qureshi} and R. {Nagi} and J. {Xiong} and W. {Hwu}},
booktitle={2019 IEEE High Performance Extreme Computing Conference (HPEC)},
title={Update on Triangle Counting on GPU},
year={2019},
volume={},
number={},
pages={1-7},
}
@INPROCEEDINGS{8547517,
author={V. S. {Mailthody} and K. {Date} and Z. {Qureshi} and C. {Pearson} and R. {Nagi} and J. {Xiong} and W. {Hwu}},
booktitle={2018 IEEE High Performance extreme Computing Conference (HPEC)},
title={Collaborative (CPU + GPU) Algorithms for Triangle Counting and Truss Decomposition},
year={2018},
volume={},
number={},
pages={1-7},
}