Open JacobDomagala opened 2 months ago
PR tests (gcc-12, ubuntu, mpich)
Build for 16da9e70a3fd04baf66db2401d4f8f0bf4f050f6 (2024-04-16 12:06:29 UTC)
Compilation - successful
Testing - passed
PR tests (gcc-12, ubuntu, mpich, verbose)
Build for 5ce7cd9a21030c279741b3a6238e3f2e8224b814 (2024-06-18 16:35:02 UTC)
FAILED: tests/CMakeFiles/allreduce.dir/perf/allreduce.cc.o
/usr/bin/ccache /usr/lib/ccache/g++ -DJSON_USE_IMPLICIT_CONVERSIONS=1 -DVT_NO_COLOR_ENABLED -I/vt/lib/CLI -I/vt/lib/json/include -I/vt/lib/brotli/c/include -I/vt/lib/libfort/lib -I/build/vt/release -I/vt/src -isystem /vt/lib/fmt/include -isystem /vt/lib/EngFormat-Cpp/include -isystem /build/checkpoint/install/include -O3 -DNDEBUG -fdiagnostics-color=always -std=c++17 -MD -MT tests/CMakeFiles/allreduce.dir/perf/allreduce.cc.o -MF tests/CMakeFiles/allreduce.dir/perf/allreduce.cc.o.d -o tests/CMakeFiles/allreduce.dir/perf/allreduce.cc.o -c /vt/tests/perf/allreduce.cc
/vt/tests/perf/allreduce.cc:49:10: fatal error: Kokkos_Core.hpp: No such file or directory
49 | #include <Kokkos_Core.hpp>
| ^~~~~~~~~~~~~~~~~
compilation terminated.
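The error above is what an unconditional `#include <Kokkos_Core.hpp>` produces in a configuration that has no Kokkos on the include path. A minimal sketch of guarding the include follows, assuming `VT_KOKKOS_ENABLED` (the macro used in the `DataHandler` proposal in this thread) is the right switch; `PERF_HAS_KOKKOS`, `perf_has_kokkos`, `makeHostPayload` and `makeKokkosPayload` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Only pull in Kokkos when the build enables it AND the header is reachable.
// __has_include is standard since C++17, which matches the -std=c++17 flag
// in the failing compile line.
#if defined(VT_KOKKOS_ENABLED) && __has_include(<Kokkos_Core.hpp>)
#  include <Kokkos_Core.hpp>
#  define PERF_HAS_KOKKOS 1  // hypothetical helper macro
#else
#  define PERF_HAS_KOKKOS 0
#endif

// Compile-time flag that test code can branch on.
constexpr bool perf_has_kokkos = (PERF_HAS_KOKKOS != 0);

// Always available: payload backed by std::vector.
inline std::vector<float> makeHostPayload(std::size_t n) {
  return std::vector<float>(n, 1.0f);
}

#if PERF_HAS_KOKKOS
// Only compiled in Kokkos-enabled builds.
inline Kokkos::View<float*> makeKokkosPayload(std::size_t n) {
  return Kokkos::View<float*>("payload", n);
}
#endif
```

In a build without Kokkos this compiles to the `std::vector` path only, so the non-Kokkos CI jobs would not need the header.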
PR tests (gcc-12, ubuntu, mpich, verbose, kokkos)
Build for 5441ffcdb534f48af026cdc6c4fedcf27a9d5e5a (2024-07-06 10:36:48 UTC)
Build failed for unknown reason. Check build logs
Results of running allreduce on std::vector
RUNNING TEST: test_reduce (Number of runs = 25) ...
Test results for test_reduce running on 16 nodes:
[7] Results for test_reduce (avg: 1.730ms stdev: 0.526ms min: 1.359ms max: 4.086ms)
[10] Results for test_reduce (avg: 1.733ms stdev: 0.532ms min: 1.359ms max: 4.120ms)
[6] Results for test_reduce (avg: 1.734ms stdev: 0.532ms min: 1.360ms max: 4.118ms)
[0] Results for test_reduce (avg: 1.759ms stdev: 0.527ms min: 1.395ms max: 4.117ms)
[13] Results for test_reduce (avg: 1.768ms stdev: 0.535ms min: 1.396ms max: 4.178ms)
[1] Results for test_reduce (avg: 1.767ms stdev: 0.536ms min: 1.392ms max: 4.175ms)
[14] Results for test_reduce (avg: 1.770ms stdev: 0.537ms min: 1.398ms max: 4.186ms)
[15] Results for test_reduce (avg: 1.762ms stdev: 0.524ms min: 1.397ms max: 4.144ms)
[2] Results for test_reduce (avg: 1.729ms stdev: 0.531ms min: 1.358ms max: 4.116ms)
[9] Results for test_reduce (avg: 1.768ms stdev: 0.536ms min: 1.394ms max: 4.178ms)
[8] Results for test_reduce (avg: 1.771ms stdev: 0.536ms min: 1.398ms max: 4.180ms)
[4] Results for test_reduce (avg: 1.735ms stdev: 0.529ms min: 1.358ms max: 4.097ms)
[3] Results for test_reduce (avg: 1.729ms stdev: 0.527ms min: 1.357ms max: 4.087ms)
[5] Results for test_reduce (avg: 1.756ms stdev: 0.538ms min: 1.380ms max: 4.173ms)
[11] Results for test_reduce (avg: 1.734ms stdev: 0.532ms min: 1.363ms max: 4.120ms)
[12] Results for test_reduce (avg: 1.733ms stdev: 0.531ms min: 1.360ms max: 4.110ms)
RUNNING TEST: test_allreduce_rabenseifner (Number of runs = 25) ...
Test results for test_allreduce_rabenseifner running on 16 nodes:
[7] Results for test_allreduce_rabenseifner (avg: 1.227ms stdev: 0.082ms min: 1.084ms max: 1.430ms)
[4] Results for test_allreduce_rabenseifner (avg: 1.226ms stdev: 0.078ms min: 1.084ms max: 1.398ms)
[6] Results for test_allreduce_rabenseifner (avg: 1.230ms stdev: 0.082ms min: 1.087ms max: 1.441ms)
[0] Results for test_allreduce_rabenseifner (avg: 1.248ms stdev: 0.081ms min: 1.104ms max: 1.442ms)
[13] Results for test_allreduce_rabenseifner (avg: 1.249ms stdev: 0.082ms min: 1.109ms max: 1.447ms)
[1] Results for test_allreduce_rabenseifner (avg: 1.249ms stdev: 0.084ms min: 1.104ms max: 1.471ms)
[14] Results for test_allreduce_rabenseifner (avg: 1.248ms stdev: 0.080ms min: 1.109ms max: 1.444ms)
[5] Results for test_allreduce_rabenseifner (avg: 1.244ms stdev: 0.081ms min: 1.098ms max: 1.439ms)
[9] Results for test_allreduce_rabenseifner (avg: 1.250ms stdev: 0.081ms min: 1.107ms max: 1.449ms)
[15] Results for test_allreduce_rabenseifner (avg: 1.250ms stdev: 0.081ms min: 1.108ms max: 1.451ms)
[2] Results for test_allreduce_rabenseifner (avg: 1.226ms stdev: 0.082ms min: 1.085ms max: 1.429ms)
[8] Results for test_allreduce_rabenseifner (avg: 1.251ms stdev: 0.081ms min: 1.108ms max: 1.447ms)
[11] Results for test_allreduce_rabenseifner (avg: 1.231ms stdev: 0.080ms min: 1.089ms max: 1.420ms)
[3] Results for test_allreduce_rabenseifner (avg: 1.226ms stdev: 0.081ms min: 1.083ms max: 1.429ms)
[12] Results for test_allreduce_rabenseifner (avg: 1.227ms stdev: 0.082ms min: 1.087ms max: 1.413ms)
[10] Results for test_allreduce_rabenseifner (avg: 1.228ms stdev: 0.079ms min: 1.087ms max: 1.410ms)
RUNNING TEST: test_allreduce_recursive_doubling (Number of runs = 25) ...
Test results for test_allreduce_recursive_doubling running on 16 nodes:
[12] Results for test_allreduce_recursive_doubling (avg: 1.888ms stdev: 0.163ms min: 1.699ms max: 2.383ms)
[4] Results for test_allreduce_recursive_doubling (avg: 1.884ms stdev: 0.163ms min: 1.697ms max: 2.375ms)
[6] Results for test_allreduce_recursive_doubling (avg: 1.886ms stdev: 0.160ms min: 1.699ms max: 2.383ms)
[0] Results for test_allreduce_recursive_doubling (avg: 1.930ms stdev: 0.160ms min: 1.744ms max: 2.426ms)
[13] Results for test_allreduce_recursive_doubling (avg: 1.926ms stdev: 0.167ms min: 1.705ms max: 2.426ms)
[15] Results for test_allreduce_recursive_doubling (avg: 1.932ms stdev: 0.162ms min: 1.743ms max: 2.424ms)
[2] Results for test_allreduce_recursive_doubling (avg: 1.884ms stdev: 0.163ms min: 1.694ms max: 2.380ms)
[1] Results for test_allreduce_recursive_doubling (avg: 1.930ms stdev: 0.163ms min: 1.739ms max: 2.427ms)
[14] Results for test_allreduce_recursive_doubling (avg: 1.931ms stdev: 0.162ms min: 1.744ms max: 2.428ms)
[9] Results for test_allreduce_recursive_doubling (avg: 1.927ms stdev: 0.164ms min: 1.739ms max: 2.425ms)
[5] Results for test_allreduce_recursive_doubling (avg: 1.931ms stdev: 0.163ms min: 1.741ms max: 2.426ms)
[8] Results for test_allreduce_recursive_doubling (avg: 1.934ms stdev: 0.162ms min: 1.748ms max: 2.425ms)
[11] Results for test_allreduce_recursive_doubling (avg: 1.924ms stdev: 0.164ms min: 1.732ms max: 2.425ms)
[3] Results for test_allreduce_recursive_doubling (avg: 1.884ms stdev: 0.163ms min: 1.695ms max: 2.380ms)
[10] Results for test_allreduce_recursive_doubling (avg: 1.887ms stdev: 0.164ms min: 1.699ms max: 2.381ms)
[7] Results for test_allreduce_recursive_doubling (avg: 1.882ms stdev: 0.165ms min: 1.696ms max: 2.379ms)
Results of running allreduce on std::vector
Test results for test_reduce running on 16 nodes:
[14] Results for test_reduce (avg: 0.147ms stdev: 0.030ms min: 0.128ms max: 0.263ms)
[11] Results for test_reduce (avg: 0.146ms stdev: 0.029ms min: 0.128ms max: 0.260ms)
[15] Results for test_reduce (avg: 0.146ms stdev: 0.030ms min: 0.128ms max: 0.264ms)
[0] Results for test_reduce (avg: 0.144ms stdev: 0.029ms min: 0.125ms max: 0.259ms)
[13] Results for test_reduce (avg: 0.147ms stdev: 0.029ms min: 0.127ms max: 0.261ms)
[6] Results for test_reduce (avg: 0.146ms stdev: 0.030ms min: 0.127ms max: 0.263ms)
[8] Results for test_reduce (avg: 0.147ms stdev: 0.030ms min: 0.127ms max: 0.268ms)
[9] Results for test_reduce (avg: 0.146ms stdev: 0.030ms min: 0.127ms max: 0.265ms)
[4] Results for test_reduce (avg: 0.146ms stdev: 0.030ms min: 0.127ms max: 0.267ms)
[10] Results for test_reduce (avg: 0.147ms stdev: 0.030ms min: 0.128ms max: 0.266ms)
[5] Results for test_reduce (avg: 0.146ms stdev: 0.030ms min: 0.127ms max: 0.265ms)
[3] Results for test_reduce (avg: 0.145ms stdev: 0.030ms min: 0.127ms max: 0.265ms)
[7] Results for test_reduce (avg: 0.145ms stdev: 0.030ms min: 0.127ms max: 0.263ms)
[2] Results for test_reduce (avg: 0.146ms stdev: 0.029ms min: 0.128ms max: 0.259ms)
[12] Results for test_reduce (avg: 0.147ms stdev: 0.030ms min: 0.129ms max: 0.268ms)
[1] Results for test_reduce (avg: 0.146ms stdev: 0.030ms min: 0.127ms max: 0.262ms)
RUNNING TEST: test_allreduce_rabenseifner (Number of runs = 25) ...
Test results for test_allreduce_rabenseifner running on 16 nodes:
[11] Results for test_allreduce_rabenseifner (avg: 0.143ms stdev: 0.011ms min: 0.135ms max: 0.184ms)
[5] Results for test_allreduce_rabenseifner (avg: 0.143ms stdev: 0.011ms min: 0.135ms max: 0.183ms)
[13] Results for test_allreduce_rabenseifner (avg: 0.143ms stdev: 0.011ms min: 0.135ms max: 0.183ms)
[0] Results for test_allreduce_rabenseifner (avg: 0.141ms stdev: 0.011ms min: 0.133ms max: 0.181ms)
[2] Results for test_allreduce_rabenseifner (avg: 0.142ms stdev: 0.011ms min: 0.133ms max: 0.183ms)
[15] Results for test_allreduce_rabenseifner (avg: 0.143ms stdev: 0.011ms min: 0.135ms max: 0.184ms)
[6] Results for test_allreduce_rabenseifner (avg: 0.142ms stdev: 0.012ms min: 0.134ms max: 0.184ms)
[8] Results for test_allreduce_rabenseifner (avg: 0.143ms stdev: 0.012ms min: 0.134ms max: 0.185ms)
[4] Results for test_allreduce_rabenseifner (avg: 0.142ms stdev: 0.012ms min: 0.134ms max: 0.183ms)
[3] Results for test_allreduce_rabenseifner (avg: 0.141ms stdev: 0.012ms min: 0.133ms max: 0.183ms)
[7] Results for test_allreduce_rabenseifner (avg: 0.142ms stdev: 0.012ms min: 0.134ms max: 0.183ms)
[9] Results for test_allreduce_rabenseifner (avg: 0.142ms stdev: 0.011ms min: 0.134ms max: 0.183ms)
[12] Results for test_allreduce_rabenseifner (avg: 0.144ms stdev: 0.011ms min: 0.136ms max: 0.184ms)
[14] Results for test_allreduce_rabenseifner (avg: 0.143ms stdev: 0.011ms min: 0.135ms max: 0.184ms)
[1] Results for test_allreduce_rabenseifner (avg: 0.142ms stdev: 0.011ms min: 0.134ms max: 0.182ms)
[10] Results for test_allreduce_rabenseifner (avg: 0.143ms stdev: 0.012ms min: 0.135ms max: 0.184ms)
RUNNING TEST: test_allreduce_recursive_doubling (Number of runs = 25) ...
Test results for test_allreduce_recursive_doubling running on 16 nodes:
[11] Results for test_allreduce_recursive_doubling (avg: 0.117ms stdev: 0.061ms min: 0.092ms max: 0.391ms)
[5] Results for test_allreduce_recursive_doubling (avg: 0.117ms stdev: 0.061ms min: 0.093ms max: 0.391ms)
[13] Results for test_allreduce_recursive_doubling (avg: 0.116ms stdev: 0.061ms min: 0.092ms max: 0.391ms)
[0] Results for test_allreduce_recursive_doubling (avg: 0.114ms stdev: 0.061ms min: 0.090ms max: 0.387ms)
[2] Results for test_allreduce_recursive_doubling (avg: 0.115ms stdev: 0.061ms min: 0.090ms max: 0.389ms)
[15] Results for test_allreduce_recursive_doubling (avg: 0.116ms stdev: 0.061ms min: 0.092ms max: 0.388ms)
[8] Results for test_allreduce_recursive_doubling (avg: 0.116ms stdev: 0.061ms min: 0.091ms max: 0.389ms)
[4] Results for test_allreduce_recursive_doubling (avg: 0.116ms stdev: 0.061ms min: 0.091ms max: 0.388ms)
[6] Results for test_allreduce_recursive_doubling (avg: 0.116ms stdev: 0.061ms min: 0.091ms max: 0.390ms)
[1] Results for test_allreduce_recursive_doubling (avg: 0.115ms stdev: 0.061ms min: 0.091ms max: 0.387ms)
[14] Results for test_allreduce_recursive_doubling (avg: 0.116ms stdev: 0.061ms min: 0.092ms max: 0.390ms)
[7] Results for test_allreduce_recursive_doubling (avg: 0.115ms stdev: 0.061ms min: 0.091ms max: 0.389ms)
[9] Results for test_allreduce_recursive_doubling (avg: 0.116ms stdev: 0.061ms min: 0.091ms max: 0.389ms)
[3] Results for test_allreduce_recursive_doubling (avg: 0.115ms stdev: 0.061ms min: 0.090ms max: 0.388ms)
[10] Results for test_allreduce_recursive_doubling (avg: 0.116ms stdev: 0.061ms min: 0.092ms max: 0.389ms)
[12] Results for test_allreduce_recursive_doubling (avg: 0.117ms stdev: 0.061ms min: 0.093ms max: 0.393ms)
Still missing:
Regarding the issue with the `Rabenseifner` algorithm, I was thinking maybe we could try to introduce some kind of wrapper for various data types. We could add specializations for known common types (e.g., `std::vector`, `Kokkos::View`, etc.).
If users want to use their custom wrapper, then they should provide `size`, `at`, and `set` functions (and probably a few more that allow for data splitting). If they want to use the `Rabenseifner` algorithm, we could add a constexpr check for that, and if it fails, then we fall back to `reduce->bcast` or `Recursive Doubling`.
#include <cstddef>
#include <vector>

#ifdef VT_KOKKOS_ENABLED
#include <Kokkos_Core.hpp>
#endif

// Generic interface; each supported container type provides a specialization.
template <typename Container>
class DataHandler {
public:
  using Scalar = float;
  static size_t size(const Container& data);
  static Scalar& at(Container& data, size_t idx);
  static void set(Container& data, size_t idx, const Scalar& value);
  static Container split(Container& data, size_t start, size_t end);
};

// Specialization for std::vector.
template <typename T>
class DataHandler<std::vector<T>> {
public:
  using Scalar = T;
  static size_t size(const std::vector<T>& data) { return data.size(); }
  static T at(const std::vector<T>& data, size_t idx) { return data[idx]; }
  static T& at(std::vector<T>& data, size_t idx) { return data[idx]; }
  static void set(std::vector<T>& data, size_t idx, const T& value) {
    data[idx] = value;
  }
  // Copies the half-open range [start, end) into a new vector.
  static std::vector<T> split(std::vector<T>& data, size_t start, size_t end) {
    return std::vector<T>{data.begin() + start, data.begin() + end};
  }
};

#ifdef VT_KOKKOS_ENABLED
// Specialization for rank-1 Kokkos::View; split() is not defined here.
template <typename T, typename... Props>
class DataHandler<Kokkos::View<T*, Props...>> {
public:
  using Scalar = T;
  static size_t size(const Kokkos::View<T*, Props...>& data) {
    return data.extent(0);
  }
  static T at(const Kokkos::View<T*, Props...>& data, size_t idx) {
    return data(idx);
  }
  static T& at(Kokkos::View<T*, Props...>& data, size_t idx) {
    return data(idx);
  }
  static void
  set(Kokkos::View<T*, Props...>& data, size_t idx, const T& value) {
    data(idx) = value;
  }
};
#endif // VT_KOKKOS_ENABLED
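The constexpr check mentioned above could look roughly like the following sketch. It is not the vt implementation: the handler is redeclared inside a `sketch` namespace with an empty primary template so that a missing `split` is detectable at compile time, and `is_splittable`, `Algorithm`, `chooseAlgorithm` and the `Pair` type are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// Empty primary template: a container without a specialization exposes
// no operations, which the trait below can detect.
template <typename Container>
struct DataHandler {};

// std::vector supports splitting, so it qualifies for Rabenseifner.
template <typename T>
struct DataHandler<std::vector<T>> {
  using Scalar = T;
  static std::size_t size(const std::vector<T>& data) { return data.size(); }
  // Copies the half-open range [start, end) into a new vector.
  static std::vector<T>
  split(const std::vector<T>& data, std::size_t start, std::size_t end) {
    auto first = data.begin() + static_cast<std::ptrdiff_t>(start);
    auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
    return std::vector<T>(first, last);
  }
};

// Hypothetical user type whose handler provides size() but no split().
struct Pair {
  double a;
  double b;
};

template <>
struct DataHandler<Pair> {
  using Scalar = double;
  static std::size_t size(const Pair&) { return 2; }
};

// Detection idiom: true only when DataHandler<Container>::split(c, i, j)
// is a well-formed expression.
template <typename Container, typename = void>
struct is_splittable : std::false_type {};

template <typename Container>
struct is_splittable<
  Container,
  std::void_t<decltype(DataHandler<Container>::split(
    std::declval<Container&>(), std::size_t{}, std::size_t{}))>>
  : std::true_type {};

enum class Algorithm { Rabenseifner, RecursiveDoubling };

// Picks Rabenseifner when the data can be split, otherwise falls back.
template <typename Container>
constexpr Algorithm chooseAlgorithm() {
  if constexpr (is_splittable<Container>::value) {
    return Algorithm::Rabenseifner;
  } else {
    return Algorithm::RecursiveDoubling;
  }
}

} // namespace sketch
```

With these definitions, `sketch::chooseAlgorithm<std::vector<int>>()` evaluates to `Algorithm::Rabenseifner`, while `sketch::chooseAlgorithm<sketch::Pair>()` falls back to `Algorithm::RecursiveDoubling` because its handler has no `split`. The same trait could just as well select `reduce->bcast` as the fallback.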
Let's go with the `DataHandler` approach.
Fixes #2240