Closed rohany closed 3 years ago
I have a simple program here:
    #include <ctf.hpp>
    #include <chrono>
    #include <float.h>
    using namespace CTF;

    bool mttkrp(int n, World& dw) {
      int dimt[3] = {n, n, n};
      Tensor<double> B(3, false /* is_sparse */, dimt, dw);
      Matrix<double> A(n, n, dw), C(n, n, dw), D(n, n, dw);
      B.fill_random((double)0, (double)1);
      C.fill_random((double)0, (double)1);
      D.fill_random((double)0, (double)1);
      auto start = std::chrono::high_resolution_clock::now();
      A["il"] = B["ijk"] * C["jl"] * D["kl"];
      auto end = std::chrono::high_resolution_clock::now();
      auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
      if (dw.rank == 0) {
        std::cout << "Execution time: " << ms << " ms." << std::endl;
      }
    }

    int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);
      {
        World dw;
        for (int i = 0; i < 10; i++) {
          mttkrp(512, dw);
        }
      }
      MPI_Finalize();
      return 0;
    }
I compiled this by adding it to the examples and the Makefile. After one iteration, it segfaults with this stack trace:
    rohany@g0001:~/ctf$ OMP_NUM_THREADS=20 ./bin/mymttkrp
    Execution time: 1771 ms.
    [g0001:1784388] *** Process received signal ***
    [g0001:1784388] Signal: Segmentation fault (11)
    [g0001:1784388] Signal code: Address not mapped (1)
    [g0001:1784388] Failing at address: 0x10000000e
    [g0001:1784388] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7fec0a4343c0]
    [g0001:1784388] [ 1] /lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Resume+0xd3)[0x7fec0a453543]
    [g0001:1784388] [ 2] ./bin/mymttkrp(+0x13c22)[0x5572e2710c22]
    [g0001:1784388] [ 3] ./bin/mymttkrp(+0x171ab)[0x5572e27141ab]
    [g0001:1784388] [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fec0a2540b3]
    [g0001:1784388] [ 5] ./bin/mymttkrp(+0x18dae)[0x5572e2715dae]
    [g0001:1784388] *** End of error message ***
    Segmentation fault
Have I done something wrong here, or is this a library bug?
Didn't think missing a return would cause this :(
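For anyone landing here with the same crash: `mttkrp` is declared to return `bool` but never returns a value, and flowing off the end of a non-void function is undefined behavior in C++. The sketch below shows the shape of the fix without depending on CTF. `timed_sum` is a hypothetical stand-in for `mttkrp` and is not part of the library; the only point is that every path through a non-void function ends in a `return` (or the function is declared `void`).

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical stand-in for mttkrp() from the report: it times a small
// computation and, unlike the original, returns a value on every path.
bool timed_sum(int n) {
  std::vector<double> data(static_cast<std::size_t>(n), 1.0);

  auto start = std::chrono::steady_clock::now();
  double total = 0.0;
  for (double x : data) {
    total += x;
  }
  auto end = std::chrono::steady_clock::now();

  auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
  std::cout << "Execution time: " << us << " us." << std::endl;

  // The original function fell off the end at this point despite being
  // declared bool. Returning a value (or declaring the function void)
  // removes the undefined behavior.
  return total == static_cast<double>(n);
}
```

GCC and Clang can report this mistake at compile time via `-Wreturn-type` (enabled by `-Wall` in GCC), and `-Werror=return-type` turns it into a hard error.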