ORNL-QCI / tnqvm

Tensor Network QPU Simulator for Eclipse XACC

TNQVM with MKL #30

Closed DmitryLyakh closed 4 years ago

DmitryLyakh commented 4 years ago

TNQVM does not work with Intel MKL (at least on some common systems like my Ubuntu 18.04 desktop). The runtime error is below:

tnqvm/tests/ExatnVisitorTester
[==========] Running 6 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 6 tests from ExatnVisitorTester
[ RUN      ] ExatnVisitorTester.checkExatnVisitor
H(Rank:2, Volume: 4): [(0.707107,0)(0.707107,0)(0.707107,0)(-0.707107,0)]
CNOT(Rank:4, Volume: 16): [(1,0)(0,0)(0,0)(0,0)(0,0)(1,0)(0,0)(0,0)(0,0)(0,0)(0,0)(1,0)(0,0)(0,0)(1,0)(0,0)]
CNOT(Rank:4, Volume: 16): [(1,0)(0,0)(0,0)(0,0)(0,0)(1,0)(0,0)(0,0)(0,0)(0,0)(0,0)(1,0)(0,0)(0,0)(1,0)(0,0)]
Q0(Rank:1, Volume: 2): [(1,0)(0,0)]
Q1(Rank:1, Volume: 2): [(1,0)(0,0)]
Q2(Rank:1, Volume: 2): [(1,0)(0,0)]
X(Rank:2, Volume: 4): [(0,0)(1,0)(1,0)(0,0)]
H(Rank:2, Volume: 4): [(0.707107,0)(0.707107,0)(0.707107,0)(-0.707107,0)]
INTEL MKL ERROR: /home/div/intel/mkl/lib/intel64_lin/libmkl_avx2.so: undefined symbol: mkl_sparse_optimize_bsr_trsm_i8.
Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so.
Segmentation fault

DmitryLyakh commented 4 years ago

Alex, please add Thien to this issue. I am still unable to add him because GitHub does not recognize his username "tnguyen-ornl".

DmitryLyakh commented 4 years ago

The problem is likely related to our use of CPP Microservices and dynamic library loading, since MKL by itself works perfectly fine (e.g., in standalone ExaTN tests). If we are unable to make MKL work with CPP Microservices, we need to look for an alternative mechanism or some workaround. MKL is the de facto standard linear algebra library on any Intel machine, and we cannot afford to lose support for those machines.

DmitryLyakh commented 4 years ago

One thing I noticed is that the MKL error comes from a missing symbol mkl_sparse_optimize_bsr_trsm_i8, which by its suffix looks like an ILP64 function (i8), whereas our build uses LP64 (i4). Not sure why this shows up only when we use rpath and CPP Microservices ...

DmitryLyakh commented 4 years ago

LD_PRELOAD with libmkl_sequential.so resolves the missing symbol, but we are not supposed to use libmkl_sequential.so because we are interested in parallel (multi-threaded) execution. So, we cannot use this as a workaround in production.

DmitryLyakh commented 4 years ago

By the way, the missing mkl_sparse_optimize_bsr_trsm_i8 is contained in libmkl_gnu_thread.so, which means the latter is not loaded properly for some reason.
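A check like the one described above (which library provides a given symbol) can be sketched with dlopen() and dlsym(). This is only an illustration: the helper name library_provides_symbol is hypothetical and not part of TNQVM or ExaTN. Note also that dlsym() on a handle searches the library's dependencies too, so a hit means "this library or something it depends on".

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical helper (not part of TNQVM/ExaTN): returns true if `symbol`
// can be resolved through the shared library `library`.
// RTLD_LOCAL keeps the probe from changing the global symbol scope.
bool library_provides_symbol(const std::string& library,
                             const std::string& symbol) {
  void* handle = dlopen(library.c_str(), RTLD_LAZY | RTLD_LOCAL);
  if (handle == nullptr) {
    return false;  // Library could not be loaded; dlerror() has the reason.
  }
  // dlsym() searches the library and its dependency tree.
  // A symbol whose value is legitimately NULL would be reported as missing;
  // that case is ignored in this sketch.
  void* address = dlsym(handle, symbol.c_str());
  dlclose(handle);
  return address != nullptr;
}
```

Under these assumptions, calling library_provides_symbol() with the path to libmkl_gnu_thread.so and the name mkl_sparse_optimize_bsr_trsm_i8 would be one way to repeat the check, provided the library can be opened on its own.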

DmitryLyakh commented 4 years ago

The proper workaround is this (but we still need to understand why this is happening):

LD_PRELOAD="/home/div/intel/mkl/lib/intel64/libmkl_intel_lp64.so:/home/div/intel/mkl/lib/intel64/libmkl_gnu_thread.so:/home/div/intel/mkl/lib/intel64/libmkl_core.so:libgomp.so" tnqvm/tests/ExatnVisitorTester

amccaskey commented 4 years ago

No idea how this is happening yet, but it looks like libiomp5 is being used:

[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ExatnVisitorTester
[ RUN      ] ExatnVisitorTester.testSimpleGates
     13729: 
     13729: calling init: /home/cades/.exatn/plugins/libexatn-runtime-boost-graph.so
     13729: 
     13729: 
     13729: calling init: /home/cades/.exatn/plugins/libexatn-runtime-executor.so
     13729: 
     13729: 
     13729: calling init: /home/cades/intel/mkl/lib/intel64/../../../compiler/lib/intel64/libiomp5.so
     13729: 
     13729: 
     13729: calling init: /home/cades/intel/mkl/lib/intel64/libmkl_intel_thread.so
     13729: 
     13729: 
     13729: calling init: /home/cades/intel/mkl/lib/intel64/libmkl_avx2.so
     13729: 
     13729: tnqvm/tests/ExatnVisitorTester: error: symbol lookup error: undefined symbol: scalable_malloc (fatal)
     13729: find library=libmemkind.so [0]; searching
     13729:  search path=/home/cades/.xacc/lib:/home/cades/dev/debug_tnqvm/build/tnqvm      (RPATH from file tnqvm/tests/ExatnVisitorTester)
     13729:   trying file=/home/cades/.xacc/lib/libmemkind.so
     13729:   trying file=/home/cades/dev/debug_tnqvm/build/tnqvm/libmemkind.so
     13729:  search cache=/etc/ld.so.cache
     13729:  search path=/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/lib:/usr/lib      (system search path)
     13729:   trying file=/lib/x86_64-linux-gnu/libmemkind.so
     13729:   trying file=/usr/lib/x86_64-linux-gnu/libmemkind.so
     13729:   trying file=/lib/libmemkind.so
     13729:   trying file=/usr/lib/libmemkind.so
     13729: 
[       OK ] ExatnVisitorTester.testSimpleGates (757 ms)
[----------] 1 test from ExatnVisitorTester (757 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (758 ms total)
[  PASSED  ] 1 test.

DmitryLyakh commented 4 years ago

Yeah, that would explain all these issues and segfaults.

amccaskey commented 4 years ago

So we know this bug is related to CppMicroServices + dlopen on the tnqvm-exatn plugin. I was able to track down this Stack Overflow post (https://stackoverflow.com/questions/54694862/intel-mkl-and-jni-how-to-add-a-shared-library-that-ld-searches-symbols-from) that describes a similar situation: there they are trying to load an MKL library using Java JNI System.loadLibrary(). They note that JNI by default calls dlopen() with the RTLD_LOCAL flag, which does not make the loaded library's symbols available globally. They therefore see the same library loading issues that we do: libmkl_avx2.so cannot find the symbol it needs from libmkl_gnu_thread.so because the latter was loaded only locally. To fix this one can use LD_PRELOAD, which is a terrible hack and not good for deployment, or force the library to be loaded with RTLD_GLOBAL, thereby making those symbols available for use by libmkl_avx2.so.
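To make the RTLD_LOCAL vs. RTLD_GLOBAL distinction concrete, here is a minimal sketch of two ways to get a library into the global symbol scope. The helper names are hypothetical. The second helper relies on the glibc behaviour, documented in the dlopen man page, that reopening an already-loaded library with RTLD_NOLOAD | RTLD_GLOBAL promotes it to global scope. Whether promotion after the fact would be enough in the TNQVM case has not been verified here.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical helper: load `library` (if needed) and make its symbols
// globally visible. The handle is intentionally never closed so that the
// library stays resident for the lifetime of the process.
bool load_globally(const std::string& library) {
  return dlopen(library.c_str(), RTLD_LAZY | RTLD_GLOBAL) != nullptr;
}

// Hypothetical helper: promote a library that is ALREADY loaded (for
// example as a dependency of a plugin opened with RTLD_LOCAL) to global
// scope. RTLD_NOLOAD means the call fails instead of loading the library
// when it is not yet resident.
bool promote_to_global(const std::string& library) {
  return dlopen(library.c_str(),
                RTLD_LAZY | RTLD_NOLOAD | RTLD_GLOBAL) != nullptr;
}
```

Either call has to happen before MKL tries to load its architecture-specific kernel library (libmkl_avx2.so in the logs above); otherwise the lookup has already failed.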

I have tried this with the https://github.com/ORNL-QCI/xacc_application_example code. I can reproduce the same library loading issues by building the following program:

#include <iostream>
#include "xacc.hpp"
int main(int argc, char** argv) {

  //Initialize the XACC runtime:
  xacc::Initialize(argc, argv);

  //Choose the desired quantum accelerator:
  auto qpu = xacc::getAccelerator("tnqvm", {std::make_pair("tnqvm-visitor", "exatn")});

  //Choose the desired quantum programming language:
  auto xasmCompiler = xacc::getCompiler("xasm");

  //Compile a quantum kernel into the quantum IR:
  auto ir = xasmCompiler->compile(
  R"(__qpu__ void ansatz(qbit q, double theta) {
    X(q[0]);
    Ry(q[1], theta);
    CX(q[1], q[0]);
    H(q[0]);
    H(q[1]);
    Measure(q[0]);
    Measure(q[1]);
  })", qpu);

  //Get the generated parameterized quantum circuit:
  auto circuit = ir->getComposite("ansatz");

  //Perform quantum/classical computation:
  auto angles = xacc::linspace(-3.1415, 3.1415, 20);
  for (auto & a : angles) {
    auto evaled = (*circuit)({a});
    auto qubits = xacc::qalloc(2);
    qpu->execute(qubits, evaled);
    auto exp_val = qubits->getExpectationValueZ();
    std::cout << "<X0X1>(" << a << ") = " << exp_val << "\n";
  }

  //Finalize the XACC runtime:
  xacc::Finalize();
}

If we add the following before initialization


  // Requires #include <dlfcn.h>.
  // RTLD_GLOBAL makes the MKL symbols visible to libraries loaded later.
  void* core_handle = dlopen("/home/cades/intel/mkl/lib/intel64/libmkl_core.so", RTLD_LAZY | RTLD_GLOBAL);
  if (core_handle == nullptr) {
    std::cout << "core nullptr\n";
    std::cout << dlerror() << "\n";
  }

  void* thread_handle = dlopen("/home/cades/intel/mkl/lib/intel64/libmkl_gnu_thread.so", RTLD_LAZY | RTLD_GLOBAL);
  if (thread_handle == nullptr) {
    std::cout << "thread nullptr\n";
    std::cout << dlerror() << "\n";
  }

the symbols are loaded globally and are able to be found during Exatn initialization and use.

I plan to update the ExatnVisitor::initialize() method with support for this fix in a manner that lets CMake insert the correct library paths at configure/build time (so as to avoid hard-coded paths).
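One possible shape for that, as a rough sketch and not the actual TNQVM change: the build system passes the MKL library paths as a compile definition, and the initialization code loads each one with RTLD_GLOBAL. The macro name TNQVM_MKL_GLOBAL_LIBS, the semicolon-separated format, and the helper load_libraries_globally are all assumptions made for illustration.

```cpp
#include <dlfcn.h>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical macro: assumed to be defined by the build system at
// configure time as a semicolon-separated list of absolute library paths.
#ifndef TNQVM_MKL_GLOBAL_LIBS
#define TNQVM_MKL_GLOBAL_LIBS ""
#endif

// Loads every library in `semicolon_list` with RTLD_GLOBAL, in the order
// given, and returns one "path: reason" message per library that failed.
// Handles are intentionally kept open for the lifetime of the process.
std::vector<std::string> load_libraries_globally(
    const std::string& semicolon_list) {
  std::vector<std::string> failures;
  std::istringstream stream(semicolon_list);
  std::string path;
  while (std::getline(stream, path, ';')) {
    if (path.empty()) {
      continue;  // Tolerate empty entries such as "a.so;;b.so".
    }
    if (dlopen(path.c_str(), RTLD_LAZY | RTLD_GLOBAL) == nullptr) {
      const char* reason = dlerror();
      failures.push_back(path + ": " + (reason ? reason : "unknown error"));
    }
  }
  return failures;
}
```

The initialization code would then call load_libraries_globally(TNQVM_MKL_GLOBAL_LIBS) before ExaTN is first used and report any returned failures. An empty list makes the call a no-op, which keeps builds without MKL unaffected.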