cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
https://cornell-zhang.github.io/heterocl/
Apache License 2.0

Shmids expired before usage #241

Closed: hecmay closed this issue 3 years ago

hecmay commented 4 years ago

I tried to run the GEMM HBM example on our servers. Compilation works fine, but when the binary is executed, the host program crashes with a segmentation fault.

[INFO] Running commands:
cd project; make host

basename: missing operand
Try 'basename --help' for more information.
host.cpp: In function ‘int main(int, char**)’:
host.cpp:104:11: warning: unused variable ‘_top’ [-Wunused-variable]
   int32_t _top;
           ^~~~
[INFO] Commands outputs:
g++ -I./ -I/opt/xilinx/xrt/include -I/work/shared/common/Xilinx/Vivado/2019.2/include -Wall -O0 -g -std=c++11 -fmessage-length=0 .//xcl2.cpp host.cpp  -o 'host'  -L/opt/xilinx/xrt/lib -lOpenCL -lpthread  -lrt -lstdc++

[11:20:20] Hash macthed. Found pre-compiled bitstream
[INFO] Running commands:
cd project; ./host kernel.xclbin

/bin/sh: line 1: 160930 Segmentation fault      ./host kernel.xclbin

The segfault occurred when the host program tried to access data in the shared memory. We may need to consider a better way to transfer data between the invoking program and host program.

zhangzhiru commented 4 years ago

Can you describe the current solution first?

hecmay commented 4 years ago

I currently have no idea how to solve it... I was able to run it yesterday with exactly the same code without any changes.

zhangzhiru commented 4 years ago

> We may need to consider a better way to transfer data between the invoking program and host program

Can you describe the current way first?

hecmay commented 4 years ago

The current way:

HeteroCL generates host and device code. The input data (passed from the Python side) is written into shared memory. The host program then copies the data out of shared memory, runs the main logic, and writes the result back to shared memory, where it is accessible from the Python side.

zhangzhiru commented 4 years ago

Let's try to be more specific about the methods and syscalls we are using to pass the data through shared memory.

seanlatias commented 4 years ago

To be clear, we should not use the term "host" in a confusing way. In our runtime system, we have a parent process that executes the HCL program, and a child process that executes the generated code (including the host code and the device code). This is our current runtime flow:

  1. The user prepares data with NumPy
    data = numpy.random.randint(...)
  2. The data is passed to the HCL runtime through our API
    hcl_data = hcl.asarray(data)
    f(hcl_data)
  3. The HCL runtime creates a shared memory segment between the parent and child processes
    int shmid = shmget(key, data_size, 0666|IPC_CREAT);
    void* mem = shmat(shmid, nullptr, 0);
  4. The HCL runtime copies the data to the shared memory
    memcpy(mem, hcl_data, data_size);
  5. The HCL runtime executes the child program, which reads/writes the data from/to the shared memory
    system("child_program");
  6. The HCL runtime copies the updated data back and frees the shared memory
    memcpy(hcl_data, mem, data_size);
    shmdt(mem);
    shmctl(shmid, IPC_RMID, nullptr);
  7. Users can retrieve the data in NumPy format
    new_data = hcl.asnumpy(hcl_data)
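
Steps 3 to 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HCL runtime itself: it uses Python's `multiprocessing.shared_memory` (POSIX shared memory) instead of the System V `shmget`/`shmat` calls listed above, the child is simulated by a function in the same process rather than launched with `system()`, and the names `child_program`, `run_with_shared_memory`, and `segment_exists` are hypothetical.

```python
import struct
from multiprocessing import shared_memory


def child_program(name, count):
    """Stand-in for the generated program: attach by name, double each int32 in place."""
    shm = shared_memory.SharedMemory(name=name)  # attach to the existing segment
    try:
        values = struct.unpack_from(f"{count}i", shm.buf, 0)
        struct.pack_into(f"{count}i", shm.buf, 0, *(2 * v for v in values))
    finally:
        shm.close()  # detach only; the parent owns the segment


def run_with_shared_memory(values):
    """Parent side of the flow; `values` must be a non-empty list of int32 values."""
    count = len(values)
    size = struct.calcsize(f"{count}i")
    shm = shared_memory.SharedMemory(create=True, size=size)  # step 3: create
    try:
        struct.pack_into(f"{count}i", shm.buf, 0, *values)  # step 4: copy in
        child_program(shm.name, count)  # step 5: run the child
        result = list(struct.unpack_from(f"{count}i", shm.buf, 0))  # step 6: copy out
    finally:
        shm.close()   # detach
        shm.unlink()  # free the segment
    return result, shm.name


def segment_exists(name):
    """Return True if a segment with this name can still be attached."""
    try:
        shm = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        return False
    shm.close()
    return True


if __name__ == "__main__":
    result, name = run_with_shared_memory([1, 2, 3])
    print(result)
    print(segment_exists(name))
```

Once the parent has freed the segment, any later attach by the same name fails, which is why `segment_exists` returns `False` after `run_with_shared_memory` has finished. A child that attaches too late, or a segment that is removed too early, would hit a similar error; whether that is what happens in the crash reported above is not established here.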

chhzh123 commented 4 years ago

I also ran into this problem. I think we can generate a header file that stores the input data instead, which is simpler and more reusable.
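
A minimal sketch of what such a generator could look like, assuming the inputs are integer arrays and the generated host code is C or C++. The function name `render_input_header`, the array name, and the header layout are all hypothetical; nothing here is taken from the HeteroCL code base.

```python
def render_input_header(array_name, values, ctype="int32_t"):
    """Render a C header that embeds `values` as a constant array (hypothetical layout)."""
    guard = f"{array_name.upper()}_H"
    body = ", ".join(str(int(v)) for v in values)
    lines = [
        f"#ifndef {guard}",
        f"#define {guard}",
        "#include <stdint.h>",
        # Doubled braces in the f-string produce the literal C initializer braces.
        f"static const {ctype} {array_name}[{len(values)}] = {{{body}}};",
        f"#endif /* {guard} */",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_input_header("input_a", [1, 2, 3]))
```

The generated program would then include the header at compile time, so no shared memory segment is needed for the inputs. The trade-off is that the program has to be recompiled whenever the input data changes, and results still need some other channel back to the Python side.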