Closed hecmay closed 3 years ago
Can you describe the current solution first?
I currently have no idea how to solve it... I was able to run it yesterday with exactly the same code without any changes.
We may need to consider a better way to transfer data between the invoking program and host program
Can you describe the current way first?
The current way:
HeteroCL generates host and device code. The input data (passed from the Python side) is written into shared memory. The host program then copies the data out of shared memory, runs the main logic, and writes the result back to shared memory, where it is accessible from the Python side.
Let's try to be more specific about the methods and syscalls we are using to pass the data through shared memory.
So to be clear, we should not use the term "host" in a confusing way. In our runtime system, we have a parent process that executes the HCL program, and a child process that executes the generated code (including the host code and the device code). This is our current runtime flow:
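The host-program side of this description can be sketched as follows. This is a minimal illustration, not the code HeteroCL actually generates: `process_segment` and `run_main_logic` are hypothetical names, the element type is assumed to be `int`, and doubling each element stands in for the real main logic.

```cpp
#include <sys/ipc.h>
#include <sys/shm.h>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the generated main logic: doubles every element.
static void run_main_logic(std::vector<int>& buf) {
    for (int& x : buf) x *= 2;
}

// Attach to an existing segment, copy the input out, compute, and write the
// result back in place. Returns false if attaching or detaching fails.
bool process_segment(int shmid, std::size_t count) {
    void* mem = shmat(shmid, nullptr, 0);
    // shmat reports failure by returning (void*)-1, not nullptr.
    if (mem == reinterpret_cast<void*>(-1)) return false;

    std::vector<int> local(count);
    std::memcpy(local.data(), mem, count * sizeof(int));  // shared memory -> local buffer
    run_main_logic(local);
    std::memcpy(mem, local.data(), count * sizeof(int));  // result -> shared memory

    return shmdt(mem) == 0;
}
```

In a separate host process, the segment id would typically be looked up with `shmget(key, 0, 0)` using the same key as the Python-side runtime; how that key is shared between the two processes is not stated here and is left as an assumption.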
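# Python side (parent process): prepare the input and call the built function
data = numpy.random.randint(...)
hcl_data = hcl.asarray(data)
f(hcl_data)
// Runtime (parent process, C++): copy the input into a System V shared memory segment
int shmid = shmget(key, data_size, 0666|IPC_CREAT);
void* mem = shmat(shmid, nullptr, 0);
memcpy(mem, hcl_data, data_size);
// Launch the child program; system() blocks until it exits
system("child_program");
// Copy the result back, then detach and remove the segment
memcpy(hcl_data, mem, data_size);
shmdt(mem);
shmctl(shmid, IPC_RMID, nullptr);
# Python side: convert the result back to a NumPy array
new_data = hcl.asnumpy(hcl_data)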
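The flow above does not check any return values. A sketch of the same parent-side steps with the checks added is shown here. It does not claim to identify the cause of the crash in this issue. `run_via_shm` and the `run_child` hook are hypothetical names; the hook replaces the `system("child_program")` call so the example stays self-contained.

```cpp
#include <sys/ipc.h>
#include <sys/shm.h>
#include <cstddef>
#include <cstring>
#include <functional>

// Round-trips `size` bytes through a System V segment. `run_child` is a
// hypothetical hook that receives the segment id and reports success.
bool run_via_shm(key_t key, void* data, std::size_t size,
                 const std::function<bool(int)>& run_child) {
    int shmid = shmget(key, size, 0666 | IPC_CREAT);
    // Fails with EINVAL, for example, when a stale segment with this key
    // already exists and is smaller than `size`.
    if (shmid < 0) return false;

    void* mem = shmat(shmid, nullptr, 0);
    // shmat returns (void*)-1 on failure; passing that to memcpy would crash.
    if (mem == reinterpret_cast<void*>(-1)) {
        shmctl(shmid, IPC_RMID, nullptr);
        return false;
    }

    std::memcpy(mem, data, size);          // input -> shared memory
    bool ok = run_child(shmid);            // child works on the segment
    if (ok) std::memcpy(data, mem, size);  // result -> caller's buffer

    shmdt(mem);
    shmctl(shmid, IPC_RMID, nullptr);      // always remove the segment
    return ok;
}
```

In the real flow, the hook would wrap something like `system("child_program") == 0`, so a failed child run is reported instead of copying back unmodified or partial data.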
I ran into this problem as well. I think we can generate a header file to store the input data, which is simpler and more reusable.
I tried to run the GEMM HBM example on our servers. The compilation works fine, but when executing the binary, the host program crashed with a segfault.
The segfault occurred when the program was trying to access data from the shared memory. We may need to consider a better way to transfer data between the invoking program and host program.
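A rough sketch of what the header-file idea could look like, assuming non-empty integer inputs. `emit_header` and the array name passed to it are hypothetical and not part of HeteroCL.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Renders `data` as header text defining `static const int <name>[N]`.
// The int element type and the naming scheme are illustrative assumptions.
std::string emit_header(const std::string& name, const std::vector<int>& data) {
    std::ostringstream out;
    out << "#pragma once\n";
    out << "static const int " << name << "[" << data.size() << "] = {";
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i) out << ", ";  // separator between elements, none before the first
        out << data[i];
    }
    out << "};\n";
    return out.str();
}
```

The generated text would be written to a file that the host program includes. Because the data is fixed at compile time, the host program has to be rebuilt whenever the input changes, and results still need a separate path back to the Python side.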