Open ChengjieLi28 opened 1 year ago
Hi team,
I successfully compiled gloo on macOS with USE_LIBUV set to ON, but when I test the reduce_scatter op, the process crashes with a core dump at runtime.
I use pybind11 to bind the Python interface. Here is the test code:
import multiprocessing as mp
import os
import shutil
import time

import numpy as np

# NOTE: fileStore_path and system_name are module-level values defined
# elsewhere in the test file (not shown in this snippet).


def worker_reduce_scatter(rank):
    from .. import xoscar_pygloo as xp

    # Rank 0 (re)creates the rendezvous directory; other ranks wait briefly.
    if rank == 0:
        if os.path.exists(fileStore_path):
            shutil.rmtree(fileStore_path)
        os.makedirs(fileStore_path)
    else:
        time.sleep(0.5)

    context = xp.rendezvous.Context(rank, 3)

    # Use the TCP transport on Linux and the libuv transport elsewhere.
    if system_name == "Linux":
        attr = xp.transport.tcp.attr("localhost")
        dev = xp.transport.tcp.CreateDevice(attr)
    else:
        attr = xp.transport.uv.attr("localhost")
        dev = xp.transport.uv.CreateDevice(attr)

    fileStore = xp.rendezvous.FileStore(fileStore_path)
    store = xp.rendezvous.PrefixStore(str(3), fileStore)
    context.connectFullMesh(store, dev)

    # Six float32 elements: [1, 2, 3, 4, 5, 6].
    sendbuf = np.array(
        [i + 1 for i in range(sum([j + 1 for j in range(3)]))], dtype=np.float32
    )
    print(f'Send buf: {sendbuf}')
    sendptr = sendbuf.ctypes.data

    # Each rank receives two of the six reduced elements.
    recvbuf = np.zeros(2, dtype=np.float32)
    recvptr = recvbuf.ctypes.data
    recvElems = [2, 2, 2]

    data_size = (
        sendbuf.size if isinstance(sendbuf, np.ndarray) else sendbuf.numpy().size
    )
    print(f'Data size: {data_size}')
    datatype = xp.glooDataType_t.glooFloat32
    op = xp.ReduceOp.SUM

    xp.reduce_scatter(context, sendptr, recvptr, data_size, recvElems, datatype, op)

    print(f"rank {rank} sends {sendbuf}, receives {recvbuf}")


def test_reduce_scatter():
    process1 = mp.Process(target=worker_reduce_scatter, args=(0,))
    process1.start()
    process2 = mp.Process(target=worker_reduce_scatter, args=(1,))
    process2.start()
    process3 = mp.Process(target=worker_reduce_scatter, args=(2,))
    process3.start()

    process1.join()
    process2.join()
    process3.join()
This test crashes on macOS but passes on Linux.
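For reference, the result the test should produce on a working platform can be modelled in plain Python. This is only a minimal sketch, assuming that reduce_scatter with SUM adds the send buffers element-wise across ranks and gives rank r the r-th consecutive chunk, sized by recvElems. The helper name reference_reduce_scatter is hypothetical and is not part of gloo or xoscar_pygloo.

```python
def reference_reduce_scatter(send_bufs, recv_elems):
    """Pure-Python model of a SUM reduce_scatter (hypothetical helper)."""
    total = sum(recv_elems)
    # Every rank must contribute a buffer covering all output chunks.
    assert all(len(buf) == total for buf in send_bufs)

    # Element-wise sum across all ranks.
    reduced = [sum(values) for values in zip(*send_bufs)]

    # Split the reduced buffer into consecutive per-rank chunks.
    chunks, offset = [], 0
    for count in recv_elems:
        chunks.append(reduced[offset:offset + count])
        offset += count
    return chunks


# Same input as the test: three ranks, each sending [1, 2, 3, 4, 5, 6].
send = [float(i + 1) for i in range(6)]
expected = reference_reduce_scatter([send, send, send], [2, 2, 2])
for rank, chunk in enumerate(expected):
    print(f"rank {rank} should receive {chunk}")
```

Under these assumptions, rank 0 should end up with [3.0, 6.0], rank 1 with [9.0, 12.0], and rank 2 with [15.0, 18.0].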
Could you help me understand why this happens? Thank you very much.