Delaunay opened this issue 3 months ago
It seems some primitives are not implemented for HPUs.
dlrm.0 AttributeError: module 'torch._C' has no attribute '_broadcast_coalesced'
dlrm.0 [stderr] Traceback (most recent call last):
dlrm.0 [stderr] File "/home/sdp/results/venv/torch/bin/voir", line 8, in <module>
dlrm.0 [stderr] sys.exit(main())
dlrm.0 [stderr] File "/home/sdp/voir/voir/cli.py", line 124, in main
dlrm.0 [stderr] ov(sys.argv[1:] if argv is None else argv)
dlrm.0 [stderr] File "/home/sdp/voir/voir/phase.py", line 331, in __call__
dlrm.0 [stderr] self._run(*args, **kwargs)
dlrm.0 [stderr] File "/home/sdp/voir/voir/overseer.py", line 242, in _run
dlrm.0 [stderr] set_value(func())
dlrm.0 [stderr] File "/home/sdp/voir/voir/scriptutils.py", line 37, in <lambda>
dlrm.0 [stderr] return lambda: exec(mainsection, glb, glb)
dlrm.0 [stderr] File "/home/sdp/milabench/benchmarks/dlrm/dlrm/dlrm_s_pytorch.py", line 1911, in <module>
dlrm.0 [stderr] run()
dlrm.0 [stderr] File "/home/sdp/milabench/benchmarks/dlrm/dlrm/dlrm_s_pytorch.py", line 1579, in run
dlrm.0 [stderr] Z = dlrm_wrap(
dlrm.0 [stderr] File "/home/sdp/milabench/benchmarks/dlrm/dlrm/dlrm_s_pytorch.py", line 146, in dlrm_wrap
dlrm.0 [stderr] return dlrm(X.to(device), lS_o, lS_i)
dlrm.0 [stderr] File "/home/sdp/results/venv/torch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1514, in
_wrapped_call_impl
dlrm.0 [stderr] return self._call_impl(*args, **kwargs)
dlrm.0 [stderr] File "/home/sdp/results/venv/torch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1523, in _call_impl
dlrm.0 [stderr] return forward_call(*args, **kwargs)
dlrm.0 [stderr] File "/home/sdp/milabench/benchmarks/dlrm/dlrm/dlrm_s_pytorch.py", line 530, in forward
dlrm.0 [stderr] return self.parallel_forward(dense_x, lS_o, lS_i)
dlrm.0 [stderr] File "/home/sdp/milabench/benchmarks/dlrm/dlrm/dlrm_s_pytorch.py", line 631, in parallel_forward
dlrm.0 [stderr] self.bot_l_replicas = replicate(self.bot_l, device_ids)
dlrm.0 [stderr] File "/home/sdp/results/venv/torch/lib/python3.10/site-packages/torch/nn/parallel/replicate.py", line 110, in replicate
dlrm.0 [stderr] param_copies = _broadcast_coalesced_reshape(params, devices, detach)
dlrm.0 [stderr] File "/home/sdp/results/venv/torch/lib/python3.10/site-packages/torch/nn/parallel/replicate.py", line 83, in
_broadcast_coalesced_reshape
dlrm.0 [stderr] tensor_copies = Broadcast.apply(devices, *tensors)
dlrm.0 [stderr] File "/home/sdp/results/venv/torch/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
dlrm.0 [stderr] return super().apply(*args, **kwargs) # type: ignore[misc]
dlrm.0 [stderr] File "/home/sdp/results/venv/torch/lib/python3.10/site-packages/torch/nn/parallel/_functions.py", line 23, in forward
dlrm.0 [stderr] outputs = comm.broadcast_coalesced(inputs, ctx.target_gpus)
dlrm.0 [stderr] File "/home/sdp/results/venv/torch/lib/python3.10/site-packages/torch/nn/parallel/comm.py", line 57, in
broadcast_coalesced
dlrm.0 [stderr] return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
dlrm.0 [stderr] AttributeError: module 'torch._C' has no attribute '_broadcast_coalesced'
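The failing call bottoms out in `torch._C._broadcast_coalesced`, a binding that is only compiled into CUDA-enabled PyTorch builds; DLRM's `parallel_forward` reaches it through `torch.nn.parallel.replicate()`, so that whole data-parallel path is unavailable on HPU. A quick way to confirm on the Gaudi node (just a sketch, nothing milabench-specific assumed):

```python
import torch

# True on a CUDA build; False on CPU-only or HPU builds, which is exactly
# what the AttributeError above reports.
print(hasattr(torch._C, "_broadcast_coalesced"))
```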
The idea is to replace all mentions of `torch.cuda` with `torchcompat.core`, which mirrors `torch.cuda` across multiple device backends (CUDA, XPU, HPU).
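A minimal sketch of what that swap could look like; it assumes only that `torchcompat.core` mirrors the `torch.cuda` namespace as stated above (`is_available`, `device_count`, ...). The `fetch_device` helper below is an assumption and should be checked against the torchcompat README:

```python
import torch
import torchcompat.core as accelerator  # assumed import path; mirrors torch.cuda

# Before (CUDA-only pattern used in dlrm_s_pytorch.py):
#   use_gpu = torch.cuda.is_available()
#   ngpus = torch.cuda.device_count()
#   device = torch.device("cuda", 0)

# After (backend-agnostic; resolves to CUDA, XPU or HPU at runtime):
use_accel = accelerator.is_available()
ngpus = accelerator.device_count()
device = accelerator.fetch_device(0)  # hypothetical helper; verify the exact name
```

Note that swapping the device namespace only fixes device placement; the `replicate()`-based multi-device path above would still need to be guarded or replaced, since it depends on the CUDA-only broadcast primitive.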