Mellanox / nv_peer_memory


ibv_reg_mr_iova2 failed with error bad address #115

Status: Open. ThkerLee opened this issue 11 months ago.

ThkerLee commented 11 months ago

Environment:

  1. Ubuntu 22.04
  2. CUDA 12.2.1 (driver 535.86.10)
  3. MLNX_OFED_LINUX-23.04-1.1.3.0
  4. NCCL 2.18.3
  5. ConnectX-7 (CX7), firmware 28.37.1014

ib_write_bw succeeds:

[Screenshot: ib_write_bw output]

NCCL topology: [Screenshot: NCCL topology]

Error message: [Screenshot: error message]

This error happens on a Mellanox ConnectX-7 (CX7).

ThkerLee commented 11 months ago

It turned out to be an environment problem.

nnurlan008 commented 6 months ago

Hi @ThkerLee,

I have a similar problem: I need to assign a GPU buffer to a completion queue in ibv_create_cq. It fails with a "bad address" error when I pass a GPU address but succeeds with a CPU address. Can you please explain how you solved the issue you mentioned?

Many thanks