BuildIt-lang / net-blocks

MIT License

Some questions about how to obtain CQEs from a striding RQ. #1

Open Lcc-code opened 4 months ago

Lcc-code commented 4 months ago

Hello, I'm sorry to bother you.

I am a student from China and recently saw your code. I would like to ask you a question:

I saw that you previously tried to obtain the CQEs of a striding RQ using ibv_poll_cq. Were you able to obtain CQEs normally with ibv_poll_cq? On my machine I keep getting IBV_WC_LOC_PROT_ERR.

Additionally, your current method of obtaining CQEs looks somewhat unusual to me, and I am very confused :<

Lcc-code commented 4 months ago

Hi, when I use the "post_wq_recvs" function, ibv_poll_cq succeeds! I will keep trying to understand how CQEs are obtained for a striding RQ, which may be a bit difficult. Thank you very much.

AjayBrahmakshatriya commented 4 months ago

Hi @Lcc-code,

Thanks for taking the time to use NetBlocks. What you have asked here is a very interesting question and is related to how MLX5's Striding RQs work. As far as I understand, rdma-core doesn't support Striding RQs. If you actually try to use ibv_poll_cq, it works fine initially. However, when you try to read a burst of packets, it fails. Reading the mlx5_cqe64 entries directly from the CQ buffer allows you to read a large number of posted packets. With the ibv_poll_cq method you miss packets if more than 128 are enqueued.
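To illustrate what polling a CQ by reading its entries directly can look like, here is a minimal toy model of owner-bit polling on a ring buffer. It is only a sketch under stated assumptions: `ToyCqe` and `ToyCq` are hypothetical names, the layout is not the real mlx5_cqe64 layout, and the producer is simulated in software rather than by a NIC. The idea it shows is that the producer flips an owner bit on every pass over the ring, and the consumer treats an entry as new only when that bit matches the pass its own consumer index is on.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy completion entry. NOT the real mlx5_cqe64 layout (assumption for illustration).
struct ToyCqe {
    uint32_t byte_cnt = 0;
    uint8_t owner = 1;  // starts at 1 so pass 0 (which expects 0) sees "not ready"
};

class ToyCq {
public:
    explicit ToyCq(uint32_t size) : ring_(size), size_(size) {
        // The pass-parity trick below needs a power-of-two ring size.
        assert(size != 0 && (size & (size - 1)) == 0);
    }

    // Simulated "hardware" side: write the next completion and stamp the
    // owner bit with the parity of the current pass over the ring.
    void produce(uint32_t byte_cnt) {
        ToyCqe &cqe = ring_[prod_ & (size_ - 1)];
        cqe.byte_cnt = byte_cnt;
        cqe.owner = (prod_ & size_) ? 1 : 0;
        ++prod_;
    }

    // Software side: returns true and fills byte_cnt_out if the entry at the
    // consumer index belongs to the current pass, false otherwise.
    bool poll(uint32_t &byte_cnt_out) {
        const ToyCqe &cqe = ring_[cons_ & (size_ - 1)];
        const uint8_t expected = (cons_ & size_) ? 1 : 0;
        if (cqe.owner != expected) {
            return false;  // stale entry from the previous pass
        }
        // Real code would need a read barrier here before touching the
        // rest of the entry; this single-threaded toy does not.
        byte_cnt_out = cqe.byte_cnt;
        ++cons_;
        return true;
    }

private:
    std::vector<ToyCqe> ring_;
    uint32_t size_;
    uint32_t prod_ = 0;  // free-running producer index
    uint32_t cons_ = 0;  // free-running consumer index
};
```

A receive loop built on this would call `poll` repeatedly until it returns false, so an entire burst can be drained in one go. The toy does not model overflow: if the producer laps the consumer, entries are silently overwritten.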

AjayBrahmakshatriya commented 4 months ago

Also, just to add, I use this modified version of rdma-core instead of the default one - https://github.com/AjayBrahmakshatriya/rdma-core/tree/mlx5-fix-mprq-wq-post-recv

This has been retrofitted to support striding RQs.

Lcc-code commented 4 months ago

Hi, thank you for your reply. I'm doing the post recv by writing the doorbell record like this:

std::atomic<uint32_t>* dbrec = reinterpret_cast<std::atomic<uint32_t>*>(dma_context->rwq->dbrec);
dbrec->store(htobe32(dma_context->sge_idx & 0xffff), std::memory_order_release);

However, I ran into a problem: this method works fine with a single thread, but with 2 or more threads, cqe->wqe_id and cqe->wqe_counter contain incorrect values once a thread has received more packets than the recv depth. This does not happen when I start a separate process for each receiver instead.

Have you seen similar behavior when using multiple threads with the current protocol stack? If so, how would I go about tracking it down?
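The doorbell update quoted above can be modeled in isolation as follows. This is a minimal sketch, not the NetBlocks or rdma-core implementation: `ToyRecvQueue`, `post_and_ring` and `doorbell_value` are hypothetical names, and a real `std::atomic<uint32_t>` member stands in for the device-visible doorbell record instead of casting a driver pointer. It assumes a Linux/glibc system for `htobe32` and `be32toh`.

```cpp
#include <atomic>
#include <cstdint>
#include <endian.h>  // htobe32 / be32toh (glibc)

// Hypothetical stand-in for the per-queue state in the quoted snippet.
struct ToyRecvQueue {
    std::atomic<uint32_t> dbrec{0};  // models the doorbell record (big-endian value)
    uint32_t sge_idx = 0;            // free-running producer index
};

// Advance the producer index by n posted entries, then publish it.
inline void post_and_ring(ToyRecvQueue &q, uint32_t n) {
    // In real code the receive WQEs would be written before this point.
    q.sge_idx += n;
    // The release store orders the earlier writes before the doorbell update.
    // Only the low 16 bits are published, in big-endian byte order.
    q.dbrec.store(htobe32(q.sge_idx & 0xffff), std::memory_order_release);
}

// Read back the published counter in host byte order (for inspection only).
inline uint32_t doorbell_value(const ToyRecvQueue &q) {
    return be32toh(q.dbrec.load(std::memory_order_acquire));
}
```

In this model each queue owns its index and its doorbell record. The plain `sge_idx` increment is not synchronized, so the sketch is only valid when a single thread posts to a given queue.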

Thanks again for your reply!