NVIDIA / jetson-rdma-picoevb

Minimal HW-based demo of GPUDirect RDMA on NVIDIA Jetson AGX Xavier running L4T

Test Bandwidth, Why #17

Open pyt-hnu opened 1 year ago

pyt-hnu commented 1 year ago

When I run the test application (rdma-cuda-h2c-perf) and change the `transfer_size`, I found that the reported test time stays constant at around 5000 ns. The transfer size increases by a factor of 2 from 1024 to 262144.

pyt-hnu commented 1 year ago

Why does the running time stay the same?

pateldipen1984-nv commented 1 year ago

The transfer size is divided by the time delta that the test receives from the kernel module via the ioctl call. Did you make any changes to the code? If so, where? Please share a snippet here so I can understand better.
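
For context, here is a minimal sketch of the calculation described above, assuming the driver reports the DMA duration in nanoseconds through the ioctl request struct (the `dma_time_ns` field seen in the snippet below); this is an illustration, not the project's exact code:

    #include <stdio.h>
    #include <stdint.h>

    /* Sketch: bandwidth = transfer size divided by the kernel-reported delta.
     * Assumes the driver fills a dma_time_ns field after the DMA ioctl returns;
     * names are taken from the snippet later in this thread, not verified
     * against the current upstream source. */
    static void report_kernel_bw(uint64_t transfer_size, uint64_t dma_time_ns)
    {
        double tdelta_us = (double)dma_time_ns / 1000.0;
        /* bytes per microsecond is numerically equal to MB/s (1 MB = 1e6 B) */
        double mbps = (double)transfer_size / tdelta_us;
        printf("size = %llu B, kernel time = %.3f us, BW = %.2f MB/s\n",
               (unsigned long long)transfer_size, tdelta_us, mbps);
    }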

pyt-hnu commented 1 year ago

This is "rdma-cuda-h2c-perf.cu", and I tested the bandwidth in two ways. One is kernel time, and the other is on the user side. This is the main part of the code I changed. ` struct timespec beg, end; clock_gettime(MYCLOCK, &beg); for (int iter=0; iter<num_write_iters; ++iter) ret = ioctl(fd, PICOEVB_IOC_H2C_DMA, &dma_params); if (ret != 0) { fprintf(stderr, "ioctl(DMA) failed: %d\n", ret); perror("ioctl() failed"); return 1; } clock_gettime(MYCLOCK, &end);

double woMBps;
{
    double byte_count = (double) transfer_size * num_write_iters;
    double dt_ms = (end.tv_nsec-beg.tv_nsec)/1000000.0 + (end.tv_sec-beg.tv_sec)*1000.0;
    double Bps = byte_count / dt_ms * 1e3;
    woMBps = Bps / 1024.0 / 1024.0;
    cout << "write BW: " << woMBps << "MB/s" << endl;
}

tdelta_us = dma_params.dma_time_ns / 1000;    #kernel time
printf("Bps = %d \n",sizeof((double)transfer_size));
printf("Kernel_time = %lf ms \nKernel side write BW = %lf MB/s\n",
    (double)tdelta_us/1000, (double)transfer_size / (double)tdelta_us); `

With this change, when I varied `transfer_size` from 1024 bytes to 2 GB, the measured time did not change.

pyt-hnu commented 1 year ago

To be precise, it is a timing anomaly; here is a set of results I measured (screenshot attached). The PCIe version we used is 3.0, with 32 GB/s of bandwidth, but once the transfer size reached 4 MB, the computed bandwidth exceeded that ideal bandwidth.
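
As a quick arithmetic cross-check (a sketch only, assuming the reported delta really stays near 5000 ns for every size, as described earlier in this issue): dividing an ever larger `transfer_size` by an essentially fixed time makes the computed bandwidth grow linearly with the size, which would produce figures above any PCIe limit.

    /* Illustration only: with a fixed ~5000 ns delta (an assumption based on
     * the observation earlier in this issue), the computed bandwidth scales
     * linearly with transfer_size. */
    #include <stdio.h>

    int main(void)
    {
        const double dma_time_ns = 5000.0;               /* assumed constant delta */
        for (unsigned long long size = 1024; size <= (4ULL << 20); size <<= 1) {
            double gbps = (double)size / dma_time_ns;    /* bytes per ns == GB/s */
            printf("%9llu B / 5000 ns = %8.2f GB/s\n", size, gbps);
        }
        return 0;
    }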

pateldipen1984-nv commented 11 months ago

Hmm, these points (in red), and for that matter the algorithm used to calculate the bandwidth, may have some issue.
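
One way to sanity-check the kernel-reported figure is to time a large number of iterations with a monotonic wall clock and average, roughly as sketched below; `run_dma()` is a placeholder for the actual DMA submission (the `PICOEVB_IOC_H2C_DMA` ioctl), not a function from this repository.

    #include <stdio.h>
    #include <time.h>

    /* Placeholder for the real DMA submission (e.g. the ioctl call in
     * rdma-cuda-h2c-perf.cu); returns 0 on success. */
    static int run_dma(size_t transfer_size)
    {
        (void)transfer_size;
        return 0;
    }

    /* Average wall-clock bandwidth in MiB/s over many iterations. */
    static double measure_mib_per_s(size_t transfer_size, int iters)
    {
        struct timespec beg, end;

        clock_gettime(CLOCK_MONOTONIC, &beg);
        for (int i = 0; i < iters; ++i) {
            if (run_dma(transfer_size) != 0)
                return -1.0;                  /* propagate failure */
        }
        clock_gettime(CLOCK_MONOTONIC, &end);

        double dt_s = (end.tv_sec - beg.tv_sec) + (end.tv_nsec - beg.tv_nsec) / 1e9;
        double bytes = (double)transfer_size * iters;
        return bytes / dt_s / (1024.0 * 1024.0);
    }

    int main(void)
    {
        printf("write BW: %.2f MiB/s\n", measure_mib_per_s(1 << 20, 1000));
        return 0;
    }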

pyt-hnu commented 11 months ago

Hi @pateldipen1984-nv, I have a driver installation problem, can you help me? The build succeeded on Ubuntu 16.04, but the following error occurs on Ubuntu 20.04:

    FATAL: parse error in symbol dump file
    make[2]: *** [scripts/Makefile.modpost:94: __modpost] Error 1
    make[1]: *** [Makefile:1644: modules] Error 2
    make[1]: Leaving directory '/usr/src/linux-headers-5.4.0-42-generic'
    make: *** [Makefile:19: modules] Error 2

Alex-czh commented 5 months ago

Hi everyone, @pyt-hnu @pateldipen1984-nv, I am following this work. Has the timing question been solved? Could you please share the real timing measured with this project? I would like to know whether the project achieves good performance. Thanks so much!

pateldipen1984-nv commented 4 months ago

@hiteshkumar-nv: can you look into this bandwidth question and the installation problem?