Closed by HaoKang-Timmy 2 years ago
That's a good question, they should not in theory.
I see two reasons:
- Your performance is unstable. That should be easy to check in the first run (does the performance increase progressively or not).
- The NCCL perf tests increment the offset in the buffer for each test, so since the first test is 8B, all subsequent tests are misaligned and performance is reduced. If that's the case, replacing `-b 8` with `-b 1M` should solve the issue.
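
To make the misalignment argument concrete, here is a toy model in Python. It is not the nccl-tests source: it assumes each test starts at the offset where the previous one ended, that sizes double from the minimum to the maximum, and that 16-byte alignment is what matters (the alignment value and the function names are illustrative assumptions).

```python
def sweep_offsets(min_bytes, max_bytes, factor=2):
    """Yield (size, offset) pairs for a size sweep.

    Assumption: each test's buffer offset is the sum of all previous sizes.
    """
    offset = 0
    size = min_bytes
    while size <= max_bytes:
        yield size, offset
        offset += size
        size *= factor


def misaligned_sizes(min_bytes, max_bytes, alignment=16):
    """Return the sizes whose starting offset is not a multiple of `alignment`."""
    return [
        size
        for size, offset in sweep_offsets(min_bytes, max_bytes)
        if offset % alignment != 0
    ]


if __name__ == "__main__":
    # Starting at 8 B: every test after the first begins at an odd multiple of 8.
    print(misaligned_sizes(8, 1 << 20))
    # Starting at 1 MiB: every offset is a multiple of 1 MiB, so none are misaligned.
    print(misaligned_sizes(1 << 20, 1 << 27))
```

Under these assumptions, a sweep starting at 8 B leaves every test after the first at an offset that is an odd multiple of 8, while a sweep starting at 1M keeps every offset a multiple of 1 MiB.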
Thank you. It seems that when I change 8 to 1M, the result becomes more reasonable.
I want to test the bandwidth cost of send and recv. First I type
in my terminal. Then the results for two sizes are
If I type
The result is
Why are there two different bandwidths for the same input size?