bobzhuyb / ns3-rdma

NS3 simulator for RDMA over Converged Ethernet v2 (RoCEv2), including the implementation of DCQCN, TIMELY, PFC, ECN and shared buffer switch
GNU General Public License v2.0

how to visualize simulation? #18

Open wangshuaizs opened 6 years ago

wangshuaizs commented 6 years ago

Hi, on Ubuntu, PyViz and NetAnim can be used to visualize a simulation. Does the ns3-rdma project support any visualization tool? I tried generating an .xml file during the simulation and opening it in NetAnim on Ubuntu, but NetAnim told me "This XML format is not supported. Minimum Version:3.106" (I am using NetAnim 3.107 and ns-3 3.26). Do you have any suggestions for visualization? Thank you in advance!

bobzhuyb commented 6 years ago

I never tried that. Sorry.

wangshuaizs commented 6 years ago

OK, thank you anyway!

I have run into another problem. I ran a simulation in which server nodes 0 - 126 connect to a Broadcom switch, and server node 0 sends 1 packet (payload size = 1000) to each of the other server nodes. The output prints some warnings: "WARNING: Drop because egress Port buffer full, WARNING: Drop because egress Q buffer full, WARNING: Drop because egress SP buffer full". I expected to see retransmissions, but I cannot find any retransmission in mix.tr.

Moreover, when I increase the number of server nodes to 129, so that server node 0 sends 1 packet to each of server nodes 1 - 128, main.exe crashes with an error message like “0x0000010000001000 access violation occurs when the reading position.”

Does that mean I cannot simulate more than 127 flows from one server simultaneously? I tried digging into your source code, but I found nothing to support this assumption. Could you please give me some suggestions? Thank you!

bobzhuyb commented 6 years ago

The main issue is on the switch node, not on the servers/flows.

I hard-coded a max port number of 64 per switch because this is what we had in practice (64-port switches). You may try to raise this. https://github.com/bobzhuyb/ns3-rdma/blob/master/src/network/model/broadcom-node.h#L59

Once you raise this, the switch buffer may run out easily -- remember PFC requires certain buffer headroom per port to operate, otherwise PFC cannot prevent packet losses. You may need to reconfigure buffer thresholds/capacity in https://github.com/bobzhuyb/ns3-rdma/blob/master/src/network/model/broadcom-node.cc

If you want to test 128->1 or even more intensive incast, I recommend sticking with 64-port switches and using a multi-hop topology. The congestion point will be at the last hop anyway. Then you don't need to worry about the above issues on the switch.
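To illustrate the headroom point above: if each port reserves a fixed amount of buffer for PFC, raising the port count shrinks what is left for shared use. The following is only a rough sketch. The `BufferBudget` struct, the `SharedBytes` function, and all byte values are hypothetical and are not taken from broadcom-node.cc, where the real thresholds are configured.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a switch buffer; values are illustrative only.
struct BufferBudget {
    uint64_t total_bytes;        // total packet buffer on the switch
    uint64_t headroom_per_port;  // bytes reserved per port for PFC headroom
    uint32_t ports;              // number of ports on the switch
};

// Returns the bytes left for shared use after reserving headroom for every
// port, or 0 if the headroom alone already exceeds the total buffer.
inline uint64_t SharedBytes(const BufferBudget& b) {
    const uint64_t reserved = b.headroom_per_port * b.ports;
    return reserved >= b.total_bytes ? 0 : b.total_bytes - reserved;
}
```

With a made-up 9,000,000-byte buffer and 100,000 bytes of headroom per port, 64 ports leave 2,600,000 bytes for shared use, while 128 ports would need 12,800,000 bytes of headroom and leave nothing.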

wangshuaizs commented 6 years ago

@bobzhuyb

I tried to create a topology with 2 servers, server 0 and server 1, connected directly to each other. Server 1 established 200 RDMA flows to server 0 at the same time, but Visual Studio reported a memory access violation. Is it a bug?

Thank you!

bobzhuyb commented 6 years ago

I don't remember any hard-coded limitation for the number of flows per server... but I may be wrong. What is the maximum number of flows that does not have this problem? 128? 64?

wangshuaizs commented 6 years ago

@bobzhuyb

In my test, 127 flows are ok, but 128 flows aren't.

hdtjiang commented 6 years ago

The problem is caused by a parameter in point-to-point/model/qbb-net-device.h, where you will find:

static const uint32_t fCnt = 128; // Max number of flows on a NIC, for TX and RX respectively. TX+RX=fCnt*2.

You can increase this, but there is another problem: when a flow finishes and you start a new flow, the same problem appears again, because there is no queue recovery mechanism.
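One way to picture a queue recovery mechanism is a pool that recycles queue indices once a flow completes. This is a minimal sketch under stated assumptions: the `FlowSlotPool` class and its `Acquire`/`Release` methods are hypothetical and do not exist in ns3-rdma. It hands out indices from a fixed-capacity free list and reports exhaustion instead of indexing out of range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pool of per-flow queue indices (not the repository's code).
class FlowSlotPool {
public:
    explicit FlowSlotPool(uint32_t capacity) {
        free_.reserve(capacity);
        // Push in reverse so the lowest index is handed out first.
        for (uint32_t i = capacity; i > 0; --i) {
            free_.push_back(i - 1);
        }
    }

    // Writes a free queue index to `slot` and returns true,
    // or returns false when every slot is in use.
    bool Acquire(uint32_t& slot) {
        if (free_.empty()) {
            return false;
        }
        slot = free_.back();
        free_.pop_back();
        return true;
    }

    // Returns a finished flow's index so a later flow can reuse it.
    void Release(uint32_t slot) { free_.push_back(slot); }

    std::size_t Available() const { return free_.size(); }

private:
    std::vector<uint32_t> free_;  // indices not currently assigned to a flow
};
```

In this sketch, a caller would check the return value of `Acquire` before touching any per-flow array, and call `Release` when the flow finishes, so the number of flows over the lifetime of a run is no longer bounded by the pool capacity.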

bobzhuyb commented 6 years ago

Thanks @hdtjiang for the explanation. This is indeed something that needs to be improved.

wangshuaizs commented 6 years ago

Thanks @hdtjiang for your reply. I think the parameter in network/utils/broadcom-egress-queue.h should also be increased accordingly:

static const unsigned fCnt = 128; //max number of queues, 128 for NICs
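Since the two constants live in different headers, a compile-time check could catch the case where only one of them is raised. This is a sketch only: the namespaces `nic` and `egress_queue` and the value 256 are hypothetical stand-ins, and the assumption that the two limits must be equal comes from this thread rather than from the source code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for fCnt in qbb-net-device.h.
namespace nic {
static const uint32_t fCnt = 256;
}

// Hypothetical stand-in for fCnt in broadcom-egress-queue.h.
namespace egress_queue {
static const unsigned fCnt = 256;
}

// Fails the build if the two limits drift apart (assumed to need equality).
static_assert(nic::fCnt == egress_queue::fCnt,
              "NIC flow limit and egress queue limit must be raised together");
```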