Open save-buffer opened 4 years ago
Happens at the startup of every regression test (as far as I can tell).
Hi, Sasha, I guess this warning is not too serious. It indicates that the host is injecting request packets to some node(s) at a rate greater than the rate at which the node can send responses back, so the out_credits of the manycore_link drop to 0 and trigger this warning.
Hi, that is not exactly what the warning is saying. The parameter sets how many stores can be sent without hearing back a response. The further away the destination, the longer the latency to get there, so the more words you would expect to send before hearing back. The assert is saying the value is likely set too small. In a congested network, increasing the value can result in more congestion, since you are pushing more packets into the network. But I think generally with the host this is not a problem, since it is just one node on the network. What is the current setting of the value?
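As a rough illustration of the mechanism described above (a hypothetical sketch in Python, not the actual bsg_manycore RTL): the sender holds a pool of out_credits tokens, spends one per outstanding store, and gets one back when the response arrives; the warning corresponds to the counter hitting zero.

```python
# Hypothetical sketch of credit-based flow control. The class name and
# the out_credits default are illustrative, not taken from the RTL.
class CreditEndpoint:
    def __init__(self, out_credits=16):
        self.credits = out_credits

    def can_send(self):
        return self.credits > 0

    def send_store(self):
        # Spend one credit per outstanding store request.
        assert self.can_send(), "blocked: no credits left"
        self.credits -= 1
        if self.credits == 0:
            # Analogous to the simulator warning: out_credits may be too small.
            print("WARNING: out_credits exhausted")

    def receive_response(self):
        # A returning response releases one credit.
        self.credits += 1
```

With a small credit pool and a long round trip, the sender stalls at zero credits even though the network path itself is idle, which is exactly the situation the assert is flagging.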
Hi, Michael, the out_credits is set to 16 now.
Yes, the longer the distance from the host to the node, the longer the request-response round-trip time, and this would cause the sender to run out of credits.
The above was another theory. Since the warning occurs at the beginning of the simulation, and we write the program tile by tile starting from (1, 0), I guess it is possible that one tile cannot respond as fast as the incoming store requests arrive?
I will post the details later...
See inline!
> Hi, Michael, the out_credits is set to 16 now.
I think it’s definitely worth increasing the credits and seeing when the execution time stops improving. And also computing the bandwidth delay product; see below.
16 needs to cover the round trip delay, so it is saying that we think that the tile is less than 8 cycles away, which I think is probably not true. But you are right that if there is a bottleneck somewhere, the right number of credits could be lower.
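To make the arithmetic behind that statement explicit (a sketch using the numbers from this thread, assuming the host can inject one word per cycle when unthrottled):

```python
# With credit-based flow control, the credit count must cover the
# round trip, or the sender stalls before the first credit returns.
out_credits = 16
words_per_cycle = 1  # assumed host injection rate when unthrottled

# At 1 word/cycle, the round trip covered by the credits is simply
# out_credits cycles; half of that is the implied one-way latency.
covered_round_trip = out_credits / words_per_cycle   # 16 cycles
max_one_way_latency = covered_round_trip / 2         # 8 cycles
```

So out_credits = 16 implicitly assumes the tile is at most 8 cycles away, which is the claim being questioned above.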
> Yes, the longer the distance from the host to the node, the longer the request-response round-trip time, and this would cause the sender to run out of credits.
> The above was another theory. Since the warning occurs at the beginning of the simulation, and we write the program tile by tile starting from (1, 0), I guess it is possible that one tile cannot respond as fast as the incoming store requests arrive?
Interesting hypothesis! One way to bound the credits parameter is to look at the bandwidth × delay product of that path when the network is not loaded. The bandwidth (in words per cycle) is the rate of the narrowest bottleneck; for example, if the tile can only accept one word every other cycle, then it is 1 word / 2 cycles. The other parameter is the round-trip delay, in cycles, of the packet going from the sender to the endpoint and the credit returning back to the sender (if you have multiple clocks, you can use ns instead of cycles). Multiply those together, and there is no benefit to setting the parameter any higher. Generally we want to avoid injecting packets into the network if they do not improve performance, because congestion will go up and performance will not improve. It may be that we want to set the parameter lower than this number if we expect congestion in the network to be high. So for another example: if we have a bottleneck that allows one packet every hundred cycles, then 100 cycles of latency after that bottleneck, and a symmetric return path for the credit, we would want 1/100 × [(100 + 100) × 2] = 4 credits.
In this case, if there were a FIFO with 100 elements of space before the bottleneck, we would see a bunch of out-of-credit warnings, and that would be totally okay: even though we can fill up that FIFO, doing so only adds more congestion.
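The bandwidth × delay bound above can be written out as a small calculation, plugging in the example numbers from this thread (the function name is just illustrative):

```python
def max_useful_credits(bandwidth_words_per_cycle, round_trip_cycles):
    """Bandwidth x delay product: credits beyond this value cannot
    improve throughput; they only push more packets into the network
    and add congestion."""
    return bandwidth_words_per_cycle * round_trip_cycles

# Example from the thread: a bottleneck of 1 packet per 100 cycles,
# 100 cycles of latency after the bottleneck, and a symmetric return
# path for the credit.
bandwidth = 1 / 100              # words per cycle at the bottleneck
round_trip = (100 + 100) * 2     # cycles, out and back
credits = max_useful_credits(bandwidth, round_trip)  # -> 4.0
```

The same calculation with out_credits = 16 and a 1 word/cycle bottleneck says the setting only makes sense for round trips of 16 cycles or less, matching the "less than 8 cycles away" observation earlier in the thread.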
One interesting thing would be for @bornaehsani to make Icache write stalls take priority over synchronization stalls in the blood graph. Then we can see what the bandwidth is of that path by seeing the spacing between the writers during code loading time.
Can we close this? I believe this is just a warning that occurs because it is an artifact of our clock-switching in Cosimulation, or is there a concrete issue we want to track?
I think it’s a concrete issue, but doesn’t need to be looked at until the next iteration of “how can we improve cosim performance ?”
Maybe next step is for Borna to make the suggested hack to bloodgraphs?
This might be solved in #503?
No, the warning still exists. It may be solved after issue #510.
Ah, yes. @leonardxiang, can you tell us how many manycore cycles it takes to inject a packet into a network?
In the io clock domain, it takes 30 cycles in total to send a packet to the network: 7.5 × 4 = 30, which means 7-8 cycles for each host AXIL write (4 AXIL writes make one manycore network packet). Note that although the new MCL needs only 2 cycles to finish an AXIL write, there are extra cycles (5.5) introduced by the host function call.
In the manycore network clock domain, one packet is injected into the network per cycle, because we boost the io interface 50×.
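Putting those figures together (30 io cycles per packet and the 50:1 io-to-manycore clock ratio, both taken from this thread) shows why the manycore side sees one packet per cycle:

```python
# Numbers reported above in the thread.
axil_writes_per_packet = 4
io_cycles_per_axil_write = 7.5   # 2 cycles for the MCL write itself
                                 # plus ~5.5 of host function-call overhead

io_cycles_per_packet = axil_writes_per_packet * io_cycles_per_axil_write  # 30.0

clock_ratio = 50  # io clock runs 50x faster than the manycore clock
manycore_cycles_per_packet = io_cycles_per_packet / clock_ratio           # 0.6

# 0.6 < 1: the host assembles packets faster than the network's
# one-packet-per-cycle injection rate, so injection is back-to-back.
```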
Sorry, I'm confused.
Are you reporting 30 cycles for the older interface?
It is the new one. The older interface takes ~116 cycles.
I'm confused about the difference between 2 cycles for a write request and 30 cycles to send a packet.
Oh, you're saying each write request is 2 cycles? And there are 4 write requests to make a packet?
So with a 50:1 clock ratio (What I believe is currently configured) are we injecting a packet onto the network on every network cycle?
Yeah, 4 AXIL writes make a manycore network packet.
And in the network clock domain, one manycore packet is injected per cycle.
With the 50:1 ratio? Can you try 30:1 and 25:1? Just curious where the break-even point is
It is 30:1; any ratio greater than 30 causes the performance-problem warning above.
What makes you say that? I see a 50:1 ratio in cl_manycore.sv
Well, I changed that and ran some io tests, like test_manycore_credits and test_manycore_eva_read_write, and that is where the observation comes from.
If the throughput of the MCL is no greater than that of the network, we would expect the io not to run out of credits, right?
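The break-even point being circled here can be computed directly from the numbers above: the host matches the network's one-packet-per-cycle acceptance rate exactly when the clock ratio equals the io cycles spent per packet.

```python
# Figures from the thread; the break-even reasoning is a sketch.
io_cycles_per_packet = 30       # measured cost to assemble one packet
network_cycles_per_packet = 1   # network accepts one packet per cycle

# At this ratio, 30 io cycles == 1 manycore cycle, so host injection
# rate exactly matches the network. Above it, the host outruns the
# credit return path and the out-of-credits warning fires.
break_even_ratio = io_cycles_per_packet / network_cycles_per_packet  # 30.0
```

This matches the observation above that ratios greater than 30:1 trigger the warning.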
Ahhh. Gotcha. Is there a difference in runtime between 30:1 and 1:1?
The runtime of test_manycore_eva_read_write slows down by about 1.7× when the ratio is changed from 30:1 to 1:1.
But I still think the runtime slowdown can be compensated for by removing the cdc module.
You could write a more extensive summary, like the one nicely reported in issue #510 ;)
No, that's sufficient for now. Thanks!