facebook / openbmc

OpenBMC is an open software framework to build a complete Linux image for a Board Management Controller (BMC).

ftgmac error - NO_RXBUF and RPKT_LOST #43

Closed: karthikbgt closed this issue 5 years ago.

karthikbgt commented 7 years ago

Hi,

I get the attached error (ErrorLog.txt) when I try to SCP a file over the Ethernet interface from an external server to OpenBMC running on an AST2520. So far, I have hit this issue every time I do an SCP.

Could you confirm which version of OpenBMC addresses this ftgmac issue?

Note that I get the error on a system reporting Linux bmc 4.1.15 armv6l GNU/Linux. The file /etc/os-release does not seem to exist on the system; however, /etc/issue reports 'OpenBMC Release cmm-v11' ...

Could you provide some pointers on what to check and what steps to take to overcome this issue?

Thanks in advance, Karthik Balaguru

karthikbgt commented 7 years ago

Hi, I have also observed that whenever ftgmac RX buffer exhaustion and packet loss occur on the management interface of the CMM (OpenBMC, kernel 4.1.15), the interface ends up in a bad state and does not recover with ifdown/ifup, ifconfig eth0 down/up, or even a soft reboot. It recovers only when I power-cycle the CMM.

Is there any other way to overcome this packet loss? Is there any way to recover the system without power-cycling the CMM?

This is also a concern because ONIE on the SCM will fetch the installer and image via the CMM, which is connected to an external HTTP server. Packet loss would affect the file being fetched by ONIE, which I want to avoid, and would also affect reception of the DHCP offer...

Any pointers would be a great help!

Thanks in advance, Karthik Balaguru

tfangit commented 7 years ago

@karthikbgt Thanks for reporting the issue. We also see this error during scp. However, we haven't seen any side effect other than the error message. How big is the file you are copying?

The SCM does not go through the CMM MAC to communicate with the outside. There is an internal 16-port HW switch connecting the SCM to the mgmt port, and the CMM MAC is just another port on that switch. So this error message, which comes from the CMM MAC driver, should not impact SCM networking. Could you provide more details on your concern about SCM networking in this case?

karthikbgt commented 7 years ago

@tfangit Thanks for confirming that such errors are seen during SCP. The file I tried was ~409M. Glad to know that this error message has no other side effect and will not impact SCM networking.

williamspatrick commented 7 years ago

@tfangit - the original poster first raised this on openbmc/openbmc. Before I realized which kernel he was running, I mentioned that we are still seeing this in the 4.7 tree, but one of our developers did a significant rewrite of the Aspeed network driver for upstream 4.11, and we are backporting it to 4.10. With this rewrite, we have been able to saturate the NCSI link and get near line rate on the direct PHY (10 Mb/s and >90 Mb/s, respectively) without these errors or packet loss. I just wanted to keep you apprised of this work.