Intel-BMC / openbmc


Redfish code update sometimes failed on uploading tarball #42

Closed: leiyu-bytedance closed this issue 4 years ago

leiyu-bytedance commented 4 years ago

Unexpected behavior you saw

When running a Redfish code update, the upload sometimes fails.

Expected behavior

A Redfish code update should "almost always" succeed.

To Reproduce

Precondition:

Additional context

Not sure if it's related, but the weird part is that I am using eth0 in this case, while the above error seems to be related to eth1.

yongli3 commented 4 years ago

@leiyu-bytedance How easy is it to reproduce this issue? 1 failure in 10 attempts? Are there any abnormal journal logs on the BMC when this issue occurs?

leiyu-bytedance commented 4 years ago

I just tried a few code updates and hit the issue twice, so the failure rate is roughly 2 in 5.

One thing is confusing:

  • I was using eth0's IP (192.168.1.100) for the code update.
  • When the issue occurs, the kernel reports the above error on eth1, and the network becomes unstable (e.g. the code update fails, and opening a WebUI page via eth0's IP is very slow).
  • If I switch to eth1's IP (192.168.1.101), both the code update and the WebUI work fine.

yongli3 commented 4 years ago

It seems that this is a networking issue rather than a firmware update issue (flash erase/write). I suggest unplugging the eth1 network cable so that only eth0 (the BMC dedicated port) is connected, then re-testing to narrow down the issue.
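To re-test repeatably (only eth0 connected versus both ports connected, or eth0's IP versus eth1's IP), the failure rate can be measured with a small upload loop. This is a hedged sketch, not something from the original thread: it assumes the upstream OpenBMC Redfish HTTP push endpoint (`POST /redfish/v1/UpdateService` with `Content-Type: application/octet-stream`) and an already-obtained `X-Auth-Token` session token; the Intel-BMC fork may expose a different URI. The function names `redfish_push_update` and `measure_failure_rate` are hypothetical.

```python
import http.client
import ssl
import urllib.request
from typing import Callable


def redfish_push_update(bmc_ip: str, token: str, image_path: str,
                        timeout: float = 600.0) -> bool:
    """Upload a firmware tarball with one Redfish HTTP push; True on 2xx.

    Assumption: upstream OpenBMC push URI /redfish/v1/UpdateService and an
    X-Auth-Token session token. The Intel-BMC fork may differ.
    """
    # The whole tarball is read into memory; fine for typical BMC image sizes.
    with open(image_path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(
        f"https://{bmc_ip}/redfish/v1/UpdateService",
        data=data,
        method="POST",
        headers={"X-Auth-Token": token,
                 "Content-Type": "application/octet-stream"},
    )
    # BMCs usually use self-signed certificates: skip verification (lab use only).
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(req, timeout=timeout, context=ctx) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # HTTP errors, connection resets, timeouts and truncated responses
        # all count as a failed upload.
        return False


def measure_failure_rate(upload: Callable[[], bool], attempts: int) -> float:
    """Call `upload` `attempts` times and return the fraction that failed."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = sum(1 for _ in range(attempts) if not upload())
    return failures / attempts
```

For example, `measure_failure_rate(lambda: redfish_push_update("192.168.1.100", token, "image.tar"), 5)` can be compared with the same call against `"192.168.1.101"` (`token` and `image.tar` are placeholders). Each successful push may start a real firmware activation, so let the BMC settle between attempts. Counting every HTTP error, reset, or timeout as a failure keeps the result comparable to the rough 2-in-5 estimate reported earlier in the thread.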

leiyu-bytedance commented 4 years ago

Yup, it's more likely a networking issue. It may only be triggered by the above setup (both eth0 and eth1 connected), and a Redfish code update may make it easier to reproduce because it transfers a large amount of data.

leiyu-bytedance commented 4 years ago

It has not reproduced for a while, but I do not know which revision fixed the issue. Closing this anyway.