Microsemi / switchtec-kernel

A kernel module for the Microsemi PCIe switch
GNU General Public License v2.0
45 stars 31 forks source link

ntb_hw_switchtec: Fix unable to set mw translation bug. #48

Closed ghost closed 5 years ago

ghost commented 5 years ago

When host receives the link force down message from peer host, the ntb_hw_switchtec driver would reinitialize the shared memory window, and NTB client(e.g. ntb_transport) would set up the translation memory window. Because Switchtec uses same registers to set up memory window, the access to these registers would fail and the translation memory window would not be set up.

To fix it, we should disable link before reinitializing the shared memory window, and enable link after it.

Signed-off-by: Joey Zhang joey.zhang@microchip.com

wesleywesley commented 5 years ago

When host receives the link force down message from peer host, the ntb_hw_switchtec driver would reinitialize the shared memory window, and NTB client(e.g. ntb_transport) would set up the translation memory window. << In case of ntb_hw_switchtec and NTB client (e.g. ntb_transport) setup memory window concurrently,

Because Switchtec uses same registers to set up memory window, the access to these registers would fail << the latter one would failed the MW setup, for the reason of configuring memory windows/requester ID mapping table required to be serialized.

lsgunth commented 5 years ago

This makes no sense to me. Especially this:

Because Switchtec uses same registers to set up memory window

How does the link status flag in memory relate to the switchtec registers?

Logan

ghost commented 5 years ago

Hi Logan,

Sorry for the unclear description. The issue is Step1: Load ntb, switchtec, ntb_hw_switchtec, ntb_transport and ntb_netdev modules on host. Step2: Load ntb, switchtec, ntb_hw_switchtec, ntb_transport and ntb_netdev modules on peer host. Step3: Host receives the link force down message from peer host. ntb_hw_switchteck module calls config_rsvd_lut_win() to reconfigure the shared memory window. Step4: At same time, host receives the check link message from peer host. ntb_hw_switchtec module calls switchtec_ntb_check_link() and set flag link_is_up to 1. Step5: ntb_transport module call switchtec_ntb_mw_set_trans() to configures the transport MW. Step6: Because of the function switchtec_ntb_part_op(LOCK) fails, the switchtec_ntb_mw_set_trans() would not set up the transport MW.

The reason is When function config_rsvd_lut_win() calls switchtec_ntb_part_op() to reconfigure the shared memory window, the function switchtec_ntb_mw_set_trans() calls switchtec_ntb_part_op() to configure the transport memory window. But Switchtec uses same registers to set up LUT window and transport window.

To fix it, we needs the link_is_up flag be 0 before the shared memory window is reconfigured.

Thanks Joey

lsgunth commented 5 years ago

Ah, I see. More of that needs to be in the commit message.

In this case you haven't fixed the problem. You've just made the race less likely. If you have two threads running nothing says the second thread doesn't do the check just before the first thread sets the flag. We either need to use a lock or make sure all the link status checking stuff happens in the same work queue (I think the second choice is better).

If we make check_link_status the work item and remove the work item for force link down it should probably fix this problem.

ghost commented 5 years ago

Hi Logan,

Thanks for the review and comments. I updated the code and commit message. Please help to review it again. In this update, we put all the link status checking stuff in the same work queue. I tested it and It can work.

Regards Joey

wesleywesley commented 5 years ago

System work queue has no promise on the latter work item be executed after the former is done. So there is a case, when the reconfigure lut in progress by first work item, the second one had started to check link state and finally into link up state, sending link up notify to peer. Followed Peer will write the spad, which is part of the self shared lut now in configuring, data lost happen in this case. The result of data lost is to read peer wrote data, 0 will return.

lsgunth commented 5 years ago

Hmmm, it might work if instead of storing a message in sndev, you store the requested link state and a flag to indicate whether a force has occurred.

ghost commented 5 years ago

Hi Logan,

Thanks for the review and suggestion. I add a new message type (MSG_RETRY) to fix this issue. Please help to review it.

Regards Joey

lsgunth commented 5 years ago

Huh? I don't know how you got that adding a message type was the solution.... I mean to store state required for the work queue, and not the message itself. We're trying to avoid the situation caused by dropped messages.

ghost commented 5 years ago

Oh, I see. Thank you very much. I updated it, but there is another issue on ntb_transport. The ntb_transport_link_work would be canceled by ntb_transport_cleanup_work. The ntb_transport_probe() called the ntb_link_enable() and ntb_link_event().

Regards Joey

wesleywesley commented 5 years ago

Is there a contradiction? read peers shared memory while the lut is in configuring: The force link down trigged rsvd LUT 0 configuration take long time(~= .1s by print) but in the procedure of the rsvd LUT 0 config for peer before finish, the peer have started to run enable link, check link, use the LUT 0 to read peer's shared memory(for shared link_sta)

wesleywesley commented 5 years ago

when test with this version, found 2 issues:

  1. the peer do not write the Remote version by write peer spad function, which lead to the host received force link down message, stuck in keep checking the remove version
  2. the peer send 3 messages (force link down, link up, link check), but only 2 work item executed
ghost commented 5 years ago

@wesleywesley The reason of issue 1 is the ntb_transport_link_work canceled by ntb_transport_cleanup_work. For the issue 2, the link up work is not be executed.

Thanks Joey

wesleywesley commented 5 years ago

@JoeyZhang-Microsemi Let's say host A: receiver and host B: sender of FORCE LINK DOWN message

  1. For issue 1, A is keeping stuck in waiting B's peer spad write when the value of "remote version" stay 0 the questions need to get clear are: 1a.1: what kind of corner case(probably introduce in this patch when convey link up/down status info to upper lever ntb client: from combination/cooperate of ntb callbacks ops "enable link", "link is up", and ntbhw's link state state machine by the work items), lead to transport driver cancel ntb_transport_link_work? 1a.2: even the ntb_transport_link_work is canceled, but transport should schedule the work back again when ntbhw call ntb_link_event, but here not.

  2. For issue 2, 3 work item scheduled, by only 2 executed, what's the reason?

regards, Wesley

ghost commented 5 years ago

@wesleywesley For issue 1, I have fixed it. The NTB API ntb_link_enable() should be set the flag ntb_is_up when it returns. For issue 2, I did not know the reason. But we call schedule_work() twice in very short time. It may be the reason.

Please help to review the change.

ghost commented 5 years ago

@lsgunth Thanks for the comments. I updated the code. Because the NTB API ntb_link_enable() needs update the flag link_is_up before it returns. In the change, the switchtec_ntb_link_enable/disable() directly call the link state checking stuff.

wesleywesley commented 5 years ago

@JoeyZhang-Microsemi had tested on both xlink and non-xlink NTB setup w/o issue, merged into devel.