Closed jman-88 closed 3 years ago
On Thu, Sep 03, 2020 at 12:08:39AM -0700, JMan wrote:
Will there be an update to the 8021cb-2boards branch? I am trying to do a CB test with two LS1028A-RDB boards, but I can't get it going by modifying the 8021cb-devel branch.
What do you need to modify? You should be able to just use the json files for board1 and board2 from the 8021cb-devel branch, and disregard the third. The connection would go from board2 swp0 to board1 swp1 and from board2 swp1 to board1 swp0. What is the issue you're facing when trying to do that?
Thanks, -Vladimir
Hi Vladimir
Thanks for the assistance.
I tried as you suggested first, but the boards fail to ping each other. I then went and modified the configs etc. to remove references to board 3, but that also did not work. I tried the provided config again with no luck.
I have swp0 to swp1 on both boards. I can see the IPs and MACs getting assigned to eno2 as expected, but I can't ping 172.15.0.2 from board 1 to board 2 or vice versa. Should there be any output from the script when it is run?
I'm wondering if it is my image build. I used buildroot from OpenIL. The only change I made was to remove linuxptp as it was failing to compile. This was on master branch so the CB support should be in (551cd14)?
Kind regards Jan
The only change I made was to remove linuxptp as it was failing to compile.
Wow, this doesn't sound very nice. Let me build the openil master branch and double-check if it works.
So first of all, the linuxptp package built successfully right now, with the master branch at https://github.com/openil/openil/commit/65d832a2ef1479ee5a51e6990a5fcb61ff3dc71f. Could you share an error log?
Ok, so there was indeed a recently introduced incompatibility between the return code of a tsntool command, and what this script expected. I fixed it in here: https://github.com/vladimiroltean/tsn-scripts/commit/6e0e291b1df82613a6a3e9cd358682ad2f78d8cd I could ping fine with that 1-line change. Could you please give it another go?
I did a fresh build on a different machine with master. Basically I get two different issues with the build. First I encounter multiple definition of 'yylloc' at various points during the build. I managed to circumvent this. Then there is the PTP build issue. I think the PTP build might be looking for the header files on my system and not inside buildroot, but this is just a guess. I attached some logs for you.
uboot.txt host-dtc.txt linux.txt ptp.txt
The update to the tsn-scripts worked, but when I run iperf and disconnect the swp1 -> swp0 cable, the stream dies. When I disconnect swp0 -> swp1 the stream is fine. It appears that something is still missing. Do you get the same?
Yes, you are correct. I have fixed that. Please re-update the list of kernel patches and try again, it should work fine this time. I will send these patches to OpenIL as soon as possible.
I didn't unfortunately have time to study your compilation issues. What host distribution are you using?
Great. It is working now, but I'm observing some strange behavior. As I understand it, I should be able to disconnect either swp0 or swp1 and the stream should not be interrupted as long as one cable is connected. I observe the following:
Is this expected behavior or am I misunderstanding what CB should be able to do?
I'm on Manjaro Linux.
Have you updated all Linux kernel patches? What if you use ping instead of iperf3, do you ever run into this problem? Have you looked at the patches in detail, this one especially?
Subject: [PATCH 2/2] net: dsa: felix: implement port flushing on
.phylink_mac_link_down
Especially with flow control enabled on both the user port and the CPU
port, it may happen when a link goes down that Ethernet packets are in
flight. In flow control mode, frames are held back and not dropped. When
there is enough traffic in flight (example: iperf3 TCP), then the
ingress port might enter congestion and never exit that state. This is a
problem, because it is the egress port's link that went down, and that
has caused the inability of the ingress port to send packets to any
other port.
The solution is to follow the port flushing procedure from the reference
manual. This ensures that upon detection of link loss, the existing
packets are thrown away and congestion on the ingress port is therefore
avoided.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Is this what's going on, I wonder? If you run ethtool -S swp4 | grep pause repeatedly (I just want to see if it keeps increasing) after iperf3 freezes and you stop it, what do you see? By any chance, has the port entered congestion and it keeps sending PAUSE frames to ENETC?
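The repeated check above can be scripted. This is a minimal sketch, assuming the usual "name: value" layout of ethtool -S output; the helper names pause_counters and watch_pause are made up for illustration and are not part of tsn-scripts.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -S` output on stdin and print only the
# counters whose name contains "pause", one "name value" pair per line.
pause_counters() {
    awk -F: '/pause/ {
        name = $1; val = $2
        gsub(/[ \t]/, "", name)   # strip indentation around the counter name
        gsub(/[ \t]/, "", val)    # strip the space before the value
        print name, val
    }'
}

# Hypothetical helper: sample the pause counters of a port once per second.
# $1 = interface (defaults to swp4), $2 = number of samples (defaults to 10)
watch_pause() {
    port=${1:-swp4}
    count=${2:-10}
    i=0
    while [ "$i" -lt "$count" ]; do
        date +%T
        ethtool -S "$port" | pause_counters
        sleep 1
        i=$((i + 1))
    done
}
```

Running watch_pause swp4 30 on the board after iperf3 has frozen prints a timestamped sample every second, so a counter that keeps climbing is easy to spot.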
I'm on Manjaro Linux.
I don't think OpenIL gets build-tested on that, sorry.
Have you updated all Linux kernel patches?
Yes. I copied those in, removed output-ls1028ardb/build/linux-linux-5.4.y and built again.
What if you use ping instead of iperf3, do you ever run into this problem?
I tested with ping now and did not manage to replicate the problem.
Have you looked at the patches in detail, this one especially?
I did not look at it before, but this is probably it. I tested now and when the setup fails and I stop iperf, the tx_pause counter keeps on increasing. When I reconnect the cable, the tx_pause counter stops.
:( So it looks like I'm missing a case when I need to flush the packets on the egress port when its link goes down. Sorry, I have some other things I need to focus on right now, I'll come back to it at some point. To work around this issue, you can edit this patch: https://github.com/vladimiroltean/tsn-scripts/blob/8021cb-devel/deps/patches/linux/0001-arm64-dts-fsl-ls1028a-rdb-enable-swp5-and-move-NPI-p.patch and remove the "pause" properties from the fixed-link node of the ENETC and of the Felix switch. This will disable flow control on the eno2 <-> swp4 port pair, which in turn will mean that the packets are no longer buffered by the switch when the link falls, but will be dropped instead. Be warned though, the iperf3 throughput without flow control will be worse.
Okay. Thanks for the assistance. As long as there is a work around it is fine for now. Should I close this issue or keep it open?
Let me know if you see the lockups any longer with flow control disabled, and if that confirms my suspicion, you can close it and I'll return to this issue in a few days anyway. I'm trying to upstream the packet flushing logic and will make sure it works properly when I do that. How many times do you need to plug/unplug the cable? For me it worked 3-4 times, haven't tested further.
How many times do you need to plug/unplug the cable?
Sometimes it will break on the first disconnect, but it takes longer usually so it's a bit random. I was probably up to 20 plug/unplugs once before it happened.
I'll test the work around and let you know.
With the workaround I'm getting the same behavior. I double-checked the device tree changes, and ethtool -S swp4 | grep pause yields 0 and I can see decreased throughput with iperf. It would probably be good if someone else could confirm this.
I'll get back to this at some point as I have other things to deal with. I'll keep this open in case I forget.
Ok, so it isn't completely what I thought. I'll try to come back to it and see what's going on. Keep the ticket open, sure.
What ethtool counter is increasing now, if not tx_pause, though? Where are the packets dropped?
That is a good question because I did not see anything funny going on with the counters of swp4 when the stream breaks.
So there are actually 6 places to check.
Packets must be dropped somewhere, right?
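One way to narrow down where the packets go is to snapshot the counters of each interface before and after the stream breaks and list only the counters that moved. This is a rough sketch, assuming ethtool -S prints one "name: value" pair per line; counters_changed is a hypothetical helper, and the file names are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: compare two saved `ethtool -S` snapshots and print
# every counter whose value differs, as "name: before -> after".
# $1 = snapshot taken before the failure, $2 = snapshot taken after it
counters_changed() {
    awk -F: '
        {
            name = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the counter name
            gsub(/[ \t]/, "", val)              # trim the value
        }
        val == "" { next }                      # skip the "NIC statistics:" header
        NR == FNR { before[name] = val; next }  # first file: remember values
        (name in before) && val != before[name] {
            printf "%s: %s -> %s\n", name, before[name], val
        }
    ' "$1" "$2"
}
```

For example, save ethtool -S swp4 > before.txt while traffic is flowing, save it again to after.txt once the stream has died, then run counters_changed before.txt after.txt. The same can be repeated for eno2, swp0 and swp1.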
The port activity LEDs stop flashing when it breaks so it probably drops locally.
I'll do some more testing when I have time.
Hi, I am trying to verify TSN features on LS1021A-TSN board. Two hosts are connected through the 1021-TSN board, just like fig27 in
When have you built the OpenIL image, and using what defconfig? In all defconfigs for the LS1021A-TSN board, the BR2_PACKAGE_QORIQ_TSN_SCRIPTS package is enabled.
I have used the default configuration on 1021-TSN board. I would like to ask how to install and use isochron on host1(ubuntu 16.04).
git clone https://github.com/vladimiroltean/tsn-scripts.git
cd tsn-scripts
git checkout isochron
make -C isochron
./isochron/isochron --help
Hey @jman-88, FWIW I spent some time today and figured out what was the problem. I was right about the source of the problem, but the port flushing procedure was slightly incorrect and therefore didn't work until I fixed it. I unplugged the cable quite a few tens of times now and traffic was still flowing. Sorry it took so long.