NXP / isochron

Tool for Time Sensitive Networking testing
GNU General Public License v2.0
45 stars 13 forks source link

Resource temporarily unavailable #6

Closed crow1814 closed 3 years ago

crow1814 commented 3 years ago

Hi, Vladimir, Sorry for my gcc version mistake, I have update my gcc, and it‘s done. However, I met new problems. I directly connected the two ports of Intel I210 for testing. First, sync via linuxptp. Second, when I execute ./isochron send --interface enp3s0 --dmac 2c:53:4a:cle-time 400000 --num-frames 10000 --frame-size 64 --client 192.168.1.30 --quiet. in sender The sender's log is Using the kernel UTC-TAI offset which is 0 Now: 1623754160.910871679 Base time: 1623754161.911200000 Cycle time: 0.000400000 Timed out waiting for TX timestamp: -11 (Resource temporarily unavailable) 42 timestamps unacknowledged Collecting receiver stats Summary: Path delay: min -31 max -19 mean -25.122 stddev 2.582, min at seqid 9999, max at seqid 7376 Wakeup to HW TX timestamp: min 273211290 max 273292847 mean 273235250.637 stddev 11467.007, min at seqid 2 HW RX deadline delta (TX time to HW RX timestamp): min 272873202 max 274822181 mean 272903393.118 stddev 3 Application latency budget (HW TX timestamp to TX time): min -274822204 max -272873228 mean -272903418.241id 141 Wakeup latency: min 36410 max 1985452 mean 68167.604 stddev 34167.588, min at seqid 8162, max at seqid 456 Arrival latency (HW RX timestamp to application): min -273189925 max -272748011 mean -273148686.036 stddev HW TX deadline misses: 9958 (100.000%) This is not consistent with OpenILUG_Rev1.9, Is there something wrong here?

vladimiroltean commented 3 years ago
Timed out waiting for TX timestamp: -11 (Resource temporarily unavailable) 42 timestamps unacknowledged

If this is the first time you ran isochron, then the gianfar driver might have needed to enable timestamping, so it temporarily put the interface down and then back up again, losing some packets. If you try again do you still get this issue?

vladimiroltean commented 3 years ago

Ah, this is the Intel i210 card with the igb driver. Can you please show me the output of "dmesg | grep igb"?

crow1814 commented 3 years ago

the output of "dmesg | grep igb" is [ 3.106149] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.6.0-k [ 3.106151] igb: Copyright (c) 2007-2014 Intel Corporation. [ 3.106187] igb 0000:03:00.0: enabling device (0000 -> 0002) [ 3.134064] igb 0000:03:00.0: added PHC on eth0 [ 3.134066] igb 0000:03:00.0: Intel(R) Gigabit Ethernet Network Connection [ 3.134069] igb 0000:03:00.0: eth0: (PCIe:2.5Gb/s:Width x1) 2c:53:4a:07:df:04 [ 3.134112] igb 0000:03:00.0: eth0: PBA No: 000400-000 [ 3.134114] igb 0000:03:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx que ue(s) [ 3.134161] igb 0000:04:00.0: enabling device (0000 -> 0002) [ 3.161941] igb 0000:04:00.0: added PHC on eth1 [ 3.161942] igb 0000:04:00.0: Intel(R) Gigabit Ethernet Network Connection [ 3.161944] igb 0000:04:00.0: eth1: (PCIe:2.5Gb/s:Width x1) 2c:53:4a:07:df:05 [ 3.161983] igb 0000:04:00.0: eth1: PBA No: 000400-000 [ 3.161984] igb 0000:04:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx que ue(s) [ 3.162028] igb 0000:05:00.0: enabling device (0000 -> 0002) [ 3.189935] igb 0000:05:00.0: added PHC on eth2 [ 3.189936] igb 0000:05:00.0: Intel(R) Gigabit Ethernet Network Connection [ 3.189938] igb 0000:05:00.0: eth2: (PCIe:2.5Gb/s:Width x1) 2c:53:4a:07:df:06 [ 3.189976] igb 0000:05:00.0: eth2: PBA No: 000400-000 [ 3.189978] igb 0000:05:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx que ue(s) [ 3.190020] igb 0000:06:00.0: enabling device (0000 -> 0002) [ 3.217882] igb 0000:06:00.0: added PHC on eth4 [ 3.217883] igb 0000:06:00.0: Intel(R) Gigabit Ethernet Network Connection [ 3.217884] igb 0000:06:00.0: eth4: (PCIe:2.5Gb/s:Width x1) 2c:53:4a:07:df:07 [ 3.217923] igb 0000:06:00.0: eth4: PBA No: 000400-000 [ 3.217924] igb 0000:06:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx que ue(s) [ 3.235854] igb 0000:05:00.0 enp5s0: renamed from eth2 [ 3.270071] igb 0000:03:00.0 enp3s0: renamed from eth0 [ 3.285861] igb 0000:06:00.0 enp6s0: renamed from eth4 [ 3.321650] igb 0000:04:00.0 enp4s0: renamed from eth1 [ 305.524067] igb 0000:03:00.0 enp3s0: igb: enp3s0 NIC Link is Up 1000 Mbps Ful l Duplex, Flow Control: RX/TX [ 305.560059] igb 0000:06:00.0 enp6s0: igb: enp6s0 NIC Link is Up 1000 Mbps Ful l Duplex, Flow Control: RX/TX

vladimiroltean commented 3 years ago

Sorry, I don't know what's wrong and I don't have an Intel card to test this. If you increase the cycle-time from the current 400 us to, say, 10 ms, does the problem still occur? If it doesn't, what is the smallest cycle-time with which this issue starts happening? The reason why I asked for the dmesg output is this kernel patch: https://github.com/torvalds/linux/commit/e5f36ad14c93f I thought you were going to have this message in the kernel output: "clearing Tx timestamp hang" but it appears you don't.

crow1814 commented 3 years ago

Thank you for your time! I found some interesting things. I find the error is realted with --num-frames. When I set --cycle-time 400000 --num-frames 10, it works fine. However, the new error is HW TX deadline misses: 10 (100.000%), the log is

Summary:
Path delay: min -26 max -19 mean -22.500 stddev 2.156, min at seqid 4, max at seqid 7
Wakeup to HW TX timestamp: min 772994734 max 773020563 mean 772998611.800 stddev 7337.994, min at seqid 2, max at seqid 1
HW RX deadline delta (TX time to HW RX timestamp): min 772660550 max 772728180 mean 772672066.300 stddev 18849.732, min at seqid 2, max at seqid 1
Application latency budget (HW TX timestamp to TX time): min -772728204 max -772660572 mean -772672088.800 stddev 18850.227, min at seqid 1, max at seqid 2
Wakeup latency: min 65838 max 107641 mean 73477.000 stddev 11540.974, min at seqid 2, max at seqid 1
Arrival latency (HW RX timestamp to application): min -772905932 max -772828178 mean -772858033.700 stddev 24067.965, min at seqid 10, max at seqid 1
HW TX deadline misses: 10 (100.000%)

Then, I checked the package on the receiver via Wireshark, the time interval and the number of packages are correct. I find the minimum is --cycle-time 200000 --num-frames 10, although the misses still exist. Then, I continue to increase the cycle-time, the minimum is --cycle-time 1000000000 --num-frames 10 when everything is ok, and the log is

Summary:
Path delay: min -36 max -7 mean -20.900 stddev 8.526, min at seqid 9, max at seqid 3
Wakeup to HW TX timestamp: min 773270297 max 773330725 mean 773301491.800 stddev 18435.484, min at seqid 1, max at seqid 10
HW RX deadline delta (TX time to HW RX timestamp): min -226646088 max -226558702 mean -226594664.900 stddev 26004.027, min at seqid 2, max at seqid 10
Application latency budget (HW TX timestamp to TX time): min 226558676 max 226646068 mean 226594644.000 stddev 26007.720, min at seqid 10, max at seqid 2
Wakeup latency: min 75122 max 113039 mean 103864.200 stddev 10941.310, min at seqid 2, max at seqid 7
Arrival latency (HW RX timestamp to application): min -773190007 max -773129980 mean -773156063.100 stddev 21389.420, min at seqid 10, max at seqid 1
HW TX deadline misses: 0 (0.000%)

I would like to ask if I want to perform an 802.1Qbv test on ls1021board, is the misses error unaffected? And Can multiple packets be sent per cycle?

vladimiroltean commented 3 years ago

Can multiple packets be sent per cycle?

By design, isochron sends one single packet per cycle. If you have a good reason to do otherwise, patches are welcome.

if I want to perform an 802.1Qbv test on ls1021board, is the misses error unaffected

The gianfar driver for the LS1021A eTSEC is capable of doing hardware TX and RX timestamping at line rate, yes.

Path delay: min -26 max -19 mean -22.500 stddev 2.156, min at seqid 4, max at seqid 7

This looks very strange. You synchronize your sender and receiver using ptp4l, right? Could you share a ptp4l log? The path delay from isochron should match the path delay from ptp4l.

Wakeup to HW TX timestamp: min 772994734 max 773020563 mean 772998611.800 stddev 7337.994, min at seqid 2, max at seqid 1

Damn, that is a huge value (773 ms). Either the system clock is not well synchronized with the PTP hardware clock (isochron treats any synchronization offset within +/- 1 second as acceptable, so this is within the "acceptable" range), or TX timestamping takes very, very long with the i210 card. I guess I should order one and see what goes on with it. When you send traffic with isochron, how do you ensure that ptp4l and phc2sys have locked synchronization? I use something along the lines of:

#!/bin/bash

set -e -u -o pipefail

scrape_logs_for_phc2sys_offset() {
    local awk_program='/phc2sys/ { print $11; exit; }'

    tail -10 /var/log/messages > ptp.log

    echo $(tac ptp.log | awk "${awk_program}")
}

scrape_logs_for_ptp4l_offset() {
    local awk_program='/ptp4l/ { print $11; exit; }'

    tail -10 /var/log/messages > ptp.log

    echo $(tac ptp.log | awk "${awk_program}")
}

check_sync_phc2sys() {
    local threshold_ns=50
    local system_clock_offset=

    while :; do
        sleep 1

        system_clock_offset=$(scrape_logs_for_phc2sys_offset)

        # Got something, is it a number?
        case "${system_clock_offset}" in
        ''|[!\-][!0-9]*)
            if ! pidof phc2sys > /dev/null; then
                echo "Please start the phc2sys service."
                return 1
            else
                echo "Trying again..."
                continue
            fi
            ;;
        esac

        if [ "${system_clock_offset}" -lt 0 ]; then
            system_clock_offset=$((-${system_clock_offset}))
        fi
        echo "System clock offset ${system_clock_offset} ns"
        if [ "${system_clock_offset}" -gt "${threshold_ns}" ]; then
            echo "System clock is not yet synchronized..."
            continue
        fi
        # Success
        break
    done
}

check_sync_ptp4l() {
    local threshold_ns=100
    local phc_offset=

    while :; do
        sleep 1

        phc_offset=$(scrape_logs_for_ptp4l_offset)

        # Got something, is it a number?
        case "${phc_offset}" in
        ''|[!\-][!0-9]*)
            if ! pidof phc2sys > /dev/null; then
                echo "Please start the phc2sys service."
                return 1
            else
                echo "Trying again..."
                continue
            fi
            ;;
        esac

        echo "Master offset ${phc_offset} ns"
        if [ "${phc_offset}" -lt 0 ]; then
            phc_offset=$((-${phc_offset}))
        fi
        if [ "${phc_offset}" -gt "${threshold_ns}" ]; then
            echo "PTP clock is not yet synchronized..."
            continue
        fi
        # Success
        break
    done
}

check_sync_ptp4l
check_sync_phc2sys

[ may not work out of the box for you ]