Avnu / OpenAvnu

OpenAvnu - an Avnu sponsored repository for Time Sensitive Network (TSN and AVB) technology
470 stars 289 forks source link

simple_talker : igb_xmit doesn't seem to transmit anything #479

Closed wmoors closed 8 years ago

wmoors commented 8 years ago

I started to study AVB through the openAVB stack and try to run the simple_talker and simple_listener on 2 separate machines both equipped with an Intel I210 but I can't get it working.

The stack on both machines is built with travis.sh.

On the first machine:

    $> sudo ./run_daemons.sh enp2s0
    $> sudo ./run_simple_talker enp2s0`

On the second machine:

    $> sudo ./run_daemons.sh enp2s0
    $> cd examples/simple_listener
    $> sudo ./run_simple_listener -i enp2s0 -f output.wav 

However, the output.wav always stays at 44bytes. So I started sniffing, but could not see any 1722 frames whatsoever.

I attached a debugger to the simple_talker and put a breakpoint at the igb_xmit line (l756). This call returns no error but I still can't see any 1722 frame coming out.

Does this ring a bell with anyone?

additional info :

    $> ethtool -i enp2s0 
    driver: igb_avb
    version: 5.3.2_AVB
    firmware-version: 3.16, 0x800004ff, 1.304.0
    expansion-rom-version: 
    bus-info: 0000:02:00.0
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: yes
    supports-register-dump: yes
    supports-priv-flags: no
    >$
    >$ ethtool -k enp2s0
    Features for enp2s0:
    rx-checksumming: on
    tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
    scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
    tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp6-segmentation: on
    udp-fragmentation-offload: off [fixed]
    generic-segmentation-offload: on
    generic-receive-offload: on
    large-receive-offload: off
    rx-vlan-offload: on
    tx-vlan-offload: on
    ntuple-filters: off [fixed]
    receive-hashing: on
    highdma: on [fixed]
    rx-vlan-filter: on [fixed]
    vlan-challenged: off [fixed]
    tx-lockless: off [fixed]
    netns-local: off [fixed]
    tx-gso-robust: off [fixed]
    tx-fcoe-segmentation: off [fixed]
    tx-gre-segmentation: off [fixed]
    tx-ipip-segmentation: off [fixed]
    tx-sit-segmentation: off [fixed]
    tx-udp_tnl-segmentation: off [fixed]
    fcoe-mtu: off [fixed]
    tx-nocache-copy: off
    loopback: off [fixed]
    rx-fcs: off [fixed]
    rx-all: off [fixed]
    tx-vlan-stag-hw-insert: off [fixed]
    rx-vlan-stag-hw-parse: off [fixed]
    rx-vlan-stag-filter: off [fixed]
    l2-fwd-offload: off [fixed]
    busy-poll: off [fixed]
    hw-tc-offload: off [fixed]
pinealservo commented 8 years ago

Could you give us the console output on the talker and listener after you run the commands? It's probably stuck on SRP negotiation. Try a single Ctrl-C on the talker side, which seems to cancel the wait on the SRP thread. I haven't looked deeply into this, though.

An alternative to that (which might work better) is to use the srp command line client to make a domain declaration from one of the endpoints, which ought to trigger both of them into action. If you've got an AVB-capable switch, that would do the trick as well, since it will make the default SRP Domain declaration as well.

wmoors commented 8 years ago

It doesn't seem like it hangs. (But I have seen that behavior once) I'm thinking it might be my setup.

I tried 2 different topo's.

first machine (simple_talker) :

$> sudo ./run_simple_talker.sh enp2s0
attaching to 0000:03:00.0
detected domain Class A PRIO=0 VID=0000...
VNE 0000 R=000000000000 VN/MT
 unhandled from mrpd
advertising stream ...
awaiting a listener ...
VJO 0000 R=a0369fac85c0 QA/IN
 unhandled from mrpd
VJO 0000 R=a0369fac85c0 VP/IN
 unhandled from mrpd
EVENT on STREAM ID=a0369fac90b50000 with state ready
got a new/join indication
got a listener ...
ml_phoffset = 21001981, ls_phoffset = 1852620402
ml_freqffset = 0.999999, ls_freqoffset = 0.999970
VNE 0000 R=a0369fac85c0 QA/IN
 unhandled from mrpd
EVENT on STREAM ID=a0369fac90b50000 with state ready
got a new/join indication
VNE 0000 R=a0369fac85c0 QA/IN
 unhandled from mrpd
VJO 0000 R=a0369fac85c0 QA/IN
 unhandled from mrpd
EVENT on STREAM ID=a0369fac90b50000 with state ready
got a new/join indication
VJO 0000 R=a0369fac85c0 VP/IN
 unhandled from mrpd
EVENT on STREAM ID=a0369fac90b50000 with state ready
got a new/join indication
VJO 0000 R=a0369fac85c0 VP/IN
 unhandled from mrpd
EVENT on STREAM ID=a0369fac90b50000 with state ready
got a new/join indication
VJO 0000 R=a0369fac85c0 QA/IN
 unhandled from mrpd
EVENT on STREAM ID=a0369fac90b50000 with state ready
got a new/join indication
EVENT on STREAM ID=a0369fac90b50000 with state ready
got a leave indication
listener left
listener left ...

second machine (simple_listener) :

$> sudo ./simple_listener -i enp2s0 -f output.wav
Msg: D:C=0,P=0,V=0000,N=0 R=a0369fac90b5 QA/IN
T:S=a0369fac90b50000,A=91e0f0000e80,V=0000,Z=84,I=1,P=0,L=3900,B=0000000000000000,C=0 R=a0369fac90b5 VO/IN

Msg: SJO T:S=a0369fac90b50000,A=91e0f0000e80,V=0000,Z=84,I=1,P=0,L=3900 R=a0369fac90b5 VO/IN

Msg: SJO D:C=0,P=0,V=0000,N=0 R=a0369fac90b5 QA/IN

detected domain Class A PRIO=0 VID=0000...
Joing VLAN V++:I=0000

Waiting for talker...
Msg: VNE 0000 R=000000000000 VN/MT

Msg: SNE L:D=2,S=a0369fac90b50000 R=000000000000 VN/MT

Created file called output.wav
Msg: VJO 0000 R=a0369fac90b5 QA/IN

Msg: VJO 0000 R=a0369fac90b5 VP/IN

Msg: SJO T:S=a0369fac90b50000,A=91e0f0000e80,V=0000,Z=84,I=1,P=0,L=3900 R=a0369fac90b5 VO/IN

Msg: SJO D:C=0,P=0,V=0000,N=0 R=a0369fac90b5 VP/IN

Msg: VJO 0000 R=a0369fac90b5 QA/IN

Msg: SJO T:S=a0369fac90b50000,A=91e0f0000e80,V=0000,Z=84,I=1,P=0,L=3900 R=a0369fac90b5 VO/IN

Msg: SJO D:C=0,P=0,V=0000,N=0 R=a0369fac90b5 QA/IN

Msg: VJO 0000 R=a0369fac90b5 QA/IN

Msg: SJO T:S=a0369fac90b50000,A=91e0f0000e80,V=0000,Z=84,I=1,P=0,L=3900 R=a0369fac90b5 VO/IN

Msg: SJO D:C=0,P=0,V=0000,N=0 R=a0369fac90b5 VP/IN

Msg: VJO 0000 R=a0369fac90b5 VP/IN

Msg: SJO T:S=a0369fac90b50000,A=91e0f0000e80,V=0000,Z=84,I=1,P=0,L=3900 R=a0369fac90b5 VO/IN

Msg: SJO D:C=0,P=0,V=0000,N=0 R=a0369fac90b5 QA/IN

^CReceived signal 2:leaving...
Msg: SLE L:D=2,S=a0369fac90b50000 R=000000000000 LA/MT
eric-mann commented 8 years ago

If I were to bet a donut, I would guess you have the igb driver loaded in the background and the igb_avb driver loaded (?). The xmit routine won't work if the device underneath is reset either by interface up/downed by a dhcp client or internal igb watchdog event from underneath the AVB stack.

wmoors commented 8 years ago

I did have the NIC's still configured for DHCP on both test machines. But even with all IPv4 settings set to disabled (which I figure means that the network manager does not control them any more) the result is the same.

lsmod shows only the igb_avb module loaded...?

$> lsmod | grep igb
igb_avb               196608  0
dca                    16384  1 igb_avb
ptp                    20480  1 igb_avb
eric-mann commented 8 years ago

check the system log with dmesg and see if there are any interesting messages from igb_avb ... like 'Tx Timeout' ...

eric-mann commented 8 years ago

Oh ... and the PTP daemon is running and synchronized of the same interface first, right? IIRC, the clock used to pace out the packets in simple_talker isn't started unless the PTP stack activates the underlying LAN hw PTP timestamping functionality ... happens most often to people who have multiple interfaces enabled ...

wmoors commented 8 years ago

I checked dmesg output but there's no suck message. Everything seems just fine.

[23186.887013] igb_avb: module verification failed: signature and/or required key missing - tainting kernel
[23186.887651] Intel(R) Gigabit Ethernet Network Driver - version 5.3.2_AVB
[23186.887652] Copyright (c) 2007-2015 Intel Corporation.
[23186.972170] igb_avb 0000:02:00.0: added PHC on eth0
[23186.972173] igb_avb 0000:02:00.0: Intel(R) Gigabit Ethernet Network Connection
[23186.972174] igb_avb 0000:02:00.0: eth0: (PCIe:2.5GT/s:Width x1) 
[23186.972176] igb_avb 0000:02:00.0 eth0: MAC: a0:36:9f:ac:90:b5
[23186.972395] igb_avb 0000:02:00.0: eth0: PBA No: G65055-010
[23186.973309] igb_avb 0000:02:00.0 enp2s0: renamed from eth0
[23186.979445] igb_avb 0000:02:00.0: LRO is disabled
[23186.979449] igb_avb 0000:02:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
[23186.989616] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[23187.833289] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[23187.833656] igb_avb 0000:03:00.0: added PHC on eth0
[23187.833658] igb_avb 0000:03:00.0: Intel(R) Gigabit Ethernet Network Connection
[23187.833660] igb_avb 0000:03:00.0: eth0: (PCIe:2.5GT/s:Width x1) 
[23187.833661] igb_avb 0000:03:00.0 eth0: MAC: a0:36:9f:ac:91:d2
[23187.833962] igb_avb 0000:03:00.0: eth0: PBA No: G65055-010
[23187.839629] igb_avb 0000:03:00.0: LRO is disabled
[23187.839633] igb_avb 0000:03:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
[23187.844519] igb_avb 0000:03:00.0 enp3s0: renamed from eth0
[23187.868218] IPv6: ADDRCONF(NETDEV_UP): enp3s0: link is not ready
[23188.717202] IPv6: ADDRCONF(NETDEV_UP): enp3s0: link is not ready
[23190.724550] igb_avb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[23190.724910] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0: link becomes ready
[23232.273790] bind to iface 0000:03:00.0
[23232.273794] checking against adapter name 0000:02:00.0
[23232.273796] checking against adapter name 0000:03:00.0
[23258.107052] bind to iface 0000:03:00.0
[23258.107055] checking against adapter name 0000:02:00.0
[23258.107057] checking against adapter name 0000:03:00.0
[23493.697691] bind to iface 0000:03:00.0
[23493.697699] checking against adapter name 0000:02:00.0
[23493.697702] checking against adapter name 0000:03:00.0
[84390.902361] igb_avb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Down
[84402.431079] igb_avb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[84407.318983] igb_avb 0000:03:00.0 enp3s0: igb: enp3s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[84407.319297] IPv6: ADDRCONF(NETDEV_CHANGE): enp3s0: link becomes ready
[84467.990914] bind to iface 0000:03:00.0
[84467.990920] checking against adapter name 0000:02:00.0
[84467.990922] checking against adapter name 0000:03:00.0
[84481.837326] bind to iface 0000:03:00.0
[84481.837329] checking against adapter name 0000:02:00.0
[84481.837331] checking against adapter name 0000:03:00.0
[84512.470146] igb_avb 0000:03:00.0 enp3s0: igb: enp3s0 NIC Link is Down
[84708.377834] device enp2s0 entered promiscuous mode
[84731.985564] device enp2s0 left promiscuous mode
[85834.485534] device enp2s0 entered promiscuous mode
[85846.679189] device enp2s0 left promiscuous mode
[85859.793486] device enp2s0 entered promiscuous mode
[85886.533777] device enp2s0 left promiscuous mode
[86612.409588] device enp2s0 entered promiscuous mode
[86716.947215] device enp2s0 left promiscuous mode

I'm not sure about the gPTP daemons output though. I get a lot of LINKUP status messages. Is that normal behaviour?

PC1:

sudo ./run_daemons.sh enp2s0
Starting daemons on enp2s0
rmmod: ERROR: Module igb is not currently loaded
insmod: ERROR: could not insert module kmod/igb/igb_avb.ko: File exists
groupadd: group 'ptp' already exists
INFO     : GPTP [10:20:50:080] gPTP starting
priority1 = 248
announceReceiptTimeout: 3
syncReceiptTimeout: 3
phy_delay_gb_tx: 184
phy_delay_gb_rx: 382
phy_delay_mb_tx: 1044
phy_delay_mb_rx: 2133
neighborPropDelayThresh: 2500
syncReceiptThreshold: 8
Using clock device: /dev/ptp0
STATUS   : GPTP [10:20:50:081] Starting PDelay

SENT MAAP_PROBE 
STATUS   : GPTP [10:20:50:115] AsCapable: Enabled
process_events()

SENT MAAP_PROBE 

SENT MAAP_PROBE 
STATUS   : GPTP [10:20:51:669] Switching to Slave
STATUS   : GPTP [10:20:51:669] New Grandmaster "A0:36:9F:FF:FE:AC:85:C0" (previous "00:00:00:00:00:00:00:00")

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_ANNOUNCE 

ANNOUNCED ADDRESS:
0x91 0xe0 0xf0 0 0xf0 0xa7 

STATE change to DEFEND:
STATUS   : GPTP [10:21:16:600] LINKUP
Using clock device: /dev/ptp0
STATUS   : GPTP [10:21:18:815] LINKUP
Using clock device: /dev/ptp0
STATUS   : GPTP [10:21:18:822] LINKUP
Using clock device: /dev/ptp0
STATUS   : GPTP [10:21:29:909] LINKUP
Using clock device: /dev/ptp0

STATUS   : GPTP [10:21:45:532] LINKUP
Using clock device: /dev/ptp0
STATUS   : GPTP [10:22:03:853] LINKUP
Using clock device: /dev/ptp0
STATUS   : GPTP [10:22:03:866] LINKUP
Using clock device: /dev/ptp0
SENT MAAP_ANNOUNCE
SENT MAAP_ANNOUNCE
STATUS   : GPTP [10:22:48:855] LINKUP
Using clock device: /dev/ptp0
SENT MAAP_ANNOUNCE
SENT MAAP_ANNOUNCE
SENT MAAP_ANNOUNCE
STATUS   : GPTP [10:24:09:900] LINKUP
Using clock device: /dev/ptp0
SENT MAAP_ANNOUNCE
STATUS   : GPTP [10:24:47:559] LINKUP
Using clock device: /dev/ptp0
SENT MAAP_ANNOUNCE
STATUS   : GPTP [10:25:15:342] LINKUP
Using clock device: /dev/ptp0
sudo ./stop_daemons.sh
Exiting on 15

PC2:

INFO     : GPTP [10:20:48:619] gPTP starting
priority1 = 248
announceReceiptTimeout: 3
syncReceiptTimeout: 3
phy_delay_gb_tx: 184
phy_delay_gb_rx: 382
phy_delay_mb_tx: 1044
phy_delay_mb_rx: 2133
neighborPropDelayThresh: 2500
syncReceiptThreshold: 8
Using clock device: /dev/ptp0
STATUS   : GPTP [10:20:48:619] Starting PDelay

SENT MAAP_PROBE 
process_events()

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 
STATUS   : GPTP [10:20:50:662] AsCapable: Enabled

SENT MAAP_PROBE 

SENT MAAP_PROBE 
STATUS   : GPTP [10:20:51:619] *** Announce Timeout Expired - Becoming Master
STATUS   : GPTP [10:20:51:619] New Grandmaster "A0:36:9F:FF:FE:AC:85:C0" (previous "00:00:00:00:00:00:00:00")

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_PROBE 

SENT MAAP_ANNOUNCE 

ANNOUNCED ADDRESS:
0x91 0xe0 0xf0 0 0xd2 0xee 

STATE change to DEFEND:

SENT MAAP_ANNOUNCE
SENT MAAP_ANNOUNCE
SENT MAAP_ANNOUNCE
SENT MAAP_ANNOUNCE
SENT MAAP_ANNOUNCE
SENT MAAP_ANNOUNCE
SENT MAAP_ANNOUNCE
sudo ./stop_daemons.sh 
Exiting on 15
wmoors commented 8 years ago

ok... I found the problem. Everything works now.

It was the PCI address msg in the simple_talker log that was a big giveaway. What I failed to mention was that I have 2 I210 cards in each system. And even though I gave the correct interface name when starting the simple_talker, it used the wrong PCI address.

I haven't fixed it yet though, I just tested it with the extra card removed from the system running the talker.

MantisClone commented 8 years ago

I've seen at least 3 people get tripped up by this same issue when using the simple_talker and simple_listener examples for the first time, myself included. pci_connect needs to be reworked to connect to the appropriate PCI device, given an interface name so that using multiple I210 cards won't be so hard to get started.

I don't know how to correlate the interface name to the PCI bus number, so the solution we used was a dumb lookup table in pci_connect which is not a portable solution.

On Fri, Sep 16, 2016, 9:59 AM wmoors notifications@github.com wrote:

ok... I found the problem. Everything works now.

It was the PCI address msg in the simple_talker log that was a big giveaway. What I failed to mention was that I have 2 I210 cards in each system. And even though I gave the correct interface name when starting the simple_talker, it used the wrong PCI address.

I haven't fixed it yet though, I just tested it with the extra card removed from the system running the talker.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/AVnu/Open-AVB/issues/479#issuecomment-247638648, or mute the thread https://github.com/notifications/unsubscribe-auth/ACaeYWl_P3UaI-37auGhn0X9u_PSrNDeks5qqrzdgaJpZM4J86Tc .

eric-mann commented 8 years ago

you can map the interface back to the PCI B:D:F via sysfs ... i think there's an ethtool IOCTL for it as well .. like @get_drvinfo: Report driver/device information. Should only set the @driver, @version, @fw_version and @bus_info fields.

anyone want to submit a pull request ?

wmoors commented 8 years ago

you're right. the SIOCETHTOOL ioctl can be used to find the PCI bus address. I found this piece of demo code. (can't test it right now as I don't have access to the systems atm)

#include <linux/ethtool.h> 
#include <linux/sockios.h>
.. and other socket headers.
...

 int sock = socket(PF_INET, SOCK_DGRAM, 0);

 struct ifreq ifr;
 struct ethtool_cmd cmd;
 struct ethtool_drvinfo drvinfo;

 memset(&ifr, 0, sizeof ifr);
 memset(&cmd, 0, sizeof cmd);
 memset(&drvinfo, 0, sizeof drvinfo);
 strcpy(ifr.ifr_name, "eno1");

 ifr.ifr_data = &drvinfo;
 drvinfo.cmd = ETHTOOL_GDRVINFO;

 if(ioctl(sock, SIOCETHTOOL, &ifr) < 0) {
    perror("ioctl")
    return;
 }

The PCI bus address is available as a string in drvinfo.bus_info - given that the NIC is actually a PCI device.

eric-mann commented 8 years ago

i think for 'pci_connect' its okay ... I'm not sure this architecture really scales to other IO (USB, SDIO, etc). That's the subject of the '2.0 branch' or whatever we call it after some kernel patches ...

realize this whole 'user-space rings' design was motivated by some customers who wanted really REALLY low latency - so low, they didn't even want the overhead of a system call and going thru the standard network stack. So real-time that they couldn't afford to call into a socket because it drops your process priority.

This obviously comes with some compromises, although its still a trend in high performance packet processing stacks like DPDK and PacketDirect to use virtualization features and map SR-IOV VFs into user-mode applications.

pinealservo commented 8 years ago

As the original issue raised here was resolved, I'm going to close this. The PCI ambiguity is a known pain point--we definitely would welcome a pull request to fix it! There's already an issue open about it if it needs further discussion: #395