Closed wmoors closed 8 years ago
Could you give us the console output on the talker and listener after you run the commands? It's probably stuck on SRP negotiation. Try a single Ctrl-C on the talker side, which seems to cancel the wait on the SRP thread. I haven't looked deeply into this, though.
An alternative to that (which might work better) is to use the srp command line client to make a domain declaration from one of the endpoints, which ought to trigger both of them into action. If you've got an AVB-capable switch, that would do the trick as well, since it will make the default SRP Domain declaration as well.
It doesn't seem like it hangs. (But I have seen that behavior once) I'm thinking it might be my setup.
I tried 2 different topo's.
neighborPropDelayThresh
to 2500)first machine (simple_talker) :
$> sudo ./run_simple_talker.sh enp2s0
attaching to 0000:03:00.0
detected domain Class A PRIO=0 VID=0000...
VNE 0000 R=000000000000 VN/MT
unhandled from mrpd
advertising stream ...
awaiting a listener ...
VJO 0000 R=a0369fac85c0 QA/IN
unhandled from mrpd
VJO 0000 R=a0369fac85c0 VP/IN
unhandled from mrpd
EVENT on STREAM ID=a0369fac90b50000 with state ready
got a new/join indication
got a listener ...
ml_phoffset = 21001981, ls_phoffset = 1852620402
ml_freqffset = 0.999999, ls_freqoffset = 0.999970
VNE 0000 R=a0369fac85c0 QA/IN
unhandled from mrpd
EVENT on STREAM ID=a0369fac90b50000 with state ready
got a new/join indication
VNE 0000 R=a0369fac85c0 QA/IN
unhandled from mrpd
VJO 0000 R=a0369fac85c0 QA/IN
unhandled from mrpd
EVENT on STREAM ID=a0369fac90b50000 with state ready
got a new/join indication
VJO 0000 R=a0369fac85c0 VP/IN
unhandled from mrpd
EVENT on STREAM ID=a0369fac90b50000 with state ready
got a new/join indication
VJO 0000 R=a0369fac85c0 VP/IN
unhandled from mrpd
EVENT on STREAM ID=a0369fac90b50000 with state ready
got a new/join indication
VJO 0000 R=a0369fac85c0 QA/IN
unhandled from mrpd
EVENT on STREAM ID=a0369fac90b50000 with state ready
got a new/join indication
EVENT on STREAM ID=a0369fac90b50000 with state ready
got a leave indication
listener left
listener left ...
second machine (simple_listener) :
$> sudo ./simple_listener -i enp2s0 -f output.wav
Msg: D:C=0,P=0,V=0000,N=0 R=a0369fac90b5 QA/IN
T:S=a0369fac90b50000,A=91e0f0000e80,V=0000,Z=84,I=1,P=0,L=3900,B=0000000000000000,C=0 R=a0369fac90b5 VO/IN
Msg: SJO T:S=a0369fac90b50000,A=91e0f0000e80,V=0000,Z=84,I=1,P=0,L=3900 R=a0369fac90b5 VO/IN
Msg: SJO D:C=0,P=0,V=0000,N=0 R=a0369fac90b5 QA/IN
detected domain Class A PRIO=0 VID=0000...
Joing VLAN V++:I=0000
Waiting for talker...
Msg: VNE 0000 R=000000000000 VN/MT
Msg: SNE L:D=2,S=a0369fac90b50000 R=000000000000 VN/MT
Created file called output.wav
Msg: VJO 0000 R=a0369fac90b5 QA/IN
Msg: VJO 0000 R=a0369fac90b5 VP/IN
Msg: SJO T:S=a0369fac90b50000,A=91e0f0000e80,V=0000,Z=84,I=1,P=0,L=3900 R=a0369fac90b5 VO/IN
Msg: SJO D:C=0,P=0,V=0000,N=0 R=a0369fac90b5 VP/IN
Msg: VJO 0000 R=a0369fac90b5 QA/IN
Msg: SJO T:S=a0369fac90b50000,A=91e0f0000e80,V=0000,Z=84,I=1,P=0,L=3900 R=a0369fac90b5 VO/IN
Msg: SJO D:C=0,P=0,V=0000,N=0 R=a0369fac90b5 QA/IN
Msg: VJO 0000 R=a0369fac90b5 QA/IN
Msg: SJO T:S=a0369fac90b50000,A=91e0f0000e80,V=0000,Z=84,I=1,P=0,L=3900 R=a0369fac90b5 VO/IN
Msg: SJO D:C=0,P=0,V=0000,N=0 R=a0369fac90b5 VP/IN
Msg: VJO 0000 R=a0369fac90b5 VP/IN
Msg: SJO T:S=a0369fac90b50000,A=91e0f0000e80,V=0000,Z=84,I=1,P=0,L=3900 R=a0369fac90b5 VO/IN
Msg: SJO D:C=0,P=0,V=0000,N=0 R=a0369fac90b5 QA/IN
^CReceived signal 2:leaving...
Msg: SLE L:D=2,S=a0369fac90b50000 R=000000000000 LA/MT
If I were to bet a donut, I would guess you have the igb driver loaded in the background and the igb_avb driver loaded (?). The xmit routine won't work if the device underneath is reset either by interface up/downed by a dhcp client or internal igb watchdog event from underneath the AVB stack.
I did have the NIC's still configured for DHCP on both test machines. But even with all IPv4 settings set to disabled (which I figure means that the network manager does not control them any more) the result is the same.
lsmod shows only the igb_avb module loaded...?
$> lsmod | grep igb
igb_avb 196608 0
dca 16384 1 igb_avb
ptp 20480 1 igb_avb
check the system log with dmesg and see if there are any interesting messages from igb_avb ... like 'Tx Timeout' ...
Oh ... and the PTP daemon is running and synchronized of the same interface first, right? IIRC, the clock used to pace out the packets in simple_talker isn't started unless the PTP stack activates the underlying LAN hw PTP timestamping functionality ... happens most often to people who have multiple interfaces enabled ...
I checked dmesg output but there's no suck message. Everything seems just fine.
[23186.887013] igb_avb: module verification failed: signature and/or required key missing - tainting kernel
[23186.887651] Intel(R) Gigabit Ethernet Network Driver - version 5.3.2_AVB
[23186.887652] Copyright (c) 2007-2015 Intel Corporation.
[23186.972170] igb_avb 0000:02:00.0: added PHC on eth0
[23186.972173] igb_avb 0000:02:00.0: Intel(R) Gigabit Ethernet Network Connection
[23186.972174] igb_avb 0000:02:00.0: eth0: (PCIe:2.5GT/s:Width x1)
[23186.972176] igb_avb 0000:02:00.0 eth0: MAC: a0:36:9f:ac:90:b5
[23186.972395] igb_avb 0000:02:00.0: eth0: PBA No: G65055-010
[23186.973309] igb_avb 0000:02:00.0 enp2s0: renamed from eth0
[23186.979445] igb_avb 0000:02:00.0: LRO is disabled
[23186.979449] igb_avb 0000:02:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
[23186.989616] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[23187.833289] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[23187.833656] igb_avb 0000:03:00.0: added PHC on eth0
[23187.833658] igb_avb 0000:03:00.0: Intel(R) Gigabit Ethernet Network Connection
[23187.833660] igb_avb 0000:03:00.0: eth0: (PCIe:2.5GT/s:Width x1)
[23187.833661] igb_avb 0000:03:00.0 eth0: MAC: a0:36:9f:ac:91:d2
[23187.833962] igb_avb 0000:03:00.0: eth0: PBA No: G65055-010
[23187.839629] igb_avb 0000:03:00.0: LRO is disabled
[23187.839633] igb_avb 0000:03:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
[23187.844519] igb_avb 0000:03:00.0 enp3s0: renamed from eth0
[23187.868218] IPv6: ADDRCONF(NETDEV_UP): enp3s0: link is not ready
[23188.717202] IPv6: ADDRCONF(NETDEV_UP): enp3s0: link is not ready
[23190.724550] igb_avb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[23190.724910] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0: link becomes ready
[23232.273790] bind to iface 0000:03:00.0
[23232.273794] checking against adapter name 0000:02:00.0
[23232.273796] checking against adapter name 0000:03:00.0
[23258.107052] bind to iface 0000:03:00.0
[23258.107055] checking against adapter name 0000:02:00.0
[23258.107057] checking against adapter name 0000:03:00.0
[23493.697691] bind to iface 0000:03:00.0
[23493.697699] checking against adapter name 0000:02:00.0
[23493.697702] checking against adapter name 0000:03:00.0
[84390.902361] igb_avb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Down
[84402.431079] igb_avb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[84407.318983] igb_avb 0000:03:00.0 enp3s0: igb: enp3s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[84407.319297] IPv6: ADDRCONF(NETDEV_CHANGE): enp3s0: link becomes ready
[84467.990914] bind to iface 0000:03:00.0
[84467.990920] checking against adapter name 0000:02:00.0
[84467.990922] checking against adapter name 0000:03:00.0
[84481.837326] bind to iface 0000:03:00.0
[84481.837329] checking against adapter name 0000:02:00.0
[84481.837331] checking against adapter name 0000:03:00.0
[84512.470146] igb_avb 0000:03:00.0 enp3s0: igb: enp3s0 NIC Link is Down
[84708.377834] device enp2s0 entered promiscuous mode
[84731.985564] device enp2s0 left promiscuous mode
[85834.485534] device enp2s0 entered promiscuous mode
[85846.679189] device enp2s0 left promiscuous mode
[85859.793486] device enp2s0 entered promiscuous mode
[85886.533777] device enp2s0 left promiscuous mode
[86612.409588] device enp2s0 entered promiscuous mode
[86716.947215] device enp2s0 left promiscuous mode
I'm not sure about the gPTP daemons output though. I get a lot of LINKUP status messages. Is that normal behaviour?
PC1:
sudo ./run_daemons.sh enp2s0
Starting daemons on enp2s0
rmmod: ERROR: Module igb is not currently loaded
insmod: ERROR: could not insert module kmod/igb/igb_avb.ko: File exists
groupadd: group 'ptp' already exists
INFO : GPTP [10:20:50:080] gPTP starting
priority1 = 248
announceReceiptTimeout: 3
syncReceiptTimeout: 3
phy_delay_gb_tx: 184
phy_delay_gb_rx: 382
phy_delay_mb_tx: 1044
phy_delay_mb_rx: 2133
neighborPropDelayThresh: 2500
syncReceiptThreshold: 8
Using clock device: /dev/ptp0
STATUS : GPTP [10:20:50:081] Starting PDelay
SENT MAAP_PROBE
STATUS : GPTP [10:20:50:115] AsCapable: Enabled
process_events()
SENT MAAP_PROBE
SENT MAAP_PROBE
STATUS : GPTP [10:20:51:669] Switching to Slave
STATUS : GPTP [10:20:51:669] New Grandmaster "A0:36:9F:FF:FE:AC:85:C0" (previous "00:00:00:00:00:00:00:00")
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_ANNOUNCE
ANNOUNCED ADDRESS:
0x91 0xe0 0xf0 0 0xf0 0xa7
STATE change to DEFEND:
STATUS : GPTP [10:21:16:600] LINKUP
Using clock device: /dev/ptp0
STATUS : GPTP [10:21:18:815] LINKUP
Using clock device: /dev/ptp0
STATUS : GPTP [10:21:18:822] LINKUP
Using clock device: /dev/ptp0
STATUS : GPTP [10:21:29:909] LINKUP
Using clock device: /dev/ptp0
STATUS : GPTP [10:21:45:532] LINKUP
Using clock device: /dev/ptp0
STATUS : GPTP [10:22:03:853] LINKUP
Using clock device: /dev/ptp0
STATUS : GPTP [10:22:03:866] LINKUP
Using clock device: /dev/ptp0
SENT MAAP_ANNOUNCE
SENT MAAP_ANNOUNCE
STATUS : GPTP [10:22:48:855] LINKUP
Using clock device: /dev/ptp0
SENT MAAP_ANNOUNCE
SENT MAAP_ANNOUNCE
SENT MAAP_ANNOUNCE
STATUS : GPTP [10:24:09:900] LINKUP
Using clock device: /dev/ptp0
SENT MAAP_ANNOUNCE
STATUS : GPTP [10:24:47:559] LINKUP
Using clock device: /dev/ptp0
SENT MAAP_ANNOUNCE
STATUS : GPTP [10:25:15:342] LINKUP
Using clock device: /dev/ptp0
sudo ./stop_daemons.sh
Exiting on 15
PC2:
INFO : GPTP [10:20:48:619] gPTP starting
priority1 = 248
announceReceiptTimeout: 3
syncReceiptTimeout: 3
phy_delay_gb_tx: 184
phy_delay_gb_rx: 382
phy_delay_mb_tx: 1044
phy_delay_mb_rx: 2133
neighborPropDelayThresh: 2500
syncReceiptThreshold: 8
Using clock device: /dev/ptp0
STATUS : GPTP [10:20:48:619] Starting PDelay
SENT MAAP_PROBE
process_events()
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
STATUS : GPTP [10:20:50:662] AsCapable: Enabled
SENT MAAP_PROBE
SENT MAAP_PROBE
STATUS : GPTP [10:20:51:619] *** Announce Timeout Expired - Becoming Master
STATUS : GPTP [10:20:51:619] New Grandmaster "A0:36:9F:FF:FE:AC:85:C0" (previous "00:00:00:00:00:00:00:00")
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_PROBE
SENT MAAP_ANNOUNCE
ANNOUNCED ADDRESS:
0x91 0xe0 0xf0 0 0xd2 0xee
STATE change to DEFEND:
SENT MAAP_ANNOUNCE
SENT MAAP_ANNOUNCE
SENT MAAP_ANNOUNCE
SENT MAAP_ANNOUNCE
SENT MAAP_ANNOUNCE
SENT MAAP_ANNOUNCE
SENT MAAP_ANNOUNCE
sudo ./stop_daemons.sh
Exiting on 15
ok... I found the problem. Everything works now.
It was the PCI address msg in the simple_talker log that was a big giveaway. What I failed to mention was that I have 2 I210 cards in each system. And even though I gave the correct interface name when starting the simple_talker, it used the wrong PCI address.
I haven't fixed it yet though, I just tested it with the extra card removed from the system running the talker.
I've seen at least 3 people get tripped up by this same issue when using the simple_talker and simple_listener examples for the first time, myself included. pci_connect needs to be reworked to connect to the appropriate PCI device, given an interface name so that using multiple I210 cards won't be so hard to get started.
I don't know how to correlate the interface name to the PCI bus number, so the solution we used was a dumb lookup table in pci_connect which is not a portable solution.
On Fri, Sep 16, 2016, 9:59 AM wmoors notifications@github.com wrote:
ok... I found the problem. Everything works now.
It was the PCI address msg in the simple_talker log that was a big giveaway. What I failed to mention was that I have 2 I210 cards in each system. And even though I gave the correct interface name when starting the simple_talker, it used the wrong PCI address.
I haven't fixed it yet though, I just tested it with the extra card removed from the system running the talker.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/AVnu/Open-AVB/issues/479#issuecomment-247638648, or mute the thread https://github.com/notifications/unsubscribe-auth/ACaeYWl_P3UaI-37auGhn0X9u_PSrNDeks5qqrzdgaJpZM4J86Tc .
you can map the interface back to the PCI B:D:F via sysfs ... i think there's an ethtool IOCTL for it as well .. like @get_drvinfo: Report driver/device information. Should only set the @driver, @version, @fw_version and @bus_info fields.
anyone want to submit a pull request ?
you're right. the SIOCETHTOOL ioctl can be used to find the PCI bus address. I found this piece of demo code. (can't test it right now as I don't have access to the systems atm)
#include <linux/ethtool.h>
#include <linux/sockios.h>
.. and other socket headers.
...
int sock = socket(PF_INET, SOCK_DGRAM, 0);
struct ifreq ifr;
struct ethtool_cmd cmd;
struct ethtool_drvinfo drvinfo;
memset(&ifr, 0, sizeof ifr);
memset(&cmd, 0, sizeof cmd);
memset(&drvinfo, 0, sizeof drvinfo);
strcpy(ifr.ifr_name, "eno1");
ifr.ifr_data = &drvinfo;
drvinfo.cmd = ETHTOOL_GDRVINFO;
if(ioctl(sock, SIOCETHTOOL, &ifr) < 0) {
perror("ioctl")
return;
}
The PCI bus address is available as a string in drvinfo.bus_info
- given that the NIC is actually a PCI device.
i think for 'pci_connect' its okay ... I'm not sure this architecture really scales to other IO (USB, SDIO, etc). That's the subject of the '2.0 branch' or whatever we call it after some kernel patches ...
realize this whole 'user-space rings' design was motivated by some customers who wanted really REALLY low latency - so low, they didn't even want the overhead of a system call and going thru the standard network stack. So real-time that they couldn't afford to call into a socket because it drops your process priority.
This obviously comes with some compromises, although its still a trend in high performance packet processing stacks like DPDK and PacketDirect to use virtualization features and map SR-IOV VFs into user-mode applications.
As the original issue raised here was resolved, I'm going to close this. The PCI ambiguity is a known pain point--we definitely would welcome a pull request to fix it! There's already an issue open about it if it needs further discussion: #395
I started to study AVB through the openAVB stack and try to run the simple_talker and simple_listener on 2 separate machines both equipped with an Intel I210 but I can't get it working.
The stack on both machines is built with travis.sh.
On the first machine:
On the second machine:
However, the output.wav always stays at 44bytes. So I started sniffing, but could not see any 1722 frames whatsoever.
I attached a debugger to the simple_talker and put a breakpoint at the
igb_xmit
line (l756). This call returns no error but I still can't see any 1722 frame coming out.Does this ring a bell with anyone?
additional info :