MONROE-PROJECT / Maintenance

MONROE Maintenance procedures, and mostly an issue tracker.

[Node 85] Tracking Issue #93

Closed by jonakarl 6 years ago

jonakarl commented 7 years ago

Modems restarted due to PID 1408

alfs commented 7 years ago

One modem (H3G) is still at PID 1408. I manually restarted the modem: still PID 1408. I then powered off the modem and restarted the whole node: still PID 1408.

kristrev commented 7 years ago

Most likely you are not getting an IP using DHCP. Can you check what kind of traffic is flowing through to the device (tcpdump)? We are particularly interested in ARP.

alfs commented 7 years ago

This is a capture from node 98 (not 85), but I think all modems with PID 1408 have the same problem (the modem does not send DHCP offers):

root@Monroe000db94007d4:/home/monroeSA# tcpdump -eni usb2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on usb2, link-type EN10MB (Ethernet), capture size 262144 bytes
13:01:17.647594 ca:7b:0b:21:1d:9d > 33:33:00:00:00:02, ethertype IPv6 (0x86dd), length 70: fe80::c87b:bff:fe21:1d9d > ff02::2: ICMP6, router solicitation, length 16
13:01:17.780774 a8:a6:68:da:06:df > 33:33:00:00:00:02, ethertype IPv6 (0x86dd), length 70: fe80::aaa6:68ff:feda:6df > ff02::2: ICMP6, router solicitation, length 16
13:01:20.991938 ca:7b:0b:21:1d:9d > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 332: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from ca:7b:0b:21:1d:9d, length 290
13:01:33.004073 ca:7b:0b:21:1d:9d > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 332: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from ca:7b:0b:21:1d:9d, length 290
13:01:37.007970 ca:7b:0b:21:1d:9d > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 332: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from ca:7b:0b:21:1d:9d, length 290
13:01:39.160778 a8:a6:68:da:06:df > 01:00:5e:00:00:01, ethertype IPv4 (0x0800), length 46: 0.0.0.0 > 224.0.0.1: igmp query v2
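
For checking longer captures, the same observation (DHCP requests leaving the node, no replies or ARP coming back) can be tallied with a small script. This is only a sketch: `summarize_capture` is a hypothetical helper, and it assumes the text format that `tcpdump -eni` prints, where DHCP server answers appear as `BOOTP/DHCP, Reply` and ARP frames as `ethertype ARP`.

```python
def summarize_capture(lines):
    """Count DHCP requests, DHCP replies and ARP frames in tcpdump -e text output."""
    counts = {"dhcp_request": 0, "dhcp_reply": 0, "arp": 0}
    for line in lines:
        if "BOOTP/DHCP, Request" in line:
            counts["dhcp_request"] += 1
        elif "BOOTP/DHCP, Reply" in line:
            counts["dhcp_reply"] += 1
        elif "ethertype ARP" in line:
            counts["arp"] += 1
    return counts


def looks_like_silent_modem(counts):
    """True when requests were sent but neither a DHCP reply nor ARP was seen."""
    return (
        counts["dhcp_request"] > 0
        and counts["dhcp_reply"] == 0
        and counts["arp"] == 0
    )
```

Feeding it the lines of the capture above (for example from a saved text file) should report requests only, which matches the "no DHCP offers" reading.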

ifconfig:

root@Monroe000db94007d4:/home/monroeSA# ifconfig usb2
usb2      Link encap:Ethernet  HWaddr ca:7b:0b:21:1d:9d  
          inet6 addr: fe80::c87b:bff:fe21:1d9d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:12 errors:0 dropped:0 overruns:0 frame:0
          TX packets:19 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:648 (648.0 B)  TX bytes:3812 (3.7 KiB)

From /var/log/multi.log for the same time as the packet trace:

[12:01:12 09/12/2016]: Hostname: Monroe000db94007d4
[12:01:12 09/12/2016]: Sending DHCP DISCOVER (iface idx 1925).
[12:01:12 09/12/2016]: Next timeout will expire in 8 sec on interface usb2 (iface idx 1925)
[12:01:17 09/12/2016]: Interface usb2 (1925) is RUNNING, length 4
[12:01:17 09/12/2016]: Interface usb2 (idx 1925) has already been seen. Ignoring event
[12:01:20 09/12/2016]: Hostname: Monroe000db94007d4
[12:01:20 09/12/2016]: Sending DHCP DISCOVER (iface idx 1925).
[12:01:20 09/12/2016]: Next timeout will expire in 12 sec on interface usb2 (iface idx 1925)
[12:01:33 09/12/2016]: Could not get a reply after 3timeouts, falling back to INIT/REBOOTING for interfaceusb2
[12:01:33 09/12/2016]: Hostname: Monroe000db94007d4
[12:01:33 09/12/2016]: Sending DHCP DISCOVER (iface idx 1925).
[12:01:33 09/12/2016]: Next timeout will expire in 4 sec on interface usb2 (iface idx 1925)
[12:01:37 09/12/2016]: Hostname: Monroe000db94007d4
[12:01:37 09/12/2016]: Sending DHCP DISCOVER (iface idx 1925).
[12:01:37 09/12/2016]: Next timeout will expire in 8 sec on interface usb2 (iface idx 1925)
[12:01:45 09/12/2016]: Hostname: Monroe000db94007d4
[12:01:45 09/12/2016]: Sending DHCP DISCOVER (iface idx 1925).
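
The retry pattern in this log (repeated DISCOVERs, each followed by a new timeout) can be pulled out with a short script when comparing nodes. This is a sketch only: `dhcp_retry_summary` is a hypothetical helper, and the two regular expressions assume the exact message wording shown in the excerpt above.

```python
import re

# Patterns copied from the log wording above; other versions may phrase these differently.
DISCOVER = re.compile(r"Sending DHCP DISCOVER \(iface idx (\d+)\)")
TIMEOUT = re.compile(r"Next timeout will expire in (\d+) sec on interface (\S+)")


def dhcp_retry_summary(lines):
    """Return (number of DISCOVERs sent, list of announced timeouts in seconds)."""
    discovers = 0
    timeouts = []
    for line in lines:
        if DISCOVER.search(line):
            discovers += 1
        match = TIMEOUT.search(line)
        if match:
            timeouts.append(int(match.group(1)))
    return discovers, timeouts
```

A steadily growing DISCOVER count with no lease-related lines in between is the signature to look for on other PID 1408 modems.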

alfs commented 7 years ago

No USB hub crashes for 30 minutes, but a lot of USB error -32s in the kernel log. Only one modem is visible in lsusb.

monroe@Monroe000db94002d8:~$ uptime
 00:07:17 up 27 min,  1 user,  load average: 0.14, 0.41, 0.35
...
monroe@Monroe000db94002d8:~$ 
[   74.881284] usb 1-5.2: device descriptor read/64, error -32
[   75.133225] usb 1-5.2: device descriptor read/64, error -32
[   75.313226] usb 1-5.2: device descriptor read/64, error -32
[   75.901181] usb 1-5.2: device not accepting address 7, error -32
[   76.393138] usb 1-5.2: device not accepting address 8, error -32
[   76.685144] usb 1-5.3: device descriptor read/64, error -32
[   76.865099] usb 1-5.3: device descriptor read/64, error -32
[   77.121090] usb 1-5.3: device descriptor read/64, error -32
...
monroe@Monroe000db94002d8:~$ lsusb 
Bus 003 Device 002: ID 058f:6366 Alcor Micro Corp. Multi Flash Reader
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 007 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 006 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 002: ID 1199:68c0 Sierra Wireless, Inc. 
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 004: ID 04d8:f2f7 Microchip Technology, Inc. 
Bus 001 Device 003: ID 19d2:1403 ZTE WCDMA Technologies MSM 
Bus 001 Device 002: ID 0424:2514 Standard Microsystems Corp. USB 2.0 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
monroe@Monroe000db94002d8:~$ 
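
To see which hub ports produce the errors, the kernel log lines can be grouped by port and error code. A rough sketch: `count_usb_errors` and `errno_name` are hypothetical helpers, the regular expression assumes the `usb <port>: ... error -<n>` wording seen above, and the errno names follow Linux numbering (32 is EPIPE there).

```python
import errno
import re
from collections import Counter

# Matches e.g. "usb 1-5.2: device descriptor read/64, error -32"
USB_ERROR = re.compile(r"usb (\S+): .*error (-\d+)")


def count_usb_errors(dmesg_lines):
    """Count kernel USB errors keyed by (port, error code)."""
    counts = Counter()
    for line in dmesg_lines:
        match = USB_ERROR.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def errno_name(code):
    """Symbolic errno name for a negative kernel error code, per the local platform."""
    return errno.errorcode.get(abs(code), "unknown")
```

Run over the full `dmesg` output, this gives a per-port tally that makes it easier to tell a single bad port from a failing hub.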

alfs commented 7 years ago

Unplugged one modem (Telia) and restarted the node to see if this helps with the USB problems.

alfs commented 7 years ago

Didn't help. The USB hub is not visible in lsusb, and there are a lot of USB errors in the kernel log. Unplugged the H3G modem and rebooted the node.

monroe@Monroe000db94002d8:~$ uptime
 16:48:15 up 54 min,  1 user,  load average: 1.36, 1.53, 1.53
monroe@Monroe000db94002d8:~$ lsusb 
Bus 003 Device 002: ID 058f:6366 Alcor Micro Corp. Multi Flash Reader
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 007 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 006 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 002: ID 1199:68c0 Sierra Wireless, Inc. 
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
monroe@Monroe000db94002d8:~$ dmesg | head
[ 2940.117971] usb 4-5: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 2940.120150] hub 4-5:1.0: USB hub found
[ 2940.121875] hub 4-5:1.0: 4 ports detected
[ 2940.218880] usb 4-5: USB disconnect, device number 115
[ 2940.350743] usb 1-5: new high-speed USB device number 33 using ehci-pci
[ 2940.886727] usb 1-5: device not accepting address 33, error -71
[ 2941.274719] usb 4-5: new full-speed USB device number 116 using ohci-pci
[ 2941.433923] usb 4-5: not running at top speed; connect to a high speed hub
[ 2941.437854] usb 4-5: New USB device found, idVendor=0424, idProduct=2514
[ 2941.437871] usb 4-5: New USB device strings: Mfr=0, Product=0, SerialNumber=0
monroe@Monroe000db94002d8:~$ 
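
The "hub not visible in lsusb" check can be scripted by comparing the listed vendor:product IDs against the ones expected on a healthy node. A minimal sketch: `usb_ids` and `missing_ids` are hypothetical helpers, and the expected ID `0424:2514` is taken from the earlier lsusb output, where the hub was still listed.

```python
import re

# Matches e.g. "Bus 001 Device 002: ID 0424:2514 Standard Microsystems Corp. USB 2.0 Hub"
LSUSB_LINE = re.compile(r"Bus \d+ Device \d+: ID ([0-9a-fA-F]{4}:[0-9a-fA-F]{4})")


def usb_ids(lsusb_lines):
    """Return the set of vendor:product IDs found in lsusb output lines."""
    ids = set()
    for line in lsusb_lines:
        match = LSUSB_LINE.search(line)
        if match:
            ids.add(match.group(1).lower())
    return ids


def missing_ids(lsusb_lines, expected):
    """Return the expected IDs that do not appear in the lsusb output, sorted."""
    return sorted({e.lower() for e in expected} - usb_ids(lsusb_lines))
```

Called with the lsusb output above and `["0424:2514"]`, it should report the hub as missing, while the earlier output (with the hub on bus 001) should report nothing missing.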

jonakarl commented 6 years ago

No longer relevant after retiring the MiFis and the big "node rebuild".