gilestrolab / ethoscope

a platform from monitoring animal behaviour in real time from a raspberry pi
http://lab.gilest.ro/ethoscope/
GNU General Public License v3.0
17 stars 25 forks source link

Ethoscope unstable wifi connection. `dhcpcd` getting killed #155

Open antortjim opened 3 years ago

antortjim commented 3 years ago

Dear ethoscope community.

I have encountered today an issue that I already encountered in the past, but now I have been able to gather more information about it. I know it is not an ethoscope specific problem, but more of a RPi thing, but still very useful for the community to be aware of.

The problem is a particular ethoscope keeps connecting and disconnecting from the network. It keeps reconnecting to the network during an experiment. But I think it doesn't have the issue during a following experiment (I still need to confirm this).

I have checked the logs of netctl-auto by calling journalctl -ru netctl-auto@wlan0 (see logs at the end)

Basically the same block of logs from DUID ... to dhcpcd exited keep repeating every few minutes. On the block it is very clear that a particular program is getting sent the SIGTERM signal, which kills the program. These kills match the times when the ethoscope goes offline. I checked the PID and the process being killed is

dhcpcd -4 -q -t 60 -K -L wlan0

which indeed makes sense. That is the program in charge of setting the WiFi connection.

My question is WHAT is causing this process to be killed or at least WHY does it happen. I have tried disabling the power saving mode with

sudo iwconfig wlan0 power off

but it keeps happening

I have also checked the logs of systemd-networkd , which contain the Gained carrier and Lost carrier messages at time matching the logs of netctl-auto. So it is definitely connected.

I don't know if this is caused by a defective RPi or maybe an overloaded network. But I can say when it happens, it keep happening on the same ethoscope. I would expect this problem to happen to a different ethoscope every time if the network was overloaded. I wonder whether an ethernet connection (via cable) would patch this problem....

systemd-networkd

Apr 21 18:58:57 ETHOSCOPE013 systemd-networkd[204]: wlan0: Gained carrier
Apr 21 18:56:49 ETHOSCOPE013 systemd-networkd[204]: wlan0: Lost carrier

netctl-auto@wlan0

Apr 21 18:59:02 ETHOSCOPE013 dhcpcd[22663]: wlan0: adding default route via 192.169.123.254
Apr 21 18:59:02 ETHOSCOPE013 dhcpcd[22663]: wlan0: adding route to 192.169.123.0/24
Apr 21 18:59:02 ETHOSCOPE013 dhcpcd[22663]: wlan0: using static address 192.169.123.63/24
Apr 21 18:58:57 ETHOSCOPE013 dhcpcd[22663]: wlan0: probing address 192.169.123.63/24
Apr 21 18:58:57 ETHOSCOPE013 dhcpcd[22663]: wlan0: IAID eb:88:18:b3
Apr 21 18:58:57 ETHOSCOPE013 dhcpcd[22663]: DUID 00:01:00:01:24:b8:fb:f7:b8:27:eb:94:18:21
Apr 21 18:56:51 ETHOSCOPE013 dhcpcd[22570]: dhcpcd exited
Apr 21 18:56:51 ETHOSCOPE013 dhcpcd[22570]: wlan0: removing interface
Apr 21 18:56:51 ETHOSCOPE013 dhcpcd[22570]: received SIGTERM, stopping
Apr 21 18:56:51 ETHOSCOPE013 dhcpcd[22582]: waiting for pid 22570 to exit
Apr 21 18:56:51 ETHOSCOPE013 dhcpcd[22582]: sending signal TERM to pid 22570
Apr 21 18:56:51 ETHOSCOPE013 dhcpcd[22553]: forked to background, child pid 22570
Apr 21 18:56:51 ETHOSCOPE013 dhcpcd[22553]: wlan0: adding default route via 192.169.123.254
Apr 21 18:56:51 ETHOSCOPE013 dhcpcd[22553]: wlan0: adding route to 192.169.123.0/24
Apr 21 18:56:51 ETHOSCOPE013 dhcpcd[22553]: wlan0: using static address 192.169.123.63/24
Apr 21 18:56:46 ETHOSCOPE013 dhcpcd[22553]: wlan0: probing address 192.169.123.63/24
Apr 21 18:56:46 ETHOSCOPE013 dhcpcd[22553]: wlan0: IAID eb:88:18:b3
Apr 21 18:56:46 ETHOSCOPE013 dhcpcd[22553]: DUID 00:01:00:01:24:b8:fb:f7:b8:27:eb:94:18:21
Apr 21 18:56:10 ETHOSCOPE013 dhcpcd[22268]: dhcpcd exited
Apr 21 18:56:10 ETHOSCOPE013 dhcpcd[22268]: wlan0: removing interface
Apr 21 18:56:10 ETHOSCOPE013 dhcpcd[22268]: received SIGTERM, stopping
Apr 21 18:56:10 ETHOSCOPE013 dhcpcd[22522]: waiting for pid 22268 to exit
Apr 21 18:56:10 ETHOSCOPE013 dhcpcd[22522]: sending signal TERM to pid 22268
Apr 21 18:52:49 ETHOSCOPE013 dhcpcd[22252]: forked to background, child pid 22268
Apr 21 18:52:49 ETHOSCOPE013 dhcpcd[22252]: wlan0: adding default route via 192.169.123.254
Apr 21 18:52:49 ETHOSCOPE013 dhcpcd[22252]: wlan0: adding route to 192.169.123.0/24
Apr 21 18:52:49 ETHOSCOPE013 dhcpcd[22252]: wlan0: using static address 192.169.123.63/24
Apr 21 18:52:44 ETHOSCOPE013 dhcpcd[22252]: wlan0: probing address 192.169.123.63/24
Apr 21 18:52:44 ETHOSCOPE013 dhcpcd[22252]: wlan0: IAID eb:88:18:b3
Apr 21 18:52:44 ETHOSCOPE013 dhcpcd[22252]: DUID 00:01:00:01:24:b8:fb:f7:b8:27:eb:94:18:21
Apr 21 18:50:00 ETHOSCOPE013 dhcpcd[21868]: dhcpcd exited
Apr 21 18:50:00 ETHOSCOPE013 dhcpcd[21868]: wlan0: removing interface
Apr 21 18:50:00 ETHOSCOPE013 dhcpcd[21868]: received SIGTERM, stopping
Apr 21 18:50:00 ETHOSCOPE013 dhcpcd[22152]: waiting for pid 21868 to exit
Apr 21 18:50:00 ETHOSCOPE013 dhcpcd[22152]: sending signal TERM to pid 21868

Thank you, Antonio

ggilestro commented 3 years ago

We do see wifi issues from time to time. For some mysterious reason, they tend to be concentrated in time or space. For the past several months I believe things have been more stable. I never managed to crack them down but one thing that has been in my list of things to do is to switch the platform away from DHCP and use static IPs instead. This would have the advantage to quickly map the ethoscope to the IP address.

antortjim commented 3 years ago

I will give that a try asap, and get back here when I know something new. Thank you @ggilestro