StephanStS opened this issue 8 months ago (status: Open).
Can you check:
journalctl -u ifup@eth0
Good to know that the MAC addresses of the two LAN ports are randomised. But at least the interfaces do not seem to be switched (eth0 remains the LAN1 port).
In case of a failed power up, the output is:
Mär 09 12:11:59 r5s systemd[1]: Starting ifup@eth0.service - ifup for eth0...
Mär 09 12:11:59 r5s dhclient[405]: Internet Systems Consortium DHCP Client 4.4.3-P1
Mär 09 12:11:59 r5s ifup[405]: Internet Systems Consortium DHCP Client 4.4.3-P1
Mär 09 12:11:59 r5s ifup[405]: Copyright 2004-2022 Internet Systems Consortium.
Mär 09 12:11:59 r5s ifup[405]: All rights reserved.
Mär 09 12:11:59 r5s ifup[405]: For info, please visit https://www.isc.org/software/dhcp/
Mär 09 12:11:59 r5s dhclient[405]: Copyright 2004-2022 Internet Systems Consortium.
Mär 09 12:11:59 r5s dhclient[405]: All rights reserved.
Mär 09 12:11:59 r5s dhclient[405]: For info, please visit https://www.isc.org/software/dhcp/
Mär 09 12:11:59 r5s dhclient[405]:
Mär 09 12:12:00 r5s dhclient[405]: Listening on LPF/eth0/8e:31:b4:9c:c5:31
Mär 09 12:12:00 r5s ifup[405]: Listening on LPF/eth0/8e:31:b4:9c:c5:31
Mär 09 12:12:00 r5s ifup[405]: Sending on LPF/eth0/8e:31:b4:9c:c5:31
Mär 09 12:12:00 r5s ifup[405]: Sending on Socket/fallback
Mär 09 12:12:00 r5s dhclient[405]: Sending on LPF/eth0/8e:31:b4:9c:c5:31
Mär 09 12:12:00 r5s dhclient[405]: Sending on Socket/fallback
Mär 09 12:12:01 r5s dhclient[405]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3
Mär 09 12:12:01 r5s ifup[405]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3
Mär 09 12:12:01 r5s ifup[405]: send_packet: Network is down
Mär 09 12:12:01 r5s ifup[405]: dhclient.c:2600: Failed to send 300 byte long packet over eth0 interface.
Mär 09 12:12:01 r5s ifup[405]: receive_packet failed on eth0: Network is down
Mär 09 12:12:01 r5s dhclient[405]: send_packet: Network is down
Mär 09 12:12:01 r5s dhclient[405]: dhclient.c:2600: Failed to send 300 byte long packet over eth0 interface.
Mär 09 12:12:01 r5s dhclient[405]: receive_packet failed on eth0: Network is down
Mär 09 12:12:04 r5s dhclient[405]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 6
Mär 09 12:12:04 r5s ifup[405]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 6
Mär 09 12:12:04 r5s ifup[405]: send_packet: Network is down
Mär 09 12:12:04 r5s ifup[405]: dhclient.c:2600: Failed to send 300 byte long packet over eth0 interface.
Mär 09 12:12:04 r5s dhclient[405]: send_packet: Network is down
Mär 09 12:12:04 r5s dhclient[405]: dhclient.c:2600: Failed to send 300 byte long packet over eth0 interface.
Mär 09 12:12:10 r5s dhclient[405]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 15
Mär 09 12:12:10 r5s ifup[405]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 15
Mär 09 12:12:10 r5s ifup[405]: send_packet: Network is down
Mär 09 12:12:10 r5s ifup[405]: dhclient.c:2600: Failed to send 300 byte long packet over eth0 interface.
Mär 09 12:12:10 r5s dhclient[405]: send_packet: Network is down
Mär 09 12:12:10 r5s dhclient[405]: dhclient.c:2600: Failed to send 300 byte long packet over eth0 interface.
Mär 09 12:12:25 r5s dhclient[405]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 15
Mär 09 12:12:25 r5s ifup[405]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 15
Mär 09 12:12:25 r5s ifup[405]: send_packet: Network is down
Mär 09 12:12:25 r5s ifup[405]: dhclient.c:2600: Failed to send 300 byte long packet over eth0 interface.
Mär 09 12:12:25 r5s dhclient[405]: send_packet: Network is down
Mär 09 12:12:25 r5s dhclient[405]: dhclient.c:2600: Failed to send 300 byte long packet over eth0 interface.
Mär 09 12:12:40 r5s dhclient[405]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 16
Mär 09 12:12:40 r5s ifup[405]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 16
Mär 09 12:12:40 r5s ifup[405]: send_packet: Network is down
Mär 09 12:12:40 r5s ifup[405]: dhclient.c:2600: Failed to send 300 byte long packet over eth0 interface.
Mär 09 12:12:40 r5s dhclient[405]: send_packet: Network is down
Mär 09 12:12:40 r5s dhclient[405]: dhclient.c:2600: Failed to send 300 byte long packet over eth0 interface.
Mär 09 12:12:56 r5s dhclient[405]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 5
Mär 09 12:12:56 r5s ifup[405]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 5
Mär 09 12:12:56 r5s ifup[405]: send_packet: Network is down
Mär 09 12:12:56 r5s ifup[405]: dhclient.c:2600: Failed to send 300 byte long packet over eth0 interface.
Mär 09 12:12:56 r5s dhclient[405]: send_packet: Network is down
Mär 09 12:12:56 r5s dhclient[405]: dhclient.c:2600: Failed to send 300 byte long packet over eth0 interface.
Mär 09 12:13:01 r5s dhclient[405]: No DHCPOFFERS received.
Mär 09 12:13:01 r5s ifup[405]: No DHCPOFFERS received.
Mär 09 12:13:01 r5s ifup[405]: No working leases in persistent database - sleeping.
Mär 09 12:13:01 r5s dhclient[405]: No working leases in persistent database - sleeping.
Mär 09 12:13:01 r5s systemd[1]: Finished ifup@eth0.service - ifup for eth0.
It just does not get a DHCP lease. Does it immediately work when you run systemctl restart ifup@eth0, and when you apply a static IP?
Hmm
Mär 09 12:12:40 r5s dhclient[405]: send_packet: Network is down
Mär 09 12:12:40 r5s dhclient[405]: dhclient.c:2600: Failed to send 300 byte long packet over eth0 interface.
This actually looks like it was not able to send the DHCPDISCOVER, as if there really is no carrier signal 🤔.
It seems I am also having a similar issue with the ifup service being unreliable/hanging during boot. Most of the time I waited for more than 1 minute and had to power the device down and up again to try to get past it on the next boot. Happy to help with troubleshooting if you need anything.
I thought this could be related to #6663, #6565
I have the same problem and think I found a workaround for the moment.
I have 2 installations, one on an SD card and one on the eMMC. The SD card installation works every time I boot the device; the eMMC installation isn't able to bring up eth0 most of the time.
I checked both installations for multiple hours and found nothing, they seem to be identical. So my guess is that the SD card installation boots more slowly, which gives the network device enough time to come up.
My "dirty" workaround is the following:
[Service]
Type=oneshot
ExecStart=
ExecStart=sleep 10
ExecStart=/sbin/ifup --allow=hotplug %I
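For anyone wanting to try the same workaround, a minimal sketch of how such a drop-in could be installed (the override path and file name below are just the usual systemd conventions, not something DietPi ships):

mkdir -p /etc/systemd/system/ifup@.service.d
cat << 'EOF' > /etc/systemd/system/ifup@.service.d/override.conf
[Service]
Type=oneshot
ExecStart=
ExecStart=sleep 10
ExecStart=/sbin/ifup --allow=hotplug %I
EOF
systemctl daemon-reload

Since ifup@.service is a template unit, a drop-in on the template applies to all its instances (eth0, eth1, ...).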
Can you both check my last comment and test whether restarting the service and/or assigning a static IP helps?
Restarting the service (networking.service), when configured with dhcp, works most of the time (if I remember correctly). The static IP helps to at least boot into the system.
The main problem is that the boot dialog hangs at ifup@eth0.service for over 15 minutes.
If you give me some test cases (dhcp, static, reboot/poweroff, etc.), I can run them on my device.
Restarting the service (networking.service), when configured with dhcp, works most of the time (if I remember correctly).
This should not have any effect: networking.service restarts auto interfaces only, while we use allow-hotplug interfaces, controlled by ifup@.service instances only.
The main problem is, that the boot dialog hangs at ifup@eth.service for over 15 minutes.
Hmm, it should have a default startup timeout of 90 seconds. Similarly, the DHCP client itself has a 60-second timeout before it gives up for a while (several minutes at least), which can be seen in Stephan's logs. If it hangs for more than 90 seconds in your case, then there must be a more low-level hang overriding the DHCP and systemd timeouts.
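For reference, a quick sketch of how one could check both timeouts (the dhclient timeout only appears in /etc/dhcp/dhclient.conf if it was set explicitly; otherwise its compiled-in default applies):

systemctl show ifup@eth0.service -p TimeoutStartUSec   # systemd start timeout of the unit
grep -i timeout /etc/dhcp/dhclient.conf                # dhclient's own DHCPDISCOVER timeout, if configured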
If you give me some test cases (dhcp, static, reboot/poweroff, etc.), I can run them on my device.
As said, it would be great to test whether a static IP does not cause those issues, i.e. whether only the DHCP client has issues at this stage.
Also, can you add your ifup@eth0 logs in case it hangs for so long, to compare with Stephan's:
journalctl -u ifup@eth0
So, I reverted all changes and tried multiple tests with dhcp and static ip settings.
Before you read further: I think it has nothing to do with DHCP or static IP, but rather with the device not initializing correctly, or a wrong device state due to services not waiting.
test (dhcp)
test (static)
last test
Here is a screenshot of the boot process and the log:
Edit: as soon as I added the timeout (sleep command), both versions (dhcp and static ip) worked without a problem.
Now I see the problem: ifup@eth0.service tries to start before ifupdown-pre.service has finished.
This service waits for udev events to settle, so that network interfaces are not brought up before device detection has finished. This is important for auto interfaces, which are brought up at a fixed stage of the boot sequence, asynchronously to device detection, without ifupdown-pre.service. On stock Debian, ifupdown-pre.service does nothing if no auto interfaces are used, since the ifup@.service instances for allow-hotplug interfaces are added to the boot sequence by udev itself, once the related network adapter has been detected. However, this has the downside that network.target and network-online.target can be reached before network adapters have even been detected.
We hence changed ifupdown-pre.service to wait for udev events to settle as well, if only allow-hotplug interfaces are used. This assures that those network targets are never reached before network adapter device detection has finished and all allow-hotplug interfaces for attached adapters have been brought up.
But, since on stock Debian the two services are not related to each other, it is perfectly possible that ifup@.service instances run before ifupdown-pre.service, in case adapters are detected quickly compared to other devices. I am not 100% sure about the implications, but at least theoretically it is possible that the Ethernet adapter LEDs have not been detected yet. In this case, the interface would start to be brought up, then our udev rule to attach the LEDs to the correct adapters/interfaces could start, which brings the interface up and down to make the LED behave correctly. It totally makes sense that this breaks ifup and would explain everything perfectly.
What we hence need to assure is that udev settles, hence that our LED udev rules have finished, before ifup@.service instances run, which is done easiest by letting ifupdown-pre.service run/finish before network-pre.target. The service names alone already indicate that this is basically the intended order, matching also the practical order if auto interfaces are used.
Please try this:
G_CONFIG_INJECT '\[Unit\]' '[Unit]' /etc/systemd/system/ifupdown-pre.service.d/dietpi.conf
G_CONFIG_INJECT 'Before=' 'Before=network-pre.target' /etc/systemd/system/ifupdown-pre.service.d/dietpi.conf '\[Unit\]'
G_CONFIG_INJECT 'Wants=' 'Wants=network-pre.target' /etc/systemd/system/ifupdown-pre.service.d/dietpi.conf '\[Unit\]'
reboot
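For reference, if the injection works as intended, /etc/systemd/system/ifupdown-pre.service.d/dietpi.conf should end up containing roughly:

[Unit]
Before=network-pre.target
Wants=network-pre.target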
Thanks for the detailed information and quick fix.
Unfortunately, it failed on the 13th attempt. 8 boot attempts with static IP worked flawlessly.
With DHCP, it worked 4 times and the 5th attempt failed with a wrong LED state (1 and 2 are on); ifupdown-pre.service failed and now it hangs at ifup@eth0.service and networking.service.
As a side note, after the service sys-subsystem-net-devices-eth0.service I got an error for the CPU governors, which doesn't happen if the system boots up normally.
As soon as the system is accessible, I attach the journal log.
Would be good to know how ifupdown-pre.service failed. Probably something is keeping udev busy until the service times out. We would then need to investigate what is causing this. I mean, the fact that we need to enforce ordering in this case already indicates that udev takes longer than expected, as this does not happen on any other SBC. So we'd need to check this in any case, and the above "fix" is more about fixing symptoms, limited by the ifupdown-pre.service timeout or udevadm settle failure.
Here is the journal.txt
Before I found this specific ticket/issue, I read some other NanoPi-related ones which mentioned that the SoC is not always reset properly. It is a wild guess, but it would match the missing CPU governors.
Not related to the issue, but your network configuration seems to have issues as well:
Jun 02 21:57:02 DietPi systemd[1]: Starting ifup@eth0.service - ifup for eth0...
Jun 02 21:57:02 DietPi ifup[464]: ifup: /etc/network/interfaces.d/if.tar:1: misplaced option
Jun 02 21:57:02 DietPi systemd[1]: Starting networking.service - Raise network interfaces...
Jun 02 21:57:02 DietPi ifup[466]: ifup: /etc/network/interfaces.d/if.tar:1: misplaced option
However, the bigger problem is that indeed the network adapters seem to be ready very late:
Jun 02 21:56:03 DietPi (udev-worker)[373]: green:lan-1: Spawned process '/bin/ip l s up dev eth0' [410] is taking longer than 59s to complete
Jun 02 21:56:03 DietPi (udev-worker)[339]: green:lan-2: Spawned process '/bin/ip l s up dev eth1' [420] is taking longer than 59s to complete
Jun 02 21:56:03 DietPi systemd-udevd[318]: green:lan-1: Worker [373] processing SEQNUM=3360 is taking a long time
Jun 02 21:56:03 DietPi systemd-udevd[318]: green:wan: Worker [349] processing SEQNUM=3362 is taking a long time
Jun 02 21:56:03 DietPi systemd-udevd[318]: green:lan-2: Worker [339] processing SEQNUM=3361 is taking a long time
Jun 02 21:57:02 DietPi dash[344]: Timed out for waiting the udev queue being empty.
These are our LED udev rules, trying to bring up the interface to apply the LED trigger (else it would just be lit, regardless of whether a cable is attached or not).
The adapters have been detected at that time:
Jun 02 21:55:00 DietPi kernel: r8169 0000:01:00.0 eth0: RTL8125B, 8e:72:25:1c:0e:58, XID 641, IRQ 79
Jun 02 21:55:00 DietPi kernel: r8169 0000:01:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]
...
Jun 02 21:55:02 DietPi systemd[1]: Expecting device sys-subsystem-net-devices-eth0.device - /sys/subsystem/net/devices/eth0...
Jun 02 21:55:02 DietPi DietPi-PreBoot[392]: DietPi-CPU_set | CPU governors are not available on this system. This is probably a virtual machine. Aborting...
Jun 02 21:55:02 DietPi systemd[1]: Found device sys-subsystem-net-devices-eth0.device - RTL8125 2.5GbE Controller.
There, the missing CPU governors show up as well. So yeah, something has been broken since the latest Linux 6.6.32 update. This all worked perfectly fine some time ago, also with older Linux 6.6, IIRC. Can you try this:
apt install linux-{image,dtb}-current-rockchip64=24.2.1
Confirm the downgrade.
Not related to the issue, but your network configuration seems to have issues as well:
Jun 02 21:57:02 DietPi systemd[1]: Starting ifup@eth0.service - ifup for eth0...
Jun 02 21:57:02 DietPi ifup[464]: ifup: /etc/network/interfaces.d/if.tar:1: misplaced option
Jun 02 21:57:02 DietPi systemd[1]: Starting networking.service - Raise network interfaces...
Jun 02 21:57:02 DietPi ifup[466]: ifup: /etc/network/interfaces.d/if.tar:1: misplaced option
Yeah, not a problem. I just saved my older config in a tar-file, so ignore the error message.
There also the missing CPU governors show up. So yeah, something is broken since latest Linux 6.6.32 update. This all worked perfectly fine some time ago, also with older Linux 6.6, IIRC. Can you try this:
apt install linux-{image,dtb}-current-rockchip64=24.2.1
Confirm the downgrade.
I will try it tomorrow, as soon as I get back from work. Thanks for the support.
I just saved my older config in a tar-file, so ignore the error message.
Our /etc/network/interfaces loads every file in this directory, regardless of file ending. Probably we should change it to load *.conf files only. Though, AFAIR, stock Debian loads everything as well. However, yeah, I guess it does not cause issues either.
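A minimal sketch of what such a restriction could look like, assuming the stock source glob stanza is used in /etc/network/interfaces (not necessarily DietPi's exact current line):

# current: every file in the directory is loaded, regardless of extension
source /etc/network/interfaces.d/*
# stricter: only *.conf files would be loaded
source /etc/network/interfaces.d/*.conf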
I'll also run some tests now on my R5S.
With SD card, it works fine here as well. I cannot test the eMMC currently, since it is still on an old 5.10.160 kernel, where I want to test the conversion to more recent Linux builds. I'll see whether I can finish this tomorrow evening.
Just to rule it out, does it change anything for you when you remove our udev rules?
mv /etc/udev/rules.d/dietpi-eth-leds.rules{,_bak}
However, since you have issues even with the CPU governor, I think there is a more fundamental problem with the latest kernel when booting from eMMC.
Another report, which seems to be the same issue at first view, booting with NVMe in this case: https://dietpi.com/forum/t/20268
So, today it was way more difficult to reproduce the error.
Disabling the LED udev rule didn't change anything, I guess (stopped after 10 reboots). After that, I reverted the change and needed at least 20 reboots to get to that state.
Downgrading the kernel may or may not have had an effect; I stopped after 30 reboots.
The only thing I can confirm is that the CPU governor has nothing to do with the kernel versions or the failed state. On both versions (24.2.1 / 24.5.1) I got the error mentioned above. And on both versions, even when the governor detection failed, the initialization worked.
Changing the systemd service dependencies was a big step forward. For now, I have no clue how to force that rare state or make it more reproducible.
So far so good then: enforcing the systemd order helped against the case where our udev rules brought down an interface that was in the middle of being handled by ifup@.service. But does boot still take long?
If so, we could further investigate this way:
G_CONFIG_INJECT 'udev_log=' 'udev_log=debug' /etc/udev/udev.conf
reboot
This gives a mass of detailed udev logs: which device is detected when, which udev rules it triggered, what they did, whether they took very long, etc.:
journalctl
If needed, it makes sense to filter it a little, like journalctl --grep eth0 to check logs for eth0 in particular.
Did we actually check for kernel errors already?
dmesg -l 0,1,2,3
I removed eth1 and eth2 from /etc/network/interfaces to speed up the boot process.
Due to the fact that only the WAN interface has a fixed MAC address, I used it to stop the randomization of the other two NICs.
But of course, without setting a static IP address, the service tried a DHCP request for both unconfigured NICs.
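For reference, pinning a MAC and a static address for one of the otherwise unconfigured NICs in /etc/network/interfaces could look roughly like this (a sketch; the MAC and address below are placeholders, not values from this device):

allow-hotplug eth1
iface eth1 inet static
    address 192.168.2.1/24
    hwaddress ether 00:11:22:33:44:55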
If I remember correctly, there were no kernel errors during the boot process. But I'll try one more time to replicate the error with extended udev logs. Maybe it leads to something.
So, after 80 or more reboots over two days, I can't reproduce the issue.
I also reverted the udev debug log, switched kernels, and played with the power supply, because the last time it happened, I had switched the power supply to monitor the consumption via a FNIRSi FNB58 (after our changes).
Either it is a weird power glitch/SoC reset problem or a timing problem between the services.
Update: Now, when I switch to a static IP address, it sometimes hangs at ifupdown-pre.service. I'll keep investigating.
Did the udev logs contain any hint why the Ethernet devices take so long to get ready, or some detach/attach loops?
I disabled the logs and switched to a static IP afterwards, because I wanted to set up my system for now.
I will re-enable the debug log and gather enough information. My first thought is that, via DHCP, the services have enough time to finish.
Update: Finally, I got the error again. Let's hope that the error log will be enough to find the cause. (~15 min until I can access the login prompt)
Hey, a little bit late with my response due to overtime.
I could not find any clue, but maybe you will. Here is my log journal.log.gz
Definitely try to remove our udev rule. Or did you try this already and still ran into the same issue?
So, today it was way more difficult to reproduce the error.
Disabling the LED udev rule didn't change anything, I guess (stopped after 10 reboots). After that, I reverted the change and needed at least 20 reboots to get to the state.
Does this mean you were not able to trigger the issue within 10 reboots when the udev rule was disabled, but you were able to trigger it after 20 reboots with the udev rules in place? If so, it has not yet been ruled out that those do cause the issue, for which we would need to trigger the issue with the rules disabled:
mv /etc/udev/rules.d/dietpi-eth-leds.rules{,_bak}
Hence, I am still not sure whether it's a symptom or the cause, but everything seems to process fine until one of the LED rules triggers /bin/ip l s up dev eth1. This never finishes and times out after 3 minutes. In parallel, the other two LEDs hang as well, one at applying the netdev trigger and the other at attaching the network interface to the trigger. So all 3 rules hang and time out at different stages. And in two cases, right after these timeouts were logged, udev rules were completely reloaded ("Udev rules need reloading"). Since it is taking too long, ifupdown-pre.service times out, after which networking.service and all 3 ifup@.service instances start (the last change made them all wait for this service to finish), but this happens ~1 minute before the LED trigger workers are killed.
So our ordering, which seems to have helped you, did not help in this particular case, since the timeouts made the ifup@.service instances start anyway, before our udev rules finished or were killed, leading to potentially parallel ip link set up/down calls for the same interface. All 4 network services then time out after another 5 minutes (increased default timeout due to TimeoutStartSec=5min in the units). Terminating them (SIGTERM) times out as well after another 90 seconds, and the hard way, SIGKILL, times out as well after another 90 seconds. But while it says "Processes still around after SIGKILL. Ignoring.", it actually tries another 3 times to SIGKILL them before giving up, even a 4th time for networking.service. I was wondering why this happens, because normally SIGKILL is done after a few seconds already, and given up after another few seconds, but I think it has to do with this, in those units:
ExecStart=/usr/sbin/ifup -a --read-environment
ExecStart=-/bin/sh -c 'if [ -f /run/network/restart-hotplug ]; then /usr/sbin/ifup -a --read-environment --allow=hotplug; fi'
ExecStop=/usr/sbin/ifdown -a --read-environment --exclude=lo
ExecStopPost=/usr/bin/touch /run/network/restart-hotplug
TimeoutStartSec=5min
Here is the example of networking.service, which does not have a single start command only, but two start commands, one stop and one post-stop command. The first ExecStart takes 5 minutes to time out, due to TimeoutStartSec. Then its SIGTERM times out after 90 seconds, then its SIGKILL after another 90 seconds, after which it is given up. The 2nd ExecStart and ExecStop are both skipped, but ExecStopPost runs, which also times out after 90 seconds, with SIGTERM and SIGKILL each adding another 90 seconds. So all together: 300 s + 5 × 90 s = 12.5 minutes. Verified again with logs:
Jun 04 23:23:28 DietPi systemd[1]: Starting networking.service - Raise network interfaces...
# ExecStart timeout causing SIGTERM
Jun 04 23:28:28 DietPi systemd[1]: networking.service: start operation timed out. Terminating.
# SIGTERM timeout causing SIGKILL
Jun 04 23:29:58 DietPi systemd[1]: networking.service: State 'stop-sigterm' timed out. Killing.
Jun 04 23:29:58 DietPi systemd[1]: networking.service: Killing process 481 (ifup) with signal SIGKILL.
# SIGKILL timeout
Jun 04 23:31:28 DietPi systemd[1]: networking.service: Processes still around after SIGKILL. Ignoring.
# ExecStopPost timeout causing SIGTERM
Jun 04 23:32:59 DietPi systemd[1]: networking.service: State 'stop-post' timed out. Terminating.
# SIGTERM timeout causing SIGKILL
Jun 04 23:34:29 DietPi systemd[1]: networking.service: State 'final-sigterm' timed out. Killing.
Jun 04 23:34:29 DietPi systemd[1]: networking.service: Killing process 481 (ifup) with signal SIGKILL.
# SIGKILL timeout
Jun 04 23:35:59 DietPi systemd[1]: networking.service: Processes still around after final SIGKILL. Entering failed mode.
Jun 04 23:35:59 DietPi systemd[1]: networking.service: Failed with result 'timeout'.
Jun 04 23:35:59 DietPi systemd[1]: networking.service: Unit process 481 (ifup) remains running after unit stopped.
Jun 04 23:35:59 DietPi systemd[1]: Failed to start networking.service - Raise network interfaces.
23:23:28 until 23:35:59 is 12.5 minutes. Totally insane, of course. For the ifup@.service instances, there is no ExecStopPost, hence they are given up 90 seconds earlier.
But I could not find any hint in the logs why /bin/ip l s up dev eth1 and every other ip/ifup call time out in the first place, and why they seem to hang up not only the whole network but more, so that even /usr/bin/touch /run/network/restart-hotplug times out, a command which only creates an empty file in a tmpfs and has by itself nothing to do with the network at all. This file is only a flag, and does not cause any action by itself.
In any case, we really need to find a better way to enable the Ethernet LEDs, so that they light up/blink on link/activity, but stay off until there is really a cable connected and the link is up. The whole reason for these ip calls in our rules is that, once you set the LED to light up on link, it immediately lights up, regardless of whether the link is up and a cable connected or not. And I did not find any other way to turn it off, while keeping the link trigger active, aside from setting the interface up and down once. A nasty limitation of this driver.
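For illustration only (this is not the actual content of dietpi-eth-leds.rules, just the effective sequence as described above, using the LED names from the logs), what the rules boil down to is roughly:

ip l s up dev eth0                                   # bring the interface up once, as seen in the logs
echo netdev > /sys/class/leds/green:lan-1/trigger    # select the netdev trigger for the LAN1 LED
echo eth0 > /sys/class/leds/green:lan-1/device_name  # bind the trigger to eth0
echo 1 > /sys/class/leds/green:lan-1/link            # solid on link
echo 1 > /sys/class/leds/green:lan-1/rx              # blink on receive
echo 1 > /sys/class/leds/green:lan-1/tx              # blink on transmit
ip l s down dev eth0                                 # and down again, the only way found to get the LED off until a real link exists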
An alternative would be to leave the LEDs untouched at boot, and instead configure them via an /etc/network/if-up.d script, enabling the trigger once the interface is really configured. It has the same issue that the LED is then turned on even if no cable is connected, but it is probably a rare case that admins configure an interface via /etc/network/interfaces|.d without actually having a cable connected. And at least, when a cable is detached after the trigger was applied, the LED reacts correctly and turns off.
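A minimal sketch of such a hook, assuming the LED names seen in the logs above (the eth2 → green:wan mapping is an assumption); the file would need to be executable and, due to run-parts naming rules, must not contain a dot in its name:

#!/bin/sh
# Hypothetical /etc/network/if-up.d/ethernet-leds - not shipped by DietPi, illustration only.
# ifupdown exports $IFACE for if-up.d hooks; map it to the matching LED.
case "$IFACE" in
	eth0) LED='green:lan-1' ;;
	eth1) LED='green:lan-2' ;;
	eth2) LED='green:wan' ;;
	*) exit 0 ;;
esac
LEDDIR="/sys/class/leds/$LED"
[ -d "$LEDDIR" ] || exit 0
# Apply the netdev trigger only now, i.e. once ifup has actually configured the interface.
echo netdev > "$LEDDIR/trigger"
echo "$IFACE" > "$LEDDIR/device_name"
echo 1 > "$LEDDIR/link"   # solid on link
echo 1 > "$LEDDIR/rx"     # blink on receive
echo 1 > "$LEDDIR/tx"     # blink on transmit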
Could you (both of you) test whether this has been solved? We did some changes around network services.
I'm still on v9.5.1, do you want me to update to run some tests?
Uploading both logs for the passed and failed scenarios, hope it helps. Note: I'm bridging my lan1 and lan2 ports. In the failed scenario, I noticed both my lan1 and lan2 LEDs are lit green. After reboot, only lan1 is lit green and it is working fine.
@jysqice does this apply to mainline kernel (which we use) as well?
@yjteng In the failed case, the ip l s up dev eth commands from the udev rules timed out again. So removing those should indeed solve it (but leaves the LEDs unconfigured).
Armbian btw provides an overlay to apply the LED triggers via device tree: https://github.com/armbian/build/blob/main/patch/kernel/archive/rockchip64-6.6/overlay/rockchip-rk3568-nanopi-r5s-leds.dts
I'll test whether this works without having the LEDs lit at boot, hence without the ip up/down hassle, and whether it is possible to apply the other attributes we set via udev rules as well, or with a combination of both.
It should not have an effect on the underlying issue, but I recognise you use ifupdown2, which at least bypasses the adjustments we did to ifupdown, and might behave a little differently. Note that it is not newer, not a successor or any such, but just a different implementation written in Python: naturally slower, larger overhead, Python runtime as dependency etc., and it is developed more slowly. Hence we keep using the original C ifupdown intentionally.
Creating a bug report/issue
Required Information
DietPi version |
Distro version | bookworm
Kernel version | Linux r5s 6.6.16-current-rockchip64 #1 SMP PREEMPT Fri Feb 23 08:25:28 UTC 2024 aarch64 GNU/Linux
SBC model | NanoPi R5S/R5C (aarch64)
Power supply used | USB Hub power supply
SD card used | Several disks used, Samsung, Sandisk, no name
Additional Information (if applicable)
Steps to reproduce
(LAN1)
Expected behaviour
Actual behaviour
(WAN)
Extra details
ip a in case of running system:
ip a in case of problem state:
ethtool eth0 in case of running system:
ethtool eth0 in case of problem state:
Contents of /etc/network/interfaces:
Strange behaviour of MAC of eth0 and eth1: The values (given by ip a) change on reboot, see output above. MAC of eth2 stays on c2:3b:04:cb:97:8b