Closed PeterLacknase closed 5 years ago
haveged is required btw read th4e documents for php-fpm 7+
It‘s no dependency, should be then.
@PeterLacknase @diveyez Hmm yeah a similar issue was reported already, although not during service starts but on earlier boot stage: https://github.com/MichaIng/DietPi/issues/2760
Not sure why this issues arrive now. Seems to be some recent patch of Debian that created additional issues with this and v4.19 will ship a solution (expected soon for RPi).
However this is just relevant for headless systems with low user input. php-fpm itself does not use haveged directly (thus no dependency) but it seems to require a relatively large amount of entropy for it's startup. And on headless systems the entropy pool seems to be too low or does not fill fast enough. So haveged assures it is always filled.
I am glad we found this as a reason for the service restarts as well, at least we know a solution now.
Since there are not yet many reports, I think we will not implement this by default into DietPi. But e.g. we can add an info via MOTD (in DietPi login banner) to try G_AGI haveged
when facing boot hangs in combination with random: ...
messages.
@PeterLacknase
Btw does this indeed also solve the long lasting systemctl restart php7.3-fpm
?
Yes. It solves that problem as well. You are right: Apparently headless systems are more prone to that lack of entropy. On all my SBCs hdmi, peripherals, gettys etc. are completely disabled. (Apart from one Ubiquity-machine with serial console.) So a hint for those kind of use cases would probably suffice.
Did some search:
One guy stated indeed to make haveged
an essential packages (being installed on all Debian systems by default and cannot be removed without forcing it interactively) and should be even loaded into the initramfs (where used).
I didn't read every follow up completely, but indeed the issue seems to be present on Raspberry Pis especially, since that board ships not much hardware sources for entropy: https://lists.debian.org/debian-devel/2018/12/msg00293.html
I am wondering why I never faced this on my RPi2 🤔.
@PeterLacknase @diveyez
If you are in mood, could you send a dietpi-bugreport
and paste the ID here? So I can inspect the setups (installed software, startup services and such) to see if there are more similarities than just the board. Some more users ran into the same issue (in other GitHub issues), so I hope I can get at least a small pool (probably growing) of systems to compare.
On the development board, discussion stopped at end of 2018 without result... RPi kernel 4.19 seems to not have solved it, at I have another report where the kernel was already upgraded.
So far it seems that especially PHP7.3-FPM is affected, so probably, as long as Debian does not come up with a long term solution, we need to install haveged
by default together with PHP, although I am not happy about an additional process and ~20M disk space 😒.
arm7 (rpi 3b+): 2d9a4ee6-fe58-495e-bb38-d5fd750fd204
arm6 (rpi zero w): 6dd50815-7b78-423b-90e6-a3e7e24a4854
@PeterLacknase
Many thanks. haveged
was required on both systems to solve service start hanging?
Yes. (PHP7.3-fpm, that is.)
I will do the rpi zero w again this week to get you what you need. =)
Im getting undervolts and post hangs. I cannot even get to terminal to pull logs. Something is definitely wrong with preboot tasks, and its not just dietpi, its debian itself I feel. I am gonna need to reformat and go at it again. Will update tomorrow.
dietpi armv7 rpi3b+ was actually using php-fpm7.0 in my case, it appears debian took a long vacation with apt and confused her. The fix is going to be avoiding using php until that arm repo is patched (months)
Two things, the packages were likely compiled with haveged and apparmor listed as dependencies by accident, and pool.ntp.org is a liar. I noticed my clock was sent back into the past.
Is there really a way to fix this without building php ourselves?
Images URL: https://github.com/diveyez/build.deb/tree/master/debug I really dont want to keep bothering you Michal but, you should have this photo, I also do not see a need to put this on another issue, I had the same issue as Peter did on this build/board, however after installing haveged and the dependencies with it, I am stuck here unable to escape to anywhere else but tty3 or tty4
Just for comparison: on my systems serial gettys are completely disabled. I‘m using screen sessions via ssh.
(Should work with serials nontheless.)
@diveyez Okay, so:
Please paste:
[ TIME ]
and [ **] A start ...
lines, when you face them and:
journalctl -u networking
ip a
And I see something about PHP7.0 there. Your system should have been migrated to PHP7.3, Did you install PHP outside of DietPi-Software?
And as a last suggestion:
apt purge plymouth; apt autoremove --purge
Crypto was removed from PhP in .3 The processes I run see this as an insecure boot, and hang, everything up. A friend found the issue this morning and pointed it out. I didnt even notice that they did that. But it makes sense as to why my system rolled back when 7.3 was dropped in.
Will post lines when home. Thanks!
@diveyez
I am still wondering what you mean with "Crypto", the following cryptography modules are available for PHP: https://www.php.net/manual/en/refs.crypto.php
mcrypt
was dropped (marked as deprecated) with PHP7.1 already.
mcrypt is a dependency for my sanity check processes. This is the cause. I am left with the option to rebuild to fix this. No worries.
@diveyez Ah jep that's true. If possible it is recommended to migrate to use the openssl module.
In the works, but first I have to rebuild unfortunately because my secops code literally kills ssh if php is vulnerable and I cannot migrate cleanly anymore after all the keybashing.
Related: https://github.com/MichaIng/DietPi/issues/2871 And: https://github.com/MichaIng/DietPi/issues/2870#issuecomment-497154708
An alternative to haveged
: rng-tools
@MichaIng
Experienced the same problems (extremely slow start-up, followed by a very slow start of all services) on my RPi Zero W after upgrade from v6.22, have therefore also installed haveged and since all seems to have returned to normal.
Since I read in another (closed) issue that you want to collect info on this issue I also sent a bugreport, hope it helps. Here's my dietpi-bugreport ID:
root@DietPi:~# dietpi-bugreport [ OK ] DietPi-Bugreport | Root access verified. [ OK ] DietPi-Bugreport | RootFS R/W access verified. [ INFO ] DietPi-Bugreport | Packing upload archive, please wait... [ OK ] DietPi-Bugreport | Connection test: ssh.dietpi.com [ OK ] DietPi-Bugreport | Bug report sent, reference code: e817deae-ad83-4663-85d4-32e559223eef
Same applies also to my NanoPi NEO2 SBC, let me know if you would like a bugreport from this device also.
@maartenlangeveld Many thanks.
Same applies also to my NanoPi NEO2 SBC, let me know if you would like a bugreport from this device also.
Jep that would be great, if you don't mind.
I don't mind :-) Hereby (NanoPi NEO2) bugreport ID
root@DietPi-DNS:~# dietpi-bugreport [ OK ] DietPi-Bugreport | Root access verified. [ OK ] DietPi-Bugreport | RootFS R/W access verified. [ INFO ] DietPi-Bugreport | Packing upload archive, please wait... [ OK ] DietPi-Bugreport | Connection test: ssh.dietpi.com [ OK ] DietPi-Bugreport | Bug report sent, reference code: 1651785e-3477-409e-9388-3a79306c80a6
I am getting a massive issue relating to NTP desync due to Softclock emulating Hardwareclock with the Rpi3bv and DietPI. Is there a way around this?, because I cannot even post boot at this point.
@diveyez
NTP desync
NB: Sorry fore being most probably a bid more verbose below than required for you, as I think you already know most of this.
You mean NTP or RTC? Because these are two different things:
Since RPi does not have a real RTC, it has the fake-hwclock
package pre-installed. It just saves the time on shutdown to a file and loads this timestamp as system time on boot. So early boot messages will still have wrong timestamps (no change to get around this!), the ones from last shutdown, whenever this was, but at least they are not dated to 01/01/1970 😉.
After network is up, the system time will be updated of course, but no change to control the timestamps before that.
So could you specify which of the above did not work as expected?
Further reading about the topic: https://unix.stackexchange.com/questions/442698/when-i-log-in-it-hangs-until-crng-init-done#442744
rng-tools
generally seems to require certain hardware to gather random numbers from. Many peripherals and things should work, but since we cannot assure it, haveged
seems to be the only reliable option.On VM the kernel message, where some systems hang on boot, shows up as well, 10-15 minutes after boot:
root@VM-Stretch:~# dmesg | grep random
[ 0.997096] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[ 0.997271] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[ 0.997297] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[ 2.428209] random: fast init done
[ 754.821644] random: crng init done
[ 754.821669] random: 7 urandom warning(s) missed due to ratelimiting
But it does not cause boot hang nor php7.3-fpm start/restart hang on boot or when called manually. Same on RPi but as well, but muuch faster and no real hang or service start/restart issue: 2019-06-03 23:38:01 root@micha:/tmp# dmesg | grep random
2019-06-03 23:38:01 root@micha:/tmp# dmesg | grep random
[ 0.000000] random: get_random_bytes called from start_kernel+0xac/0x4b4 with crng_init=0
[ 1.647480] random: fast init done
[ 3.897806] random: systemd: uninitialized urandom read (16 bytes read)
[ 4.082484] random: systemd: uninitialized urandom read (16 bytes read)
[ 4.085968] random: systemd: uninitialized urandom read (16 bytes read)
[ 16.048898] random: crng init done
[ 16.048935] random: 7 urandom warning(s) missed due to ratelimiting
haveged
:On VM:
root@VM-Stretch:~# dmesg | grep random
[ 1.122937] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[ 1.123103] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[ 1.123129] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[ 2.149193] random: fast init done
[ 6.780873] random: crng init done
[ 6.780877] random: 7 urandom warning(s) missed due to ratelimiting
On RPi:
root@VM-Jessie:~# dmesg | grep random
[ 70.189616] random: systemd-udevd: uninitialized urandom read (16 bytes read, 2 bits of entropy available)
[ 71.338422] random: systemd: uninitialized urandom read (16 bytes read, 30 bits of entropy available)
[ 71.547181] random: systemd-sysv-ge: uninitialized urandom read (16 bytes read, 39 bits of entropy available)
[ 72.085517] random: systemd: uninitialized urandom read (16 bytes read, 68 bits of entropy available)
[ 72.086023] random: systemd: uninitialized urandom read (16 bytes read, 68 bits of entropy available)
[ 72.086609] random: systemd: uninitialized urandom read (16 bytes read, 68 bits of entropy available)
[ 72.107337] random: systemd: uninitialized urandom read (16 bytes read, 68 bits of entropy available)
[ 72.109459] random: systemd: uninitialized urandom read (16 bytes read, 68 bits of entropy available)
[ 72.109521] random: systemd: uninitialized urandom read (16 bytes read, 68 bits of entropy available)
[ 72.141338] random: systemd: uninitialized urandom read (16 bytes read, 69 bits of entropy available)
[ 73.472029] random: nonblocking pool is initialized
nonblocking pool
btw is /dev/urandom
, which gives pseudo-random numbers when entropy pool is empty while /dev/random
blocks in case of empty pool, which is why this should be used for security relevant tasks (cryptography).[ 68.729362] Calibrating delay loop (skipped) preset value.. 4789.14 BogoMIPS (lpj=9578280)
root@VM-Buster:~# dmesg | grep random
[ 2.903387] random: get_random_bytes called from start_kernel+0x93/0x52c with crng_init=0
[ 6.765104] random: fast init done
[ 7.544067] random: systemd: uninitialized urandom read (16 bytes read)
[ 7.582841] random: systemd: uninitialized urandom read (16 bytes read)
[ 7.583323] random: systemd: uninitialized urandom read (16 bytes read)
[ 899.398354] random: crng init done
[ 899.398364] random: 7 urandom warning(s) missed due to ratelimiting
root@VM-Buster:~# dmesg | grep random
[ 1.458165] random: get_random_bytes called from start_kernel+0x93/0x52c with crng_init=0
[ 4.683344] random: fast init done
[ 5.481124] random: systemd: uninitialized urandom read (16 bytes read)
[ 5.524465] random: systemd: uninitialized urandom read (16 bytes read)
[ 5.524665] random: systemd: uninitialized urandom read (16 bytes read)
[ 10.011777] random: crng init done
[ 10.011780] random: 7 urandom warning(s) missed due to ratelimiting
Bug report on Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=898814
So as can be derived above, the random seeding is done muuch faster at boot with haveged
. However on my test systems this has no effect on boot or service starts, even with php7.3-fpm installed. So the real question is in which cases boot or services wait for crng init done
and why nothing waits for it on my testing systems.
BOOM: I was always wondering why my Buster VM after most boots at first has no network access. It has but it waits very long on the first connection attempt. Afterwards everything works fine. Now I checked and indeed the download starts right after random: crng init done
! So on Buster network connections seem to wait for this. This error was already annoying me for a long while. Service starts worked very well, but only this network hang happens.
haveged
: 🈯️ Immediate network access!root@VM-Stretch:~# dmesg | grep random
[ 1.606379] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[ 1.606712] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[ 1.606768] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[ 2.678331] random: fast init done
[ 1024.510594] random: crng init done
[ 1024.510796] random: 7 urandom warning(s) missed due to ratelimiting
[ 997.893151] random: crng init done
[ 997.893177] random: 7 urandom warning(s) missed due to ratelimiting
[ 209.549809] random: crng init done
[ 209.549830] random: 7 urandom warning(s) missed due to ratelimiting
[ 277.401283] random: crng init done
[ 277.401298] random: 7 urandom warning(s) missed due to ratelimiting
[ 803.553054] random: crng init done
[ 803.553078] random: 7 urandom warning(s) missed due to ratelimiting
[ 1052.447665] random: crng init done
[ 1052.447790] random: 7 urandom warning(s) missed due to ratelimiting
dietpi-drive_manager
:
[ 135.685130] random: crng init done
[ 135.685144] random: 7 urandom warning(s) missed due to ratelimiting
[ 812.582270] random: crng init done
[ 812.582299] random: 7 urandom warning(s) missed due to ratelimiting
[ 20.720902] random: crng init done
[ 20.720923] random: 7 urandom warning(s) missed due to ratelimiting
... I didn't want to wait any more and after some minutes just opened the display... no change...
... Waiting for another 10 seconds... no change...
... Pressed return for user name input... and just that moment, `crng init done` showed up.
[ 254.875494] random: crng init done
[ 254.875515] random: 7 urandom warning(s) missed due to ratelimiting
[ 853.799954] random: crng init done
[ 853.799976] random: 7 urandom warning(s) missed due to ratelimiting
php7.3-fpm.service
.php7.3-fpm.service
).crng init done
, found on Buster VirtualBox (kernel 4.19) but e.g. not on RPi with Raspbian Buster and/or kernel 4.19.Very nice article about entropy pool generation/usage in general:
Issues seems to be especially present on Raspberry Pies and virtual machines: https://lists.debian.org/debian-devel/2018/12/msg00293.html EDIT: Ah nope, Odroid mentioned above as well and my Notebook also has very late random init without haveged.
So if we add this by default, then only for those two deices for now.
What I aim to add as well is a log for the crng init done
event, although don't know yet how to track, perhaps udev can track it.
Then adding this (simple timestamp in seconds after boot) to dietpi-survey
and calculate average values for each device on report page. So we get an overview. Also DietPi-Banner MOTD can show a warning/hint when the value is e.g. > 10 seconds.
It is quite a pain that related issues are that drastic, hanging boot, failing services, and no real logs that allow to track down/debug the issue to entropy. This is nothing that I want to ship to our users, regardless of Debian arguments/security concerns.
Checked the bug reports. Many thanks again for all that send one after entropy being identified as issue!
All systemd have PHP7.3 installed, so indeed this seems to have a high demand of entropy.
But not all PHP7.3 instances are affected by service (re)start hangs.
Half of reported systems have OpenVPN installed, half unbound, half netfilter-persistant
(overlapping in cases). It seems that those as well have a high entropy demand, in case of OpenVPN expectedly. So these combinations seem to trigger the significant PHP7.3 hang on boot.
At least one system is not headless, has xserver + desktop installed. So we cannot assume that only headless systems are affected, even that as tested above keyboard input seems to full the entropy pool. We want the system + services to start without hanging until login prompt, so before any keyboard input.
Another issue is plymouth. It is installed on half affected systems as well and seems to cause the boot hanging, probably outside of PHP7.3. It contains of 13 static systemd units, being called on boot and shutdown/reboot/halt, invoking each other.
What especially took my attention is:
cat /lib/systemd/system/multi-user.target.wants/plymouth-quit-wait.service
[Unit]
Description=Hold until boot process finishes up
After=rc-local.service plymouth-start.service systemd-user-sessions.service
[Service]
ExecStart=-/bin/plymouth --wait
Type=oneshot
TimeoutSec=0
And finally when searching the forum for related posts: https://dietpi.com/phpbb/viewtopic.php?p=7598#p7598
WARN: Low entropy detected. We recommend you to install `haveged'
wlan1: interface state UNINITIALIZED->HT_SCAN
20/40 MHz operation not permitted on channel pri=1 sec=5 based on overlapping BSSes
Using interface wlan1 with hwaddr XX:XX:XX:XX:XX:XX and ssid "DIETPI"
random: Cannot read from /dev/random: Resource temporarily unavailable
random: Only 17/20 bytes of strong random data available from /dev/random
random: Not enough entropy pool available for secure operations
WPA: Not enough entropy in random pool for secure operations - update keys later when the first station connects
And not to forget general network hang on Buster until random crng init done.
Putting all together the causing and affected services are wide and related issues often critical from end user perspective. Although headless RPi systems are mostly affected, also Odroids, VMs and non-headless systems can be. And there is no reliable way to detect if the hardware is able to produce enough entropy and the kernel even does it, e.g. Debian default kernel does not trust CPU as entropy source (see above links). Issues are very hard to debug as mostly no related logs are produced besides the late random crng init that one must associate with the issues by hand.
=> We will install haveged on all DietPi systems from now on. The process takes less than 1M memory (the disk usage is again much smaller) and the benefit of assuring that none of the above issues occur simply outweighs it. => As said somewhere above, rng-tools is an alternative but it depends on the hardware ("some Intel/AMD/VIA chipsets") which we cannot rely on.
DietPi-PREP: https://github.com/MichaIng/DietPi/commit/66c32d39e5eb80fe0b22b22c20fa2e710ac1af44 DietPi-Patch: https://github.com/MichaIng/DietPi/commit/ed7221d4fe606a915a60ad0ae07abf5d1601d611 Changelog: https://github.com/MichaIng/DietPi/commit/333a3d823837a3f7d1f54228d94c8b3650c41b92
Just as I stumbled above:
Good to know its well known on Raspbian as well, leading to the fact that rng-tools
is pre-installed on Raspbian builds since April.
MicroSD card reader broke, I am probably not going to be working with Pi's and SBC's for a few weeks. Will check back in a few weeks...
ADMIN EDIT
To check if your system is affected
Workaround
dmesg | grep 'crng init'
after next reboot.Required Information
G_DIETPI_VERSION_CORE=6 G_DIETPI_VERSION_SUB=24 G_DIETPI_VERSION_RC=0 G_GITBRANCH='master' G_GITOWNER='MichaIng'
stretch Linux dietpi 4.14.98-v7+ #1200 SMP Tue Feb 12 20:27:48 GMT 2019 armv7l GNU/Linux RPi 3 Model B+ (armv7l) 5V 2.5 A (CE) SanDisk Ultra 64GB
Additional Information (if applicable)
php7.3-fpm
updated from php7.0-fpm (during diet-update)
can be replicated on fresh install
Steps to reproduce
Expected behaviour
php7.3-fpm should start within some ms (not minutes)
Actual behaviour
systemctl start php7.3-fpm
Extra details
seems to be an entropy-problem
see:
solution
sudo apt install haveged
see http://www.issihosts.com/haveged/
(the one and only alternative afaics would be downgrading kernels at the moment)
extra comment
probably affects some more services ...