downtownallday / cloudinabox

An installation of Nextcloud that borrows some of the "Mail-in-a-Box" code and standards
Other
2 stars 1 forks source link

FPM terminated at 03:00 every day, Let's Encrypt renewal also not working #12

Closed dumblob closed 2 years ago

dumblob commented 3 years ago

I've started to experience regular FPM service stops at exactly 03:00 in the morning every day some months ago (but didn't have enough time to document all the behavior until now). I'm running cloudinabox on Ubuntu 18.04.5 LTS.

I have to restart FPM systemd service manually every day, then restart nginx and then cd /usr/local/nextcloud/; sudo -u www-data php occ maintenance:mode --off because NC switches itself into maintenance on its own. See the following logs and information.

This is what NC web page then shows:

502 Bad Gateway
nginx
$ systemctl show php7.2-fpm.service
Type=notify
Restart=no
PIDFile=/run/php/php7.2-fpm.pid
NotifyAccess=main
RestartUSec=100ms
TimeoutStartUSec=1min 30s
TimeoutStopUSec=1min 30s
RuntimeMaxUSec=infinity
WatchdogUSec=0
WatchdogTimestampMonotonic=0
PermissionsStartOnly=no
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
MainPID=0
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusText=Processes active: 0, idle: 0, Requests: 155, slow: 0, Traffic: 0req/sec
StatusErrno=0
Result=success
UID=[not set]
GID=[not set]
NRestarts=0
ExecMainStartTimestamp=Sun 2020-12-13 03:00:24 GMT
ExecMainStartTimestampMonotonic=4010117240499
ExecMainExitTimestamp=Mon 2020-12-14 03:00:05 GMT
ExecMainExitTimestampMonotonic=4096498494703
ExecMainPID=23310
ExecMainCode=1
ExecMainStatus=0
ExecStart={ path=/usr/sbin/php-fpm7.2 ; argv[]=/usr/sbin/php-fpm7.2 --nodaemonize --fpm-config /etc/php/7.2/fpm/php-fpm.conf ; ignore_errors=no ; start_time=[Sun 2020-12-13 03:00:24 GMT] ; stop_time=[Mon 2020-12-14 03:00:05 GMT] ; pid=23310 ; code=exited ; status=0 }
ExecReload={ path=/bin/kill ; argv[]=/bin/kill -USR2 $MAINPID ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
Slice=system.slice
MemoryCurrent=[not set]
CPUUsageNSec=[not set]
TasksCurrent=[not set]
IPIngressBytes=18446744073709551615
IPIngressPackets=18446744073709551615
IPEgressBytes=18446744073709551615
IPEgressPackets=18446744073709551615
Delegate=no
CPUAccounting=no
CPUWeight=[not set]
StartupCPUWeight=[not set]
CPUShares=[not set]
StartupCPUShares=[not set]
CPUQuotaPerSecUSec=infinity
IOAccounting=no
IOWeight=[not set]
StartupIOWeight=[not set]
BlockIOAccounting=no
BlockIOWeight=[not set]
StartupBlockIOWeight=[not set]
MemoryAccounting=no
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryLimit=infinity
DevicePolicy=auto
TasksAccounting=yes
TasksMax=1108
IPAccounting=no
UMask=0022
LimitCPU=infinity
LimitCPUSoft=infinity
LimitFSIZE=infinity
LimitFSIZESoft=infinity
LimitDATA=infinity
LimitDATASoft=infinity
LimitSTACK=infinity
LimitSTACKSoft=8388608
LimitCORE=infinity
LimitCORESoft=0
LimitRSS=infinity
LimitRSSSoft=infinity
LimitNOFILE=4096
LimitNOFILESoft=1024
LimitAS=infinity
LimitASSoft=infinity
LimitNPROC=3695
LimitNPROCSoft=3695
LimitMEMLOCK=16777216
LimitMEMLOCKSoft=16777216
LimitLOCKS=infinity
LimitLOCKSSoft=infinity
LimitSIGPENDING=3695
LimitSIGPENDINGSoft=3695
LimitMSGQUEUE=819200
LimitMSGQUEUESoft=819200
LimitNICE=0
LimitNICESoft=0
LimitRTPRIO=0
LimitRTPRIOSoft=0
LimitRTTIME=infinity
LimitRTTIMESoft=infinity
OOMScoreAdjust=0
Nice=0
IOSchedulingClass=0
IOSchedulingPriority=0
CPUSchedulingPolicy=0
CPUSchedulingPriority=0
TimerSlackNSec=50000
CPUSchedulingResetOnFork=no
NonBlocking=no
StandardInput=null
StandardInputData=
StandardOutput=journal
StandardError=inherit
TTYReset=no
TTYVHangup=no
TTYVTDisallocate=no
SyslogPriority=30
SyslogLevelPrefix=yes
SyslogLevel=6
SyslogFacility=3
LogLevelMax=-1
SecureBits=0
CapabilityBoundingSet=cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw cap_ipc_lock cap_ipc_owner cap_sys_module cap_sys_rawio cap_sys_chroot cap_sys_ptrace cap_sys_pacct cap_sys_admin cap_sys_boot cap_sys_nice cap_sys_resource cap_sys_time cap_sys_tty_config cap_mknod cap_lease cap_audit_write cap_audit_control cap_setfcap cap_mac_override cap_mac_admin cap_syslog cap_wake_alarm cap_block_suspend
AmbientCapabilities=
DynamicUser=no
RemoveIPC=no
MountFlags=
PrivateTmp=no
PrivateDevices=no
ProtectKernelTunables=no
ProtectKernelModules=no
ProtectControlGroups=no
PrivateNetwork=no
PrivateUsers=no
ProtectHome=no
ProtectSystem=no
SameProcessGroup=no
UtmpMode=init
IgnoreSIGPIPE=yes
NoNewPrivileges=no
SystemCallErrorNumber=0
LockPersonality=no
RuntimeDirectoryPreserve=no
RuntimeDirectoryMode=0755
StateDirectoryMode=0755
CacheDirectoryMode=0755
LogsDirectoryMode=0755
ConfigurationDirectoryMode=0755
MemoryDenyWriteExecute=no
RestrictRealtime=no
RestrictSUIDSGID=no
RestrictNamespaces=no
MountAPIVFS=no
KeyringMode=private
KillMode=control-group
KillSignal=15
SendSIGKILL=yes
SendSIGHUP=no
Id=php7.2-fpm.service
Names=php7.2-fpm.service
Requires=system.slice sysinit.target
WantedBy=multi-user.target
Conflicts=shutdown.target
Before=shutdown.target multi-user.target
After=basic.target system.slice sysinit.target systemd-journald.socket network.target
Documentation=man:php-fpm7.2(8)
Description=The PHP 7.2 FastCGI Process Manager
LoadState=loaded
ActiveState=inactive
SubState=dead
FragmentPath=/lib/systemd/system/php7.2-fpm.service
UnitFileState=enabled
UnitFilePreset=enabled
StateChangeTimestamp=Mon 2020-12-14 03:00:05 GMT
StateChangeTimestampMonotonic=4096498494826
InactiveExitTimestamp=Sun 2020-12-13 03:00:24 GMT
InactiveExitTimestampMonotonic=4010117240560
ActiveEnterTimestamp=Sun 2020-12-13 03:00:24 GMT
ActiveEnterTimestampMonotonic=4010117577516
ActiveExitTimestamp=Mon 2020-12-14 03:00:05 GMT
ActiveExitTimestampMonotonic=4096498453213
InactiveEnterTimestamp=Mon 2020-12-14 03:00:05 GMT
InactiveEnterTimestampMonotonic=4096498494826
CanStart=yes
CanStop=yes
CanReload=yes
CanIsolate=no
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnFailureJobMode=replace
IgnoreOnIsolate=no
NeedDaemonReload=no
JobTimeoutUSec=infinity
JobRunningTimeoutUSec=infinity
JobTimeoutAction=none
ConditionResult=yes
AssertResult=yes
ConditionTimestamp=Sun 2020-12-13 03:00:24 GMT
ConditionTimestampMonotonic=4010117236499
AssertTimestamp=Sun 2020-12-13 03:00:24 GMT
AssertTimestampMonotonic=4010117236500
Transient=no
Perpetual=no
StartLimitIntervalUSec=10s
StartLimitBurst=5
StartLimitAction=none
FailureAction=none
SuccessAction=none
InvocationID=2ca2d50c345e4748bc2755023a9eacb3
CollectMode=inactive
# tail -n 33 /var/log/php7.2-fpm.log
[08-Dec-2020 03:00:06] NOTICE: Terminating ...
[08-Dec-2020 03:00:06] NOTICE: exiting, bye-bye!
[08-Dec-2020 03:00:26] NOTICE: fpm is running, pid 1530
[08-Dec-2020 03:00:26] NOTICE: ready to handle connections
[08-Dec-2020 03:00:26] NOTICE: systemd monitor interval set to 10000ms
[09-Dec-2020 03:00:06] NOTICE: Terminating ...
[09-Dec-2020 03:00:06] NOTICE: exiting, bye-bye!
[09-Dec-2020 03:00:31] NOTICE: fpm is running, pid 19031
[09-Dec-2020 03:00:31] NOTICE: ready to handle connections
[09-Dec-2020 03:00:31] NOTICE: systemd monitor interval set to 10000ms
[10-Dec-2020 03:00:06] NOTICE: Terminating ...
[10-Dec-2020 03:00:06] NOTICE: exiting, bye-bye!
[10-Dec-2020 03:00:23] NOTICE: fpm is running, pid 3038
[10-Dec-2020 03:00:24] NOTICE: ready to handle connections
[10-Dec-2020 03:00:24] NOTICE: systemd monitor interval set to 10000ms
[11-Dec-2020 03:00:05] NOTICE: Terminating ...
[11-Dec-2020 03:00:05] NOTICE: exiting, bye-bye!
[11-Dec-2020 03:00:23] NOTICE: fpm is running, pid 21526
[11-Dec-2020 03:00:23] NOTICE: ready to handle connections
[11-Dec-2020 03:00:23] NOTICE: systemd monitor interval set to 10000ms
[12-Dec-2020 03:00:05] NOTICE: Terminating ...
[12-Dec-2020 03:00:05] NOTICE: exiting, bye-bye!
[12-Dec-2020 03:00:24] NOTICE: fpm is running, pid 7676
[12-Dec-2020 03:00:24] NOTICE: ready to handle connections
[12-Dec-2020 03:00:24] NOTICE: systemd monitor interval set to 10000ms
[13-Dec-2020 03:00:05] NOTICE: Terminating ...
[13-Dec-2020 03:00:05] NOTICE: exiting, bye-bye!
[13-Dec-2020 03:00:24] NOTICE: fpm is running, pid 23310
[13-Dec-2020 03:00:24] NOTICE: ready to handle connections
[13-Dec-2020 03:00:24] NOTICE: systemd monitor interval set to 10000ms
[13-Dec-2020 06:25:03] NOTICE: error log file re-opened
[14-Dec-2020 03:00:05] NOTICE: Terminating ...
[14-Dec-2020 03:00:05] NOTICE: exiting, bye-bye!

Then after FPM service restart:

...
[13-Dec-2020 06:25:03] NOTICE: error log file re-opened
[14-Dec-2020 03:00:05] NOTICE: Terminating ...
[14-Dec-2020 03:00:05] NOTICE: exiting, bye-bye!
[15-Dec-2020 11:33:55] NOTICE: fpm is running, pid 23154
[15-Dec-2020 11:33:55] NOTICE: ready to handle connections
[15-Dec-2020 11:33:55] NOTICE: systemd monitor interval set to 10000ms

NextCloud page then shows:

Maintenance mode
This Test00 instance is currently in maintenance mode, which may take a while. This page will refresh itself when the instance is available again.

Contact your system administrator if this message persists or appeared unexpectedly.
$ cd /usr/local/nextcloud/
$ sudo -u www-data php occ maintenance:mode --off

Then everything works for few hours until 03:00 in the morning :cry:.

Note that after some time, the automatically requested Let's Encrypt certificate did not renew and NC is thus anyway not accessible. I didn't have time to look into this, but I suspect it's related to the issue above and I'd thus rather first solved the issue above and then got back to Let's Encrypt.

Any thoughts what could be the issue and how to debug it?

downtownallday commented 3 years ago

I would investigate backups. At 3:00am, the management/daily_tasks.sh script is executed, which performs a backup first, then certificate provisioning second. And before backing up, php-fpm is terminated and maintenance mode turned on. Since you're starting those manually every day, there's a good chance the backup.py script hasn't finished backing up files yet, for whatever reason. You could run management/backup.py directly from the command line and see if there are any messages or errors. Also look in the system process list for daily_task.sh and/or backup.py processes.

downtownallday commented 2 years ago

stale - closing

dumblob commented 2 years ago

Yeah, believe it or not but I did not find enough time to investigate it further. If this'll happen again, I'll reopen.

Thanks for the suggestion of how to debug it!