Eraden / amdgpud

MIT License

LockExists on Boot #37

Closed VitaliyKulikov closed 2 years ago

VitaliyKulikov commented 2 years ago

I'm seeing strange behavior during boot with the new version. The service's automatic restarts all fail the same way, but after a system reboot it starts successfully. It has happened twice already and seems to occur randomly during boot. There are no service crashes after boot.

There was a Linux kernel update beforehand. I am using manjaro-unstable (archlinux-stable). Still, it is strange that, with the same software, a reboot helps after the error during boot.

$ journalctl -xb -1 -u amdfand.service

Feb 12 10:29:16 dulia systemd[1]: Started AMD GPU fan daemon.
░░ Subject: A start job for unit amdfand.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://forum.manjaro.org/c/support
░░ 
░░ A start job for unit amdfand.service has finished successfully.
░░ 
░░ The job identifier is 103.
Feb 12 10:29:16 dulia amdfand[664]:  ERROR amdfand > LockExists
Feb 12 10:29:16 dulia systemd[1]: amdfand.service: Main process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://forum.manjaro.org/c/support
░░ 
░░ An ExecStart= process belonging to unit amdfand.service has exited.
░░ 
░░ The process' exit code is 'exited' and its exit status is 1.
Feb 12 10:29:16 dulia systemd[1]: amdfand.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://forum.manjaro.org/c/support
░░ 
░░ The unit amdfand.service has entered the 'failed' state with result 'exit-code'.
Feb 12 10:29:20 dulia systemd[1]: amdfand.service: Scheduled restart job, restart counter is at 1.
░░ Subject: Automatic restarting of a unit has been scheduled
░░ Defined-By: systemd
░░ Support: https://forum.manjaro.org/c/support
░░ 
░░ Automatic restarting of the unit amdfand.service has been scheduled, as the result for
░░ the configured Restart= setting for the unit.
Feb 12 10:29:20 dulia systemd[1]: Stopped AMD GPU fan daemon.
~❯ pacman -Qi amdfand-bin
Name            : amdfand-bin
Version         : 1.0.9-3
~❯ inxi -G --display
Graphics:
  Device-1: AMD Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X] driver: amdgpu v: kernel
  Display: x11 server: X.Org 1.21.1.3 driver: loaded: amdgpu resolution: 1920x1080~60Hz
  OpenGL:
    renderer: AMD Radeon RX 550 / 550 Series (POLARIS12 DRM 3.44.0 5.16.9-1-MANJARO LLVM 13.0.0)
    v: 4.6 Mesa 21.3.5
Zile995 commented 2 years ago

I have the same problem on Arch. Also the configuration file /usr/lib/systemd/system/amdfand.service is marked executable. AUR package maintainer will probably fix this.

log.txt

Eraden commented 2 years ago

This is weird. Can you give me the output of this command:

systemctl status | grep amdfan

It seems you have more than one instance active. The lock should raise an error only if there is a PID file and the PID stored in it belongs to a process that is still alive.

Zile995 commented 2 years ago

Now, I don't have the problem. What I did:

systemctl.txt

It is really weird.

VitaliyKulikov commented 2 years ago

here boot log for service:

$ journalctl -xb -4 --no-pager | grep amdfand.service

Feb 11 14:50:26 dulia systemd[1]: Configuration file /usr/lib/systemd/system/amdfand.service is marked executable. Please remove executable permission bits. Proceeding anyway.
░░ Subject: A start job for unit amdfand.service has finished successfully
░░ A start job for unit amdfand.service has finished successfully.
Feb 11 14:50:34 dulia systemd[1]: amdfand.service: Main process exited, code=exited, status=1/FAILURE
░░ An ExecStart= process belonging to unit amdfand.service has exited.
Feb 11 14:50:34 dulia systemd[1]: amdfand.service: Failed with result 'exit-code'.
░░ The unit amdfand.service has entered the 'failed' state with result 'exit-code'.
Feb 11 14:50:38 dulia systemd[1]: amdfand.service: Scheduled restart job, restart counter is at 1.
░░ Automatic restarting of the unit amdfand.service has been scheduled, as the result for
░░ Subject: A stop job for unit amdfand.service has finished
░░ A stop job for unit amdfand.service has finished.
░░ Subject: A start job for unit amdfand.service has finished successfully
░░ A start job for unit amdfand.service has finished successfully.
Feb 11 14:50:38 dulia systemd[1]: amdfand.service: Main process exited, code=exited, status=1/FAILURE
░░ An ExecStart= process belonging to unit amdfand.service has exited.
Feb 11 14:50:38 dulia systemd[1]: amdfand.service: Failed with result 'exit-code'.
░░ The unit amdfand.service has entered the 'failed' state with result 'exit-code'.
Feb 11 14:50:42 dulia systemd[1]: amdfand.service: Scheduled restart job, restart counter is at 2.
░░ Automatic restarting of the unit amdfand.service has been scheduled, as the result for
░░ Subject: A stop job for unit amdfand.service has finished
░░ A stop job for unit amdfand.service has finished.
░░ Subject: A start job for unit amdfand.service has finished successfully
░░ A start job for unit amdfand.service has finished successfully.
Feb 11 14:50:42 dulia systemd[1]: amdfand.service: Main process exited, code=exited, status=1/FAILURE
...

So only one instance is starting and restarting. Also, after such a boot I have seen pwm1_enable = 2.

I have done 2 boots already without reinstalling, and it is working fine. We can postpone this issue. I will provide more findings if there are any. Thanks.

Eraden commented 2 years ago

I think I know what the issue is. The PID stored from the previous run happens to be already in use by another process in this run, which causes a collision.

The maintainer of the library didn't cover this case, so I need to add a check that the process name is the same.
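
To illustrate the kind of check described here, below is a minimal sketch and not the actual amdfand fix. It assumes Linux, where `/proc/<pid>/comm` holds the short name of a running process. The names `PidState` and `classify_pid`, and the `proc_root` parameter, are hypothetical and exist only for this example.

```rust
use std::fs;
use std::path::Path;

/// Outcome of inspecting a PID recorded by a previous run.
#[derive(Debug, PartialEq)]
enum PidState {
    /// No readable process entry for that PID; the lock can be treated as stale.
    Dead,
    /// The PID is alive but belongs to a different program (PID reuse).
    ReusedByOther,
    /// The PID is alive and the process name matches; the lock is valid.
    SameProgram,
}

/// Classify `pid` by reading `<proc_root>/<pid>/comm`.
///
/// `proc_root` is normally `/proc`; it is a parameter only so the function
/// can be exercised against a fake directory tree.
/// Assumption: any read error (not only "not found") is treated as `Dead`.
fn classify_pid(proc_root: &Path, pid: u32, expected_name: &str) -> PidState {
    let comm_path = proc_root.join(pid.to_string()).join("comm");
    match fs::read_to_string(comm_path) {
        Err(_) => PidState::Dead,
        // `comm` ends with a newline, so trim before comparing.
        Ok(name) if name.trim_end() == expected_name => PidState::SameProgram,
        Ok(_) => PidState::ReusedByOther,
    }
}
```

With something like this, `classify_pid(Path::new("/proc"), stored_pid, "amdfand")` would report `LockExists` only for `SameProgram`, while `Dead` and `ReusedByOther` would let the daemon replace the stale PID file. Note that the kernel truncates `comm` to 15 characters, which is not a problem for a short name like `amdfand`.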

Eraden commented 2 years ago

I updated the binary in the release, and the AUR package is also updated. Please confirm that this fixes the issue.

Zile995 commented 2 years ago

The issue seems to be solved.

I restarted the system several times, so far no problem. Usually it would fail at every boot session after some time. After 3 hours the service is still working.

Screenshot ![screenshot](https://user-images.githubusercontent.com/32335484/156620371-16d52178-0954-4634-b4f9-da3109b6dc60.png)
VitaliyKulikov commented 2 years ago

can confirm so far no problem.

Zile995 commented 2 years ago

Update, I have the problem again.

This time it's a little different. After 4 days of normal behavior, the service started failing again.

ERROR amdfand > MalformedPidFile(ParseIntError { kind: Empty })

log.txt systemctl.txt
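
For context, `ParseIntError { kind: Empty }` is what Rust's integer parsing returns for an empty string, so this error suggests the PID file existed but contained no digits. A minimal sketch of tolerating that case follows. It is an assumption about how the file could be handled, not the daemon's actual code, and `PidFile` and `parse_pid_file` are hypothetical names.

```rust
/// Result of parsing the text of a PID file.
#[derive(Debug, PartialEq)]
enum PidFile {
    /// The file was empty or contained only whitespace.
    Empty,
    /// The file held a valid PID.
    Pid(u32),
    /// The file held something that is not a PID.
    Garbage(String),
}

/// Parse the contents of a PID file without failing on an empty file.
fn parse_pid_file(contents: &str) -> PidFile {
    let trimmed = contents.trim();
    if trimmed.is_empty() {
        // Calling `parse::<u32>()` on "" is what yields `kind: Empty`.
        return PidFile::Empty;
    }
    match trimmed.parse::<u32>() {
        Ok(pid) => PidFile::Pid(pid),
        Err(_) => PidFile::Garbage(trimmed.to_string()),
    }
}
```

A caller could then treat `PidFile::Empty` as a stale lock and overwrite the file, instead of exiting with an error and leaving systemd to restart the service repeatedly.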

Eraden commented 2 years ago

@Zile995 please check and confirm that the issue is fixed.

Zile995 commented 2 years ago

amdfand cargo version 1.0.11

Screenshot ![Screenshot from 2022-04-03 17-42-24](https://user-images.githubusercontent.com/32335484/161436123-c8f4acb1-d21b-4efe-9911-e6665e692cd1.png)

journal.txt systemctl.txt

ERROR amdfand > MalformedPidFile(ParseIntError { kind: Empty })