PiSupply / PaPiRus

Resources for PaPiRus ePaper eInk displays
https://www.pi-supply.com/product/papirus-epaper-eink-screen-hat-for-raspberry-pi/
Other
346 stars 88 forks source link

epd-fuse.service sometimes crashes on boot #165

Closed joelspadin closed 6 years ago

joelspadin commented 6 years ago

I am using a Rapsberry Pi Zero with the Papirus Zero hat, and I've installed using the auto installation instructions. Sometimes after rebooting, I will get an error such as the following when I try to use the display:

pi@raspberrypi:~ $ papirus-clear
Traceback (most recent call last):
  File "/usr/local/bin/papirus-clear", line 31, in <module>
    main()
  File "/usr/local/bin/papirus-clear", line 25, in main
    papirus = Papirus()
  File "/usr/local/lib/python3.6/site-packages/papirus/epd.py", line 83, in __init__
    with open(os.path.join(self._epd_path, 'version')) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/dev/epd/version'

It looks like this is because the epd-fuse service has crashed:

pi@raspberrypi:~ $ systemctl status epd-fuse ● epd-fuse.service - A Fuse driver for the Repaper.org EPD Loaded: loaded (/usr/lib/systemd/system/epd-fuse.service; enabled; vendor preset: enabled) Active: failed (Result: signal) since Mon 2017-11-20 00:38:49 CST; 14min ago Main PID: 306 (code=killed, signal=SEGV)

Nov 20 00:38:43 raspberrypi systemd[1]: Starting A Fuse driver for the Repaper.org EPD... Nov 20 00:38:44 raspberrypi systemd[1]: Started A Fuse driver for the Repaper.org EPD. Nov 20 00:38:49 raspberrypi systemd[1]: epd-fuse.service: Main process exited, code=killed, status=11/SEGV Nov 20 00:38:49 raspberrypi systemd[1]: epd-fuse.service: Unit entered failed state. Nov 20 00:38:49 raspberrypi systemd[1]: epd-fuse.service: Failed with result 'signal'.

If I restart the service, everything starts working again. Is there any way I can figure out why the service randomly crashes on some boots and not others?

tvoverbeek commented 6 years ago

After a crash try if the command sudo journalctl -u epd-fuse gives some clues where exactly the service crashes. The journalctl command outputs the stdout and stderr of the service.

joelspadin commented 6 years ago

I did try journalctl, and it just printed out the second half of the "systemctl status" output. It was after a boot where epd-fuse did start correctly though. I'll try it again the next time it breaks to see if I can get anything more helpful.

tvoverbeek commented 6 years ago

Yes, I am interested in the output or journalctl after a crash.

joelspadin commented 6 years ago

There doesn't appear to be any additional information:

pi@raspberrypi:~ $ sudo systemctl is-active epd-fuse failed pi@raspberrypi:~ $ sudo journalctl -u epd-fuse -- Logs begin at Mon 2017-11-20 01:09:08 CST, end at Mon 2017-11-20 18:52:56 CST. -- Nov 20 01:09:18 raspberrypi systemd[1]: Starting A Fuse driver for the Repaper.org EPD... Nov 20 01:09:19 raspberrypi systemd[1]: Started A Fuse driver for the Repaper.org EPD. Nov 20 01:09:24 raspberrypi systemd[1]: epd-fuse.service: Main process exited, code=killed, status=11/SEGV Nov 20 01:09:24 raspberrypi systemd[1]: epd-fuse.service: Unit entered failed state. Nov 20 01:09:24 raspberrypi systemd[1]: epd-fuse.service: Failed with result 'signal'.

westy48 commented 6 years ago

I'm also sometimes having to restart the EPD.

As suggested to @ChaosinaCan I've run sudo journalctl -u epd-fuse and got the following output:

-- Logs begin at Wed 2017-12-06 01:17:01 UTC, end at Wed 2017-12-06 10:29:27 UTC. -- Dec 06 01:23:28 santa systemd[1]: Starting A Fuse driver for the Repaper.org EPD... Dec 06 01:23:29 santa systemd[1]: Started A Fuse driver for the Repaper.org EPD. Dec 06 09:54:53 santa systemd[1]: epd-fuse.service: Main process exited, code=killed, status=11/SEGV Dec 06 09:54:53 santa systemd[1]: epd-fuse.service: Unit entered failed state. Dec 06 09:54:53 santa systemd[1]: epd-fuse.service: Failed with result 'signal'. Dec 06 09:56:08 santa systemd[1]: Starting A Fuse driver for the Repaper.org EPD... Dec 06 09:56:08 santa systemd[1]: Started A Fuse driver for the Repaper.org EPD. ~

Cheers,

James

francesco-vannini commented 6 years ago

Hi @tvoverbeek have you any ideas on how to fix this?

tvoverbeek commented 6 years ago

Signal 11 (SEGV) usually is an illegal memory access (due to e.g. addressing array out of bounds, using an uninitialized variable, etc.) Problem is to find where the illegal access is generated, in the epd-fuse code or in the library lib-fuse.

francesco-vannini commented 6 years ago

If there is anyone who can that'll be you :)

joelspadin commented 6 years ago

Can epd-fuse be configured to save a memory dump on a crash? That should make it easier to find the source of the problem, especially if you aren't able to reproduce it.

tvoverbeek commented 6 years ago

@ChaosinaCan, @westy48 : The following should (I hope) generate a core file (memory dump) when epd-fuse aborts with SEGV. In /etc/security/limits.conf add the following lines:

*       hard    core        unlimited
*       soft    core        unlimited

Use sudo to edit the file. Reboot. epd-fuse should generate a core file in its working directory which is the root directory (/) when it crashes. Hopefully this works, so we can find out where and why the crash occurs

westy48 commented 6 years ago

@tvoverbeek thanks for this - I've edited /etc/security/limits.conf as suggested above. When you say the core file will be made in the root directory, how would I navigate to that in the terminal? Is it cd /?

tvoverbeek commented 6 years ago

@westy48 Yes, cd / is the correct command. You might have to use sudo to see the files, e.g. sudo ls -al /. This command will list the files in / even if you are in a different directory.

westy48 commented 6 years ago

@tvoverbeek Hi Ton,

When I do sudo ls -al /this is what I see:

pi@santa:~ $ sudo ls -al / total 79 drwxr-xr-x 21 root root 4096 Nov 29 02:56 . drwxr-xr-x 21 root root 4096 Nov 29 02:56 .. drwxr-xr-x 2 root root 4096 Nov 29 01:45 bin drwxr-xr-x 3 root root 2560 Jan 1 1970 boot drwxr-xr-x 15 root root 3380 Dec 8 15:54 dev drwxr-xr-x 111 root root 4096 Dec 5 10:57 etc drwxr-xr-x 3 root root 4096 Nov 29 01:22 home drwxr-xr-x 16 root root 4096 Nov 29 01:38 lib drwx------ 2 root root 16384 Nov 29 02:49 lost+found drwxr-xr-x 2 root root 4096 Nov 29 01:06 media drwxr-xr-x 2 root root 4096 Nov 29 01:06 mnt drwxr-xr-x 7 root root 4096 Nov 29 02:32 opt dr-xr-xr-x 135 root root 0 Jan 1 1970 proc drwx------ 5 root root 4096 Dec 7 20:57 root drwxr-xr-x 27 root root 800 Dec 8 15:54 run drwxr-xr-x 2 root root 4096 Nov 29 01:46 sbin drwxr-xr-x 2 root root 4096 Nov 29 01:06 srv dr-xr-xr-x 12 root root 0 Dec 8 15:56 sys drwxrwxrwt 11 root root 4096 Dec 8 15:54 tmp drwxr-xr-x 11 root root 4096 Nov 29 02:09 usr drwxr-xr-x 11 root root 4096 Nov 29 02:56 var pi@santa:~ $

Will I find the core file in one of these directories? And is it called something that will make me think "That's a core file!"

Sorry for all the questions - I'm just a step or two beyond the limits of my knowledge!

tvoverbeek commented 6 years ago

A core file is named (surprise) core. If you want to know if there is a file named core in / just do ls -al /core. If not there you will get ls: cannot access '/core': No such file or directory. But first you will have to wait till epd-fuse crashes.

westy48 commented 6 years ago

Thanks again @tvoverbeek. I've just done sudo journalctl -u epd-fuse which returned this:

-- Logs begin at Fri 2017-12-08 18:04:00 UTC, end at Fri 2017-12-08 18:06:13 UTC. -- Dec 08 18:04:12 santa systemd[1]: Starting A Fuse driver for the Repaper.org EPD... Dec 08 18:04:12 santa systemd[1]: Started A Fuse driver for the Repaper.org EPD. Dec 08 18:04:54 santa systemd[1]: epd-fuse.service: Main process exited, code=killed, status=11/SEGV Dec 08 18:04:54 santa systemd[1]: epd-fuse.service: Unit entered failed state. Dec 08 18:04:54 santa systemd[1]: epd-fuse.service: Failed with result 'signal'.

Then, ls -al /core gives me this:

ls: cannot access '/core': No such file or directory

I'm doing this via SSH if that makes a difference.

tvoverbeek commented 6 years ago

You did reboot after editing /etc/security/limits.conf? You could also try ls -al /core* and ls -al /root/core*.

tvoverbeek commented 6 years ago

Also in a terminal session use the command ulimit -c. Should give unlimited as response after editing /etc/security/limits/conf and rebooting. The default response in raspbian is 0, meaning no core dumps.

westy48 commented 6 years ago

ulimit -c does return unlimited

After an epd-fuse crash ls -al /core* gives me ls: cannot access '/core*': No such file or directory and ls -al /root/core* gives ls: cannot access '/root/core*': Permission denied

tvoverbeek commented 6 years ago

For the last one (since you got Permission denied) retry with sudo ls -al /root/core*

tvoverbeek commented 6 years ago

After some more reading it turns out getting core dumps by systemctl services is a bit more complicated. Install the systemd-coredump package: sudo apt-get install systemd-coredump. Probably best to do a sudo apt-get update first. Then reboot and wait for the next epd-fuse crash. If I understood the docs then sudo journalctl -u epd-fuse should indicate a core has been created and also include a backtrace. Also coredumpctl list should give a list of all coredumps in the journal. Hope this brings us a bit further.

westy48 commented 6 years ago

pi@santa:~ $ sudo journalctl -u epd-fuse -- Logs begin at Thu 2016-11-03 17:16:43 UTC, end at Sat 2017-12-09 09:21:26 UTC. -- Dec 09 09:17:44 santa systemd[1]: Starting A Fuse driver for the Repaper.org EPD... Dec 09 09:17:45 santa systemd[1]: Started A Fuse driver for the Repaper.org EPD. Dec 09 09:21:20 santa systemd[1]: epd-fuse.service: Main process exited, code=killed, status=11/SEGV Dec 09 09:21:20 santa systemd[1]: epd-fuse.service: Unit entered failed state. Dec 09 09:21:20 santa systemd[1]: epd-fuse.service: Failed with result 'signal'.

And...

pi@santa:~ $ coredumpctl list TIME PID UID GID SIG COREFILE EXE Sat 2017-12-09 09:21:21 UTC 347 0 0 11 error /usr/sbin/epd_fuse

tvoverbeek commented 6 years ago

@westy48 OK, so we got a coredump. The coredump file should be in /var/lib/systemd/coredump. What is the output from ls -al /var/lib/systemd/coredump? If there is a coredump file there can you attach it to the reply?

westy48 commented 6 years ago

Hi Ton, here's a coredump file I got.

I've zipped it up 'cos Github wouldn't let me load it as a .lz4 file - I hope that's OK.

core.epd_fuse.0.78358d7c61074c3da5dd16ec6f875da5.344.1512841144000000000000.lz4.zip

tvoverbeek commented 6 years ago

James (@westy48 Zipped was OK. The crash occurred at the return from function EPD_end in epd_fuse. Smells like the stack was corrupted somewhere so the return from EPD_end() went nowhere. From the core file I saw it was running on 'v6l', i.e Pi Zero or the older RPIs before the multi core PIs. Did you run the install on the same Pi or was the SD card moved from an other (multicore) Pi? Maybe a red herring, but could you try to reinstall the PaPiRus software on the same Pi where you are running your PaPiRus app (without changing the SD card between install and running the app of course)

westy48 commented 6 years ago

@tvoverbeek Ho Ton - thanks for your persistence with this!

It is a Pi Zero W, which is the machine I initially installed the PaPiRus software onto. The display is a 2 inch PaPiRus Zero. I used a newly formatted SD card, flashed Scratch onto it with Etcher and then used your curl -sSL https://pisupp.ly/papiruscode | sudo bash installer. It was actually a new card, so I've never had it in another Pi.

I've re-run the installer, and restarted my code. It ran fine, but the crash has always seemed to happen when the Pi shuts down (or re-starts - I'm not certain...?) so I rebooted it and sure enough it then needed the EPD driver restarting.

This also generated a new coredump file, which I've attached for your reading pleasure :-) core.epd_fuse.0.d271b796f2eb480faf49a0703ee408e0.345.1513005918000000000000.lz4.zip

tvoverbeek commented 6 years ago

@westy48 James, My previous backtrace was against a modified version of epd_fuse, thus not valid. Against epd_fuse as installed I get a more meaningful backtrace (identical for both core files you provided):

Core was generated by `/usr/sbin/epd_fuse --panel=2.0 -o allow_other -o default_permissions /dev/epd'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00014508 in EPD_set_temperature (epd=0x0, temperature=25) at V231_G2/epd.c:492
492     epd->factored_stage_time = epd->base_stage_time * temperature_to_factor_10x(temperature) / 10;
[Current thread is 1 (Thread 0xb62ff470 (LWP 347))]
(gdb) bt
#0  0x00014508 in EPD_set_temperature (epd=0x0, temperature=25) at V231_G2/epd.c:492
#1  0x00012604 in run_command (c=85 'U') at epd_fuse.c:623
#2  0x00011f30 in display_write (path=0xb59004f0 "/command", buffer=0xb64a5058 "U", size=1, offset=0, fi=0xb62fed68)
    at epd_fuse.c:424
#3  0xb6e85a44 in fuse_fs_write_buf () from /lib/arm-linux-gnueabihf/libfuse.so.2
#4  0xb6e85c00 in ?? () from /lib/arm-linux-gnueabihf/libfuse.so.2
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

The pointer to the epd structure is NULL, hence the crash. This happened during a papirus.update() command (the 'U' argument in display_write). Do you have some program running at boot (or shutdown) which calls papirus.update()? epd_fuse did probably initialize (otherwise display_write could not be called), but maybe the initialization of the SPI device did fail (e.g. /dev/spidev0.0 not created yet). A failed SPI init also results in a failure of creating the epd structure. Otherwise it could be a race condition between epd_fuse and the spi device creation.

joelspadin commented 6 years ago

I haven't had the time to reproduce this again yet, but the errors were happening for me before I had any sort of programs using the display running at boot. I was simply connecting via SSH and trying to run either my own scripts or the sample programs.

westy48 commented 6 years ago

Do you have some program running at boot (or shutdown) which calls papirus.update()?

No. On start up cron starts a shell script which launches my python code - none of this calls papirus.update()

tvoverbeek commented 6 years ago

@westy48 @ChaosinaCan So it seems to crash before you run any papirus related program. I have attached a zipped version of the epd_fuse binary with debug output which should tell us if it crashes when starting or later. Unzip it and copy/move the resulting epd_fuse to /usr/sbin after saving the original epd_fuse:

sudo mv /usr/sbin/epd_fuse /usr/sbin/epd_fuse.save
sudo cp epd_fuse /usr/sbin/epd_fuse
sudo chmod 755 /usr/sbin/epd_fuse

Reboot. When it does not crash the output from systemctl status epd-fuse after booting should look like this:

pi@papirus-zero-w:~ $ systemctl status epd-fuse
● epd-fuse.service - A Fuse driver for the Repaper.org EPD
   Loaded: loaded (/usr/lib/systemd/system/epd-fuse.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2017-12-12 21:11:21 CET; 43s ago
  Process: 438 ExecStart=/usr/sbin/epd_fuse --panel=${EPD_SIZE} $EPD_OPTS ${EPD_MOUNTPOINT} (code=exited, status=0/SUCCESS)
  Process: 430 ExecStartPre=/usr/bin/env mkdir -p ${EPD_MOUNTPOINT} (code=exited, status=0/SUCCESS)
 Main PID: 444 (epd_fuse)
   CGroup: /system.slice/epd-fuse.service
           └─444 /usr/sbin/epd_fuse --panel=2.0 -o allow_other -o default_permissions /dev/epd

Dec 12 21:11:20 papirus-zero-w systemd[1]: Starting A Fuse driver for the Repaper.org EPD...
Dec 12 21:11:20 papirus-zero-w epd_fuse[444]: EPD_create start
Dec 12 21:11:20 papirus-zero-w epd_fuse[444]: EPD_set_temperature: epd=b63005c0, temp=25
Dec 12 21:11:21 papirus-zero-w epd_fuse[444]: EPD_create normal return, epd=b63005c0
Dec 12 21:11:21 papirus-zero-w systemd[1]: Started A Fuse driver for the Repaper.org EPD.
pi@papirus-zero-w:~

Curious about the output of systemctl status epd-fuse after a crash.

epd_fuse.zip

westy48 commented 6 years ago

Fab, thanks @tvoverbeek - I've had a got at this...

pi@santa:~ $ systemctl status epd-fuse ● epd-fuse.service - A Fuse driver for the Repaper.org EPD Loaded: loaded (/usr/lib/systemd/system/epd-fuse.service; enabled; vendor preset: enabled) Active: failed (Result: signal) since Tue 2017-12-12 21:56:28 UTC; 42s ago Process: 676 ExecStop=/usr/bin/env umount /dev/epd (code=exited, status=0/SUCCESS) Process: 344 ExecStart=/usr/sbin/epd_fuse --panel=${EPD_SIZE} $EPD_OPTS ${EPD_MOUNTPOINT} (code=exited, status=0/ Process: 333 ExecStartPre=/usr/bin/env mkdir -p ${EPD_MOUNTPOINT} (code=exited, status=0/SUCCESS) Main PID: 346 (code=killed, signal=SEGV)

Dec 12 21:56:00 santa systemd[1]: Starting A Fuse driver for the Repaper.org EPD... Dec 12 21:56:01 santa systemd[1]: Started A Fuse driver for the Repaper.org EPD. Dec 12 21:56:27 santa systemd[1]: epd-fuse.service: Main process exited, code=killed, status=11/SEGV Dec 12 21:56:28 santa systemd[1]: epd-fuse.service: Unit entered failed state. Dec 12 21:56:28 santa systemd[1]: epd-fuse.service: Failed with result 'signal'.

...(edit) and another one after I started up this morning.

● epd-fuse.service - A Fuse driver for the Repaper.org EPD Loaded: loaded (/usr/lib/systemd/system/epd-fuse.service; enabled; vendor preset: enabled) Active: failed (Result: signal) since Wed 2017-12-13 07:54:42 UTC; 1min 2s ago Process: 752 ExecStop=/usr/bin/env umount /dev/epd (code=exited, status=0/SUCCESS) Process: 344 ExecStart=/usr/sbin/epd_fuse --panel=${EPD_SIZE} $EPD_OPTS ${EPD_MOUNTPOINT} (code=exited, status=0/SUCCESS) Process: 338 ExecStartPre=/usr/bin/env mkdir -p ${EPD_MOUNTPOINT} (code=exited, status=0/SUCCESS) Main PID: 347 (code=killed, signal=SEGV)

Dec 12 23:55:19 santa systemd[1]: Starting A Fuse driver for the Repaper.org EPD... Dec 12 23:55:20 santa systemd[1]: Started A Fuse driver for the Repaper.org EPD. Dec 13 07:54:40 santa epd_fuse[347]: EPD_set_temperature: epd=0, temp=17 Dec 13 07:54:41 santa systemd[1]: epd-fuse.service: Main process exited, code=killed, status=11/SEGV Dec 13 07:54:42 santa systemd[1]: epd-fuse.service: Unit entered failed state. Dec 13 07:54:42 santa systemd[1]: epd-fuse.service: Failed with result 'signal'.

tvoverbeek commented 6 years ago

@westy48 The 2nd one from today is interesting. Obviously EPD_create was not called from display_init. The first time your python script called image.write (line 17 in santa.py) at 07:54 epd_fuse crashes because epd=0. See the EPD_set_temperature line. (looks like your pi zero is outside, only 17 degrees). image.write calls papirus.update to show the image on the display. So probably something goes wrong in the display_init function of epd_fuse.c. I attach a 2nd debug version of epd_fuse (epd_fuse2.zip) with additional output in display_init. Could you try this one? You probably will have to kill the epd_fuse service before you can copy this debug version to /usr/sbin/epd_fuse. After copying, reboot to be sure the new version is used by the system. My successful output now looks like this:

pi@papirus-zero-w:~ $ systemctl status epd-fuse
● epd-fuse.service - A Fuse driver for the Repaper.org EPD
   Loaded: loaded (/usr/lib/systemd/system/epd-fuse.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2017-12-13 14:24:12 CET; 1min 15s ago
  Process: 439 ExecStart=/usr/sbin/epd_fuse --panel=${EPD_SIZE} $EPD_OPTS ${EPD_MOUNTPOINT} (code=exited, status=0/SUCCESS)
  Process: 432 ExecStartPre=/usr/bin/env mkdir -p ${EPD_MOUNTPOINT} (code=exited, status=0/SUCCESS)
 Main PID: 447 (epd_fuse)
   CGroup: /system.slice/epd-fuse.service
           └─447 /usr/sbin/epd_fuse --panel=2.0 -o allow_other -o default_permissions /dev/epd

Dec 13 14:24:11 papirus-zero-w epd_fuse[447]: display_init: Start
Dec 13 14:24:11 papirus-zero-w epd_fuse[447]: display_init: GPIO_setup OK
Dec 13 14:24:11 papirus-zero-w epd_fuse[447]: display_init: SPI_create OK
Dec 13 14:24:11 papirus-zero-w epd_fuse[447]: display_init: Call EPD_create
Dec 13 14:24:11 papirus-zero-w epd_fuse[447]: EPD_create: Start
Dec 13 14:24:11 papirus-zero-w epd_fuse[447]: EPD_set_temperature: epd=b64005a0, temp=25
Dec 13 14:24:12 papirus-zero-w epd_fuse[447]: EPD_create: Normal return, epd=b64005a0
Dec 13 14:24:12 papirus-zero-w epd_fuse[447]: diplay_init: Normal return, epd = b64005a0
Dec 13 14:24:12 papirus-zero-w systemd[1]: Started A Fuse driver for the Repaper.org EPD.
Dec 13 14:25:21 papirus-zero-w epd_fuse[447]: EPD_set_temperature: epd=b64005a0, temp=30
pi@papirus-zero-w:~

Waiting for more systemctl status epd-fuse output. Maybe the gpio setup in epd_fuse and gpiozero do not go very well together. epd_fuse2.zip

westy48 commented 6 years ago

OK - I think I've got that right...

pi@santa:~ $ systemctl status epd-fuse ● epd-fuse.service - A Fuse driver for the Repaper.org EPD Loaded: loaded (/usr/lib/systemd/system/epd-fuse.service; enabled; vendor preset: enabled) Active: failed (Result: signal) since Thu 2017-12-14 07:43:50 UTC; 9s ago Process: 767 ExecStop=/usr/bin/env umount /dev/epd (code=exited, status=0/SUCCESS) Process: 341 ExecStart=/usr/sbin/epd_fuse --panel=${EPD_SIZE} $EPD_OPTS ${EPD_MOUNTPOINT} (code=exited, status=0/SUCCESS) Process: 329 ExecStartPre=/usr/bin/env mkdir -p ${EPD_MOUNTPOINT} (code=exited, status=0/SUCCESS) Main PID: 344 (code=killed, signal=SEGV)

Dec 14 07:42:38 santa systemd[1]: Starting A Fuse driver for the Repaper.org EPD... Dec 14 07:42:39 santa epd_fuse[344]: display_init: Start Dec 14 07:42:39 santa epd_fuse[344]: display_init: GPIO_setup OK Dec 14 07:42:39 santa systemd[1]: Started A Fuse driver for the Repaper.org EPD. Dec 14 07:43:50 santa systemd[1]: epd-fuse.service: Main process exited, code=killed, status=11/SEGV Dec 14 07:43:50 santa systemd[1]: epd-fuse.service: Unit entered failed state. Dec 14 07:43:50 santa systemd[1]: epd-fuse.service: Failed with result 'signal'.

tvoverbeek commented 6 years ago

@westy48 James, Looks like the opening of the SPI device (/dev/spidev0.0) in display_init fails. When I run a cron @reboot job with the SPI_create code it fails on a cold boot, but succeeds on a reboot. Seems to be a timing issue with the creation of the spidev device vs starting the epd_fuse service. To confirm SPI_create failing is the cause of the crashes I have attached a 3rd version of epd_fuse (epd_fuse3.zip) with more debug output in SPI_create. Could you try this one again and get the systemctl status epd-fuse output in the case of a crash? Thanks in advance.

epd_fuse3.zip

westy48 commented 6 years ago

pi@santa:~ $ systemctl status epd-fuse ● epd-fuse.service - A Fuse driver for the Repaper.org EPD Loaded: loaded (/usr/lib/systemd/system/epd-fuse.service; enabled; vendor preset: enabled) Active: failed (Result: signal) since Sat 2017-12-16 09:10:57 UTC; 25s ago Process: 687 ExecStop=/usr/bin/env umount /dev/epd (code=exited, status=0/SUCCESS) Process: 337 ExecStart=/usr/sbin/epd_fuse --panel=${EPD_SIZE} $EPD_OPTS ${EPD_MOUNTPOINT} (code=exited, status=0/SUCCESS) Process: 328 ExecStartPre=/usr/bin/env mkdir -p ${EPD_MOUNTPOINT} (code=exited, status=0/SUCCESS) Main PID: 342 (code=killed, signal=SEGV)

Dec 16 09:10:32 santa systemd[1]: Starting A Fuse driver for the Repaper.org EPD... Dec 16 09:10:32 santa epd_fuse[342]: display_init: Start Dec 16 09:10:32 santa epd_fuse[342]: display_init: GPIO_setup OK Dec 16 09:10:32 santa epd_fuse[342]: SPI_create: Cannot open /dev/spidev0.0 Dec 16 09:10:32 santa epd_fuse[342]: display_init: SPI_create failed Dec 16 09:10:33 santa systemd[1]: Started A Fuse driver for the Repaper.org EPD. Dec 16 09:10:57 santa systemd[1]: epd-fuse.service: Main process exited, code=killed, status=11/SEGV Dec 16 09:10:57 santa systemd[1]: epd-fuse.service: Unit entered failed state. Dec 16 09:10:57 santa systemd[1]: epd-fuse.service: Failed with result 'signal'.

tvoverbeek commented 6 years ago

@westy48 James, Thanks a lot for running all the debug versions of epd_fuse. Now we know for sure it is the non availability of /dev/spidev0.0 when epd_fuse starts is the cause of the crashes which occur when you run a papirus program later. Next is what to do about it. I have some ideas, stay tuned.

tvoverbeek commented 6 years ago

@westy48 @ChaosinaCan James, Joel,

Could you try the following:

  1. create the file /lib/udev/rules.d/98-spidev.rules with the following contents:

    KERNEL=="spidev0.0", SUBSYSTEM=="spidev", TAG+="systemd"

    This will generate a device target for spidev0.0 which epd-fuse can wait for.

  2. In /usr/lib/systemd/system/epd-fuse.service change the line

    After=network.target dev-serial1.device

    to

    After=network.target dev-serial1.device dev-spidev0.0.device

    and reboot

Continue running with the debug version of epd_fuse and see it it still crashes or not. Hopefully epd_fuse will not crash, but ...

joelspadin commented 6 years ago

I think I've followed everything correctly, but it's crashing pretty consistently now.

pi@raspberrypi:~ $ cat /lib/udev/rules.d/98-spidev.rules KERNEL=="spidev0.0", SUBSYSTEM=="spidev", TAG+="systemd"

pi@raspberrypi:~ $ cat /usr/lib/systemd/system/epd-fuse.service [Unit] Description=A Fuse driver for the Repaper.org EPD After=network.target dev-serial1.device dev-spidev0.0.device

[Service] Type=forking PrivateDevices=no SyslogLevel=err

# make sure panel size matches installed EPD panel Environment=EPD_MOUNTPOINT=/dev/epd EPD_SIZE=2.0 "EPD_OPTS=-o allow_other -o default_permissions" EnvironmentFile=/etc/default/epd-fuse ExecStartPre=/usr/bin/env mkdir -p ${EPD_MOUNTPOINT} ExecStart=/usr/sbin/epd_fuse --panel=${EPD_SIZE} $EPD_OPTS ${EPD_MOUNTPOINT} ExecStop=/usr/bin/env umount /dev/epd KillSignal=SIGQUIT KillMode=mixed

[Install] WantedBy=multi-user.target

pi@raspberrypi:~ $ systemctl status epd-fuse ● epd-fuse.service - A Fuse driver for the Repaper.org EPD Loaded: loaded (/usr/lib/systemd/system/epd-fuse.service; enabled; vendor preset: enabled) Active: failed (Result: signal) since Sat 2017-12-16 19:43:33 CST; 45s ago Process: 434 ExecStop=/usr/bin/env umount /dev/epd (code=exited, status=0/SUCCESS) Process: 301 ExecStart=/usr/sbin/epd_fuse --panel=${EPD_SIZE} $EPD_OPTS ${EPD_MOUNTPOINT} (code=exited, status=0/SUCCESS) Process: 283 ExecStartPre=/usr/bin/env mkdir -p ${EPD_MOUNTPOINT} (code=exited, status=0/SUCCESS) Main PID: 310 (code=killed, signal=SEGV)

Dec 16 19:43:26 raspberrypi systemd[1]: Starting A Fuse driver for the Repaper.org EPD... Dec 16 19:43:27 raspberrypi epd_fuse[310]: display_init: Start Dec 16 19:43:27 raspberrypi epd_fuse[310]: display_init: GPIO_setup OK Dec 16 19:43:27 raspberrypi epd_fuse[310]: SPI_create: Cannot open /dev/spidev0.0 Dec 16 19:43:27 raspberrypi epd_fuse[310]: display_init: SPI_create failed Dec 16 19:43:27 raspberrypi systemd[1]: Started A Fuse driver for the Repaper.org EPD. Dec 16 19:43:32 raspberrypi systemd[1]: epd-fuse.service: Main process exited, code=killed, status=11/SEGV Dec 16 19:43:33 raspberrypi systemd[1]: epd-fuse.service: Unit entered failed state. Dec 16 19:43:33 raspberrypi systemd[1]: epd-fuse.service: Failed with result 'signal'.

pi@raspberrypi:~ $ systemctl status dev.spidev0.0.device rypi:~ $ systemctl status dev-spidev0.0.device ● dev-spidev0.0.device - /dev/spidev0.0 Follow: unit currently follows state of sys-devices-platform-soc-20204000.spi-spi_master-spi0-spi0.0-spidev-spidev0.0.device Loaded: loaded Active: active (plugged) since Sat 2017-12-16 19:43:28 CST; 2min 20s ago Device: /sys/devices/platform/soc/20204000.spi/spi_master/spi0/spi0.0/spidev/spidev0.0

And here's the core dump:

core.epd_fuse.0.db211fe02bdb45d48aa9ec3a3e5baa01.304.1513472508000000000000.lz4.zip

tvoverbeek commented 6 years ago

@ChaosinaCan Joel,

Thanks for testing. Obviously the After=dev-spidev0.0.device is not sufficient. In your case spidev0.0 is available 2 seconds after epd-fuse is started. I do not know why the spidev creation and epd-fuse starting are so close for you. For me on a pi-zero-w epd-fuse does not start until 10 seconds after spidev creation, hence I do not have the problem. However I am using WiFi and dhcp. Therefore it takes a while before the network.target is reached. I guess you have a different net config (fixed ethernet?, fixed IP address?) so network.target is reached much earlier.

I tested something which should work better. Modify the Unit section of /usr/lib/systemd/system/epd-fuse.service so it looks as follows:

[Unit]
Description=A Fuse driver for the Repaper.org EPD
After=network.target dev-serial1.device dev-spidev0.0.device
BindsTo=dev-spidev0.0.device

This should delay the start of epd-fuse until spidev0.0 is active. If this does not work, we'll have to do it the brute force way, i.e retrying the open of /dev/spidev0.0 a few times with pauses of about a second in between.

PS Do not need the core file if you see raspberrypi epd_fuse[310]: SPI_create: Cannot open /dev/spidev0.0 in the systemctl status epd-fuse output. If it still crashes without this 'Cannot open' message, then I am interested in the core file.

Hopefully we can fix this before Christmas

joelspadin commented 6 years ago

I think that worked! I went from five consecutive failures before adding the "BindsTo" line to five consecutive, successful reboots.

Your theory on why my timing is different also sounds plausible. I plan to use my Pi in a scenario where it won't have Internet access, so I configured boot to not wait on the network.

tvoverbeek commented 6 years ago

Joel (@ChaosinaCan),

Thanks for the quick turn-around, Now waiting if it also works for James (@westy48). If so, i'll update the repaper repository.

shawaj commented 6 years ago

@tvoverbeek do you have access to the repaper repo?

If not, give me a shout and I'll approve the PR immediately once you submit it.

Would give you write access but it's not me that can do that!

On 17 Dec 2017 20:18, "Ton van Overbeek" notifications@github.com wrote:

Joel (@ChaosinaCan https://github.com/chaosinacan),

Thanks for the quick turn-around, Now waiting if it also works for James (@westy48 https://github.com/westy48). If so, i'll update the repaper repository.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/PiSupply/PaPiRus/issues/165#issuecomment-352282264, or mute the thread https://github.com/notifications/unsubscribe-auth/ADNCuleB7oU-HZPNnS9LVTOjVgkHZw9rks5tBXcLgaJpZM4Qj0sO .

westy48 commented 6 years ago

Hi Ton @tvoverbeek - I've quickly applied these this morning and did a couple of successful reboots. I'll be able to give it a more thorough testing later tonight, but looking good so far.

westy48 commented 6 years ago

Ton (@tvoverbeek) - thanks for your work and patience on this. I looks to me like this has resolved it now.

tvoverbeek commented 6 years ago

Joel (@ChaosinaCan), James (@westy48) Thanks again for all your help finding the bug and testing the solution. Submitted pull request for the repaper/gratis repository (PR #80). Aaron (@shawaj) could you merge this as I do not have write access to the repaper/gratis repository?

francesco-vannini commented 6 years ago

Thanks guys this is probably the longest issue exchange in the history of PaPiRus. Fantastic job!

shawaj commented 6 years ago

@tvoverbeek all done on the gratis repo.

tvoverbeek commented 6 years ago

@shawaj Thanks. Closing issue.

joelspadin commented 6 years ago

@tvoverbeek and thank you for the fix!