AllskyTeam / allsky

A Raspberry Pi operated Wireless Allsky Camera
MIT License

PiHq stops capture after update #323

Closed f29pc closed 3 years ago

f29pc commented 3 years ago

I did a fresh install of Raspbian on a 128GB SanDisk Extreme, on an RPi 4 (4GB) with the RpiHq. I had everything running with an earlier install but had the "no timelapse" issue. Now, with a fresh install of Allsky using the config that has the resize setting for the timelapse, everything starts out OK, but after 2 to 3 hrs (day or night settings) it stops capturing. Allsky seems to be running: I can log in via the GUI, change the camera settings, and view the system info (it shows 52% CPU load every time it stops capturing). If I look at the device manager (Pi desktop), it shows less than 4%. The Pi is not locked up, and the log does not show anything except the capture entry with the time the last image was taken. Stopping and restarting Allsky makes no difference. I can reboot the Pi and everything starts OK, but then it stops capturing again after 2 to 3 hrs. I have the same version running on a Pi 3B+ with a ZWO 120MC-S that has been working fine since the update... I'm a noob to Allsky and love it, I just don't know what to try next. Any suggestions??? Thanks

paolobar54 commented 3 years ago

Same problem here. I'm new to Allsky and did a fresh install on an RPi4 4GB with the RpiHQ and a 32GB SD card. The GUI works and both night and day capturing work, but I have never been able to reach the end of a night. Of two attempts, the first stopped just before 1AM and the other at 5AM. No strange messages in the log, and the GUI continues to work. No FTP, web server or other add-ons, just the Allsky and GUI install. Any ideas for debugging?

Thanks in advance Paolo

pclanon commented 3 years ago

I'm getting the same behavior all of a sudden on an RPi 4, 4GB, RPi-HQ camera, updated and upgraded. Allsky crashes unexpectedly and doesn't recover. I thought it might be related to power issues (I operate overnight on a rechargeable battery), but I get the same behavior when plugged into the grid inside the house. Syslog always spits out something nearly identical to this when the crash happens, but I don't have the skills to interpret it:

Feb 14 07:53:45 allsky allsky.sh[648]: Capturing & saving image...
Feb 14 07:53:45 allsky allsky.sh[648]: Capture command: nice raspistill --nopreview --thumb none --output image.jpg --burst -st --mode 3 --exposure auto --analoggain 1 --awb auto --vflip --saturation 50 --quality 95 -a 1104 -a 1036 -a "San Francisco, CA" -ae 32,0xff,0x808000
Feb 14 07:53:45 allsky allsky.sh[648]: Capturing & saving image done, now wait 30 seconds...
Feb 14 07:54:33 allsky kernel: [67546.764067] ------------[ cut here ]------------
Feb 14 07:54:33 allsky kernel: [67546.764098] WARNING: CPU: 0 PID: 22681 at drivers/firmware/raspberrypi.c:64 rpi_firmware_transaction+0xec/0x128
Feb 14 07:54:33 allsky kernel: [67546.764108] Firmware transaction timeout
Feb 14 07:54:33 allsky kernel: [67546.764117] Modules linked in: cmac bnep hci_uart btbcm bluetooth ecdh_generic ecc 8021q garp stp llc brcmfmac bcm2835_codec(C) brcmutil v3d v4l2_mem2mem bcm2835_isp(C) bcm2835_v4l2(C) bcm2835_mmal_vchiq(C) videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops sha256_generic videobuf2_v4l2 videobuf2_common raspberrypi_hwmon cfg80211 vc4 rfkill cec videodev drm_kms_helper gpu_sched mc vc_sm_cma(C) drm drm_panel_orientation_quirks rpivid_mem snd_bcm2835(C) snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm snd_timer snd syscopyarea sysfillrect sysimgblt fb_sys_fops backlight uio_pdrv_genirq uio nvmem_rmem ip_tables x_tables ipv6
Feb 14 07:54:33 allsky kernel: [67546.764661] CPU: 0 PID: 22681 Comm: kworker/0:2 Tainted: G C 5.10.11-v7l+ #1399
Feb 14 07:54:33 allsky kernel: [67546.764667] Hardware name: BCM2711

paolobar54 commented 3 years ago

Just the same here

Feb 14 15:06:55 allsky kernel: [ 281.907599] ------------[ cut here ]------------
Feb 14 15:06:55 allsky kernel: [ 281.907637] WARNING: CPU: 0 PID: 131 at drivers/firmware/raspberrypi.c:64 rpi_firmware_transaction+0xec/0x128
Feb 14 15:06:55 allsky kernel: [ 281.907649] Firmware transaction timeout
Feb 14 15:06:55 allsky kernel: [ 281.907659] Modules linked in: cmac rfcomm bnep hci_uart btbcm bluetooth ecdh_generic ecc fuse 8021q garp stp llc vc4 cec v3d drm_kms_helper gpu_sched brcmfmac brcmutil drm sha256_generic raspberrypi_hwmon drm_panel_orientation_quirks cfg80211 rfkill bcm2835_v4l2(C) bcm2835_codec(C) bcm2835_isp(C) snd_soc_core v4l2_mem2mem bcm2835_mmal_vchiq(C) videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops snd_compress videobuf2_v4l2 snd_pcm_dmaengine videobuf2_common snd_bcm2835(C) videodev snd_pcm mc vc_sm_cma(C) snd_timer snd syscopyarea rpivid_mem sysfillrect sysimgblt fb_sys_fops backlight nvmem_rmem uio_pdrv_genirq uio i2c_dev ip_tables x_tables ipv6
Feb 14 15:06:55 allsky kernel: [ 281.908335] CPU: 0 PID: 131 Comm: kworker/0:2 Tainted: G C 5.10.11-v7l+ #1399
Feb 14 15:06:55 allsky kernel: [ 281.908343] Hardware name: BCM2711
Feb 14 15:06:55 allsky kernel: [ 281.908364] Workqueue: events dbs_work_handler

I'm not an expert in the Linux environment, but from my research and a trace (you can use sudo strace -r -p capture-RpiHQ-PID) it looks like the raspistill call never returns. The problem is discovering why. I also have a fully upgraded system, running relatively cool (max temp is 52C), powered by an official RPi4 power supply. Are there other people running an RPi4 with the HQ camera without problems? BTW: why is the raspistill command run with "nice" and no option, which I suppose means a niceness of 10? Why lower the priority?
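To check whether a capture hang lines up with the kernel warning quoted in this thread, something like the sketch below may help. It only counts the "Firmware transaction timeout" lines in a log file and reports the timestamp of the most recent one. The function names, the --report switch, and the default /var/log/syslog path are assumptions for illustration, not part of Allsky.

```shell
#!/bin/bash
# Sketch: look for the firmware timeout warning in a syslog-style file.
# Function names, the --report switch and the default path are assumptions.

count_firmware_timeouts() {
    # grep -c prints 0 and exits non-zero when nothing matches, so mask the status.
    grep -c 'Firmware transaction timeout' "$1" || true
}

last_firmware_timeout() {
    # Syslog lines start with a 15-character "Mon DD HH:MM:SS" timestamp.
    grep 'Firmware transaction timeout' "$1" | tail -n 1 | cut -c1-15
}

# Only print a report when explicitly asked to, e.g.:
#   ./fw-timeouts.sh --report /var/log/syslog
if [ "${1:-}" = "--report" ]; then
    log="${2:-/var/log/syslog}"
    echo "timeouts:  $(count_firmware_timeouts "$log")"
    echo "last seen: $(last_firmware_timeout "$log")"
fi
```

Comparing the "last seen" time with the timestamp of the last saved image should show whether the two events happen together.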

paolobar54 commented 3 years ago

Anyways I did a test without the "nice" just for fun and no change: freeze after 45 minutes of capture...

f29pc commented 3 years ago

I bought 2 new SD cards and did a fresh install on both, one 64GB and the other 128GB. Both failed to capture after running about 2 hrs on the RPi4 (also new). I put the same 64GB SD card in an older RPi3B that I had, and it ran fine all night. I also have it running OK on an older RPi4. I'm pretty sure that when I first installed on the new RPi4 I ran an update and an upgrade after installing Raspbian (I didn't on the older Pi4 that is running OK). The Pi4 that it runs OK on is v1.1; the new Pi4 it fails on is v1.2. Not sure what it means, but I hope the info helps.

Jonk2 commented 3 years ago

I might be wildly off here, but I believe there was an issue in the latest Raspbian update that caused a PoE HAT's fan to stop working. If you don't have one of these, then there may be another issue that is causing throttling or too much CPU work. Perhaps try one version back and see what happens? I've not had any issues with my Allsky crashing, on a new RPi 4 with an M.2 drive running via USB3. It may also be camera driver related: I'm using a ZWO camera without any issues.

f29pc commented 3 years ago

Update.. On the same install that the PiHQ was failing, I removed the PiHQ camera and replaced it with my ZWO120mc and it has been running all day with no issues. So it appears to be a problem only with the PiHQ camera.

paolobar54 commented 3 years ago

OK, I'm confused (OK, I'm MORE confused). My system (RPi4+HQ) has now been running since yesterday evening: 18 hours so far, a real world record. I got there simply by closing all the Chrome tabs that were attached to the GUI interface. I just access the RPi using VNC or SSH sporadically to check that it is still running... I don't know if that has any statistical significance or not, but maybe somebody wants to try it. While I continue the experiment, I'm setting up an old PC to receive the files over FTP, to avoid "disturbing" the RPi.

matkovic commented 3 years ago

Same problem here:

/var/log/allsky.log

Feb 15 20:57:49 allsky kernel: [43640.096551] ------------[ cut here ]------------
Feb 15 20:57:49 allsky kernel: [43640.096581] WARNING: CPU: 3 PID: 28009 at drivers/firmware/raspberrypi.c:64 rpi_firmware_transaction+0xec/0x128
Feb 15 20:57:49 allsky kernel: [43640.096590] Firmware transaction timeout
Feb 15 20:57:49 allsky kernel: [43640.096599] Modules linked in: cmac rfcomm bnep hci_uart btbcm bluetooth ecdh_generic ecc fuse joydev uinput 8021q garp stp llc brcmfmac brcmutil sha256_generic v3d raspberrypi_hwmon gpu_sched cfg80211 bcm2835_codec(C) bcm2835_v4l2(C) rfkill v4l2_mem2mem vc4 videobuf2_vmalloc bcm2835_isp(C) bcm2835_mmal_vchiq(C) videobuf2_dma_contig cec videobuf2_memops videobuf2_v4l2 videobuf2_common drm_kms_helper drm snd_bcm2835(C) videodev drm_panel_orientation_quirks mc vc_sm_cma(C) snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm snd_timer snd syscopyarea sysfillrect sysimgblt fb_sys_fops backlight rpivid_mem uio_pdrv_genirq uio nvmem_rmem i2c_dev ip_tables x_tables ipv6
Feb 15 20:57:49 allsky kernel: [43640.097130] CPU: 3 PID: 28009 Comm: kworker/3:2 Tainted: G C 5.10.14-v7l+ #1401
Feb 15 20:57:49 allsky kernel: [43640.097136] Hardware name: BCM2711
Feb 15 20:57:49 allsky kernel: [43640.097149] Workqueue: events dbs_work_handler
Feb 15 20:57:49 allsky kernel: [43640.097159] Backtrace:
Feb 15 20:57:49 allsky kernel: [43640.097179] [] (dump_backtrace) from [] (show_stack+0x20/0x24)
Feb 15 20:57:49 allsky kernel: [43640.097189] r7:ffffffff r6:00000000 r5:60000013 r4:c12e69fc
Feb 15 20:57:49 allsky kernel: [43640.097199] [] (show_stack) from [] (dump_stack+0xcc/0xf8)
Feb 15 20:57:49 allsky kernel: [43640.097211] [] (dump_stack) from [] (warn+0xfc/0x114)
Feb 15 20:57:49 allsky kernel: [43640.097221] r10:dec01008 r9:00000009 r8:c099ae6c r7:00000040 r6:00000009 r5:c099ae6c
Feb 15 20:57:49 allsky kernel: [43640.097227] r4:c0e9a114 r3:c1205094
Feb 15 20:57:49 allsky kernel: [43640.097238] [] (warn) from [] (warn_slowpath_fmt+0xa4/0xd8)
Feb 15 20:57:49 allsky kernel: [43640.097246] r7:00000040 r6:c0e9a114 r5:c1205048 r4:c0e9a134
Feb 15 20:57:49 allsky kernel: [43640.097257] [] (warn_slowpath_fmt) from [] (rpi_firmware_transaction+0xec/0x128)
Feb 15 20:57:49 allsky kernel: [43640.097266] r9:c1a7a340 r8:00000018 r7:00000000 r6:ffffff92 r5:c1a7a340 r4:c1205048
Feb 15 20:57:49 allsky kernel: [43640.097277] [] (rpi_firmware_transaction) from [] (rpi_firmware_property_list+0xbc/0x170)
Feb 15 20:57:49 allsky kernel: [43640.097285] r7:c1205048 r6:dec01000 r5:00001000 r4:dec01024
Feb 15 20:57:49 allsky kernel: [43640.097297] [] (rpi_firmware_property_list) from [] (rpi_firmware_property+0x70/0x118)
Feb 15 20:57:49 allsky kernel: [43640.097306] r10:c6d6e08c r9:00030002 r8:00000018 r7:c1a7a340 r6:c8d55d48 r5:0000000c
Feb 15 20:57:49 allsky kernel: [43640.097312] r4:c6d6e080
Feb 15 20:57:49 allsky kernel: [43640.097324] [] (rpi_firmware_property) from [] (raspberrypi_clock_property+0x54/0x7c)
Feb 15 20:57:49 allsky kernel: [43640.097332] r10:00000000 r9:00000000 r8:c1abf780 r7:00000000 r6:3b9aca00 r5:c8d55d70
Feb 15 20:57:49 allsky kernel: [43640.097345] r4:c1205048 r3:0000000c ...

~/allsky/log.txt (just before it freezes)

Saving image-20210215205637.jpg

Saving image-20210215205714.jpg

-angle -8 -autofocus 1 -autogain 0 -awb 0 -background 0 -bin 1 -brightness 50 -darkframe 0 -daytimeDelay 15000 -delay 10 -exposure 30000 -filename image.jpg -flip 2 -fontcolor 255 -fontsize 50 -gain 8 -gamma 50 -height 0 -latitude 0.0N -longitude 0.0E -quality 100 -rotation 0 -showDetails 0 -text rem -time 1 -wbb 2.0 -wbr 2.8 -width 0 -daytime 1

f29pc commented 3 years ago

I think Jonk2 might be on to something. Not a solution, but a workaround: I did a fresh install of Raspbian 2021-01-11. On the first boot, this time I skipped the update option (I had run the update on the system that would fail), then installed Allsky and the GUI. So far, using the same hardware, it has been up and running for over 5 hrs. Before, it would repeatedly stop capturing after 2 to 2.5 hrs.

paolobar54 commented 3 years ago

Call me a chicken, but I give up... I ordered an RPi3B+ and finally ran an entire night of capture. Now I'm fighting timelapses that are generated but don't play, and installing a web server... but those are other stories. (My conviction is that the problem is some race condition on the raspistill software side, not the hardware or the environment.)

pclanon commented 3 years ago

Data point: I’ve had two straight full nights without a crash and without reverting the OS to an earlier version (or making any other change). Assuming I haven’t just jinxed it, I’ll keep running the RPi 4B-4GB updated and upgraded and report back here. And no, timelapse and startrails don’t work reliably for me either. I just added my own ffmpeg step to the "additional steps to run at end of night" script and that works fine for timelapses.

bleara commented 3 years ago

I get exactly the same problem, with a very similar backtrace to matkovic's, starting with this:

WARNING: CPU: 2 PID: 20428 at drivers/firmware/raspberrypi.c:63 rpi_firmware_transaction+0xec/0x128

I only started with Allsky at the beginning of the month, and I am adding support for a Veye imx327 camera, which uses a similar but customised raspistill call. It all worked brilliantly to start with, running for several nights, but I must have done an apt-get update, and now it runs for about 3 1/2 hours before the process errors. I also get the 50% CPU load on the web page, but it seems to be an incorrect reading. I first had the problem with the January Buster release, so I reverted to December and did an update on that, and I still get the error. Next I need to revert to December again and be careful to apply no updates, then try January with no updates too, as f29pc has done.

bleara commented 3 years ago

The RPi December image with no explicit update and a new install of Allsky also now fails for me, so I don't know which part causes the problem. I did not install the GUI this time, as I wanted a minimal install. Bizarrely, on both of the last two nights the failure occurred at 21:51 GMT, so the last two files are image-20210219215117.jpg and image-20210220215155.jpg, i.e. 24 hrs and 38 seconds apart. A coincidence?? Unfortunately I don't possess an HQ camera, so I can't test with the standard platform.

pclanon commented 3 years ago

I reverted back to December 23 firmware, and two nights now of no crash.

lumdiniz commented 3 years ago

I've been fighting with a new installation for 3 days. My card burned out, so I had to install again, and I have the same problem with the image freezing. Is there any solution?

pclanon commented 3 years ago

lumdiniz, I reverted the operating system back to a December 23 version, and the problem hasn't come back for me. Here's how I went back:

sudo rpi-update 611beaaa346c8c2b285d816ed796f0fe6daf2417

Obviously, don't update or upgrade after reverting.

lumdiniz commented 3 years ago

lumdiniz, I reverted the operating system back to a December 23 version, and the problem hasn't come back for me. Here's how I went back:

sudo rpi-update 611beaaa346c8c2b285d816ed796f0fe6daf2417

Obviously, don't update or upgrade after reverting.

OK, thank you. My camera worked for 1 year and now the SD card has burned out. I had no backup, so I'm starting from scratch. I'm having trouble activating the gps sunwait, do you have any tips? The image isn't reaching the website in full, and it has no date and time. Thanks

IanLauwerys commented 3 years ago

Possibly resolved.

I've been experiencing this issue since December on a brand new Pi 4 and HQ Cam with Raspbian Buster (fully updated straight after install).

Hopefully recent Pi firmware updates have resolved it. The Pi would run for a random number of hours (usually between three and twelve) and then stop capturing. The final capture would usually have some corruption, either covered in pink stripes or with red/pink pixel 'noise' all over the darkest areas of the image. Sometimes the Pi would remain responsive and generate the startrails, timelapse, etc. at the end of the night (with whatever images it had before the camera hung), and it would be possible to remote desktop to it. Other times it would completely hang and require a power-cycle.

On the occasions where it remained responsive, restarting with sudo shutdown -r now would cause the Pi to hang and require a power-cycle. Forcing a hard reboot as below would instead kick it back into life (after a delay, presumably because of the unclean shutdown).

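echo s | sudo tee /proc/sysrq-trigger   # sync: flush pending writes to disk
echo u | sudo tee /proc/sysrq-trigger   # remount all filesystems read-only
echo b | sudo tee /proc/sysrq-trigger   # reboot immediately, without a clean shutdown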

There are a few threads that seem to suggest it is a firmware problem that crept in during a December 2020 update, though the WARNING: CPU: 2 PID: 20428 at drivers/firmware/raspberrypi.c:63 rpi_firmware_transaction+0xec/0x128 error is fairly generic from what I understand:

https://github.com/raspberrypi/linux/issues/4047
https://github.com/raspberrypi/linux/issues/4033
https://github.com/raspberrypi/firmware/issues/1552

I understand the issue may have been fixed in a Pi firmware update some time in March. I tried updating the Pi with:

sudo apt update
sudo apt full-upgrade

I believe this should update the firmware, but after restarting the Pi and checking with:

/opt/vc/bin/vcgencmd version

I was still getting a firmware version dated 25th February. I then tried:

sudo rpi-update
sudo shutdown -r now

This forces a firmware update but I don't think it is recommended unless there is a specific reason to do so. Checking the firmware version again:

/opt/vc/bin/vcgencmd version

Gives:

Apr 21 2021 15:48:42 
Copyright (c) 2012 Broadcom
version a48d332c35ee1c1c1ab433228e23317f62dcc5fb (clean) (release) (start_x)

I've been up and running for two days now with no hangs or corrupted images. I do have a cron job that restarts the Pi every 24 hours during the daytime, but hopefully this is the fix as I haven't had a solid 24 hour run for months.
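For anyone who wants to script the firmware check rather than read the date by eye, here is a minimal sketch. It assumes the first line of the vcgencmd version output is the build date, as in the output above, and that GNU date is available to parse it. The function names are made up for illustration, and the cutoff date is whatever build you consider good; the Apr 21 2021 build is used here only because it is the one reported working above.

```shell
#!/bin/bash
# Sketch: compare the firmware build date against a cutoff.
# Function names are hypothetical; relies on GNU date (-d).

firmware_build_epoch() {
    # $1 is the full text printed by `vcgencmd version`;
    # its first line is assumed to be the build date.
    local first_line
    first_line=$(printf '%s\n' "$1" | head -n 1)
    date -u -d "$first_line" +%s
}

firmware_at_least() {
    # $1: `vcgencmd version` text, $2: cutoff date (e.g. "Apr 21 2021").
    # Returns 0 if the build is on or after the cutoff, 1 if older,
    # 2 if either date could not be parsed.
    local have want
    have=$(firmware_build_epoch "$1") || return 2
    want=$(date -u -d "$2" +%s) || return 2
    [ "$have" -ge "$want" ]
}
```

On the Pi it could be used as: firmware_at_least "$(vcgencmd version)" "Apr 21 2021" && echo "firmware is new enough"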

jcauthen78 commented 3 years ago

I was having a similar problem with a fresh Pi 4B (8GB) and the PiHQ cam: lots of lock-ups, and sudo reboot would hang and leave the Pi unresponsive, forcing a hard power cycle. It got up to about 3 a day, and some nights wouldn't complete because of it. I ended up creating a .sh script to check the age of the live-view.jpg image and, if it's older than 2 min, try to reboot. I also put in a script to email me when it reboots.

Followed the firmware update you mentioned earlier today, and it's been decently stable so far, crossing my fingers for a smooth run tonight. we'll see if i get any reboot emails in the morning.
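The watchdog script itself was not posted, so the following is only a minimal sketch of that kind of check, not the original. The image path, the 120-second threshold, the function names, and the --run switch are all assumptions for illustration. It uses GNU stat to read the file's modification time and treats a missing image as stale.

```shell
#!/bin/bash
# Hypothetical capture watchdog sketch; path and threshold are assumptions.
IMAGE="${IMAGE:-$HOME/allsky/live-view.jpg}"
MAX_AGE="${MAX_AGE:-120}"   # seconds

image_age_seconds() {
    # Print how many seconds ago the file was last modified.
    local now mtime
    now=$(date +%s)
    mtime=$(stat -c %Y "$1" 2>/dev/null) || return 1
    echo $(( now - mtime ))
}

is_stale() {
    # $1: file, $2: maximum allowed age in seconds.
    # A file that cannot be read counts as stale.
    local age
    age=$(image_age_seconds "$1") || return 0
    [ "$age" -gt "$2" ]
}

# Only reboot when explicitly asked to, e.g. from cron:
#   */2 * * * * /home/pi/allsky-watchdog.sh --run
if [ "${1:-}" = "--run" ]; then
    if is_stale "$IMAGE" "$MAX_AGE"; then
        logger "allsky watchdog: $IMAGE older than ${MAX_AGE}s, rebooting"
        sudo reboot
    fi
fi
```

Since several people in this thread report that a normal reboot hangs once the camera has locked up, the sudo reboot line may need to be replaced with the sysrq sequence posted earlier in the thread.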

sbkirby commented 3 years ago

I was experiencing the same problem of 'hangs' as described above after updating my RPi. Prior to the fix, the firmware installed was dated Feb 2021. After following the instructions posted by @IanLauwerys, my RPi 4 is working fine...No hangs.

CuriousDran commented 3 years ago

Hello, I seem to be having the same issue as everyone in these threads. Running 4b with HQ camera, it is intermittently freezing during raspistill calls. using the --verbose output, I see a line stating

"raspistill" Camera App (commit 4a0a19b88b43 Tainted)

I've tried as @IanLauwerys said with rpi-update. It froze after about 3 hours, which is an improvement, but I am still getting the commit error I quoted above.

CuriousDran commented 3 years ago

Link to another comment I made with more information in firmware issue #1552: https://github.com/raspberrypi/firmware/issues/1552#issuecomment-873247512

EricClaeys commented 3 years ago

Closing issue based on multiple people in this thread and others saying the most recent firmware fixed the problem of the RPiHQ camera stopping after a few hours. This issue wasn't related to the Allsky software.