gilestrolab / ethoscope

a platform from monitoring animal behaviour in real time from a raspberry pi
http://lab.gilest.ro/ethoscope/
GNU General Public License v3.0
17 stars 25 forks source link

Ethoscopes stop tracking, could not get frame from camera #85

Closed pepelisu closed 4 years ago

pepelisu commented 6 years ago
Sorry there has been an error: Traceback (most recent call last): File "/opt/ethoscope-git/src/ethoscope/web_utils/control_thread.py", line 416, in run hardware_connection, StimulatorClass, stimulator_kwargs) File "/opt/ethoscope-git/src/ethoscope/web_utils/control_thread.py", line 306, in _start_tracking self._monit.run(result_writer, self._drawer) File "/opt/ethoscope-git/src/ethoscope/core/monitor.py", line 134, in run raise e EthoscopeException: 'Could not get frame from camera\n%s' 

After restarting they seems to work normally. However, some of the ethoscopes that has stopped after restarting they become "not in use" after a while.

pepelisu commented 6 years ago

Complete journalctl error:

Jan 17 22:23:00 e005 python2[348]: IOError: [Errno 32] Broken pipe Jan 17 22:23:00 e005 python2[348]: send(obj) Jan 17 22:23:00 e005 python2[348]: File "/usr/lib/python2.7/multiprocessing/queues.py", line 268, in _feed Jan 17 22:23:00 e005 python2[348]: Traceback (most recent call last): Jan 17 22:23:00 e005 python2[348]: EthoscopeException: 'Could not get frame from camera\n%s' Jan 17 22:23:00 e005 python2[348]: raise e Jan 17 22:23:00 e005 python2[348]: File "/opt/ethoscope-git/src/ethoscope/core/monitor.py", line 134, in run Jan 17 22:23:00 e005 python2[348]: self._monit.run(result_writer, self._drawer) Jan 17 22:23:00 e005 python2[348]: File "/opt/ethoscope-git/src/ethoscope/web_utils/control_thread.py", line 306, in _start_tracking Jan 17 22:23:00 e005 python2[348]: hardware_connection, StimulatorClass, stimulator_kwargs) Jan 17 22:23:00 e005 python2[348]: File "/opt/ethoscope-git/src/ethoscope/web_utils/control_thread.py", line 416, in run Jan 17 22:23:00 e005 python2[348]: ERROR:root:Traceback (most recent call last): Jan 17 22:23:00 e005 python2[348]: ERROR:root:Monitor closed with an error: Jan 17 22:23:00 e005 python2[348]: ERROR:root:Failed to run mysql command: Jan 17 22:23:00 e005 python2[348]: ' Jan 17 22:23:00 e005 python2[348]: EthoscopeException: 'Could not get frame from camera\n%s' Jan 17 22:23:00 e005 python2[348]: raise EthoscopeException("Could not get frame from camera\n%s", traceback.format_exc(e)) Jan 17 22:23:00 e005 python2[348]: File "/opt/ethoscope-git/src/ethoscope/hardware/input/cameras.py", line 492, in _next_image Jan 17 22:23:00 e005 python2[348]: im = self._next_image() Jan 17 22:23:00 e005 python2[348]: File "/opt/ethoscope-git/src/ethoscope/hardware/input/cameras.py", line 97, in _next_time_image Jan 17 22:23:00 e005 python2[348]: t,out = self._next_time_image() Jan 17 22:23:00 e005 python2[348]: File "/opt/ethoscope-git/src/ethoscope/hardware/input/cameras.py", line 58, in iter Jan 17 22:23:00 e005 python2[348]: for i,(t, frame) in enumerate(self._camera): Jan 17 22:23:00 e005 python2[348]: File "/opt/ethoscope-git/src/ethoscope/core/monitor.py", line 101, in run Jan 17 22:23:00 e005 python2[348]: ERROR:root:Monitor closing with an exception: 'Traceback (most recent call last): Jan 17 22:22:29 e005 python2[348]: WARNING:root:DAM file writer skipping a tick. No data for more than one period! Jan 17 22:22:29 e005 python2[348]: PiCameraValueError: Incorrect buffer length for resolution 1280x960 Jan 17 22:22:29 e005 python2[348]: 'Incorrect buffer length for resolution %dx%d' % (width, height)) Jan 17 22:22:29 e005 python2[348]: File "/usr/lib/python2.7/site-packages/picamera/array.py", line 127, in bytes_to_rgb Jan 17 22:22:29 e005 python2[348]: self.array = bytes_to_rgb(self.getvalue(), self.size or self.camera.resolution) Jan 17 22:22:29 e005 python2[348]: File "/usr/lib/python2.7/site-packages/picamera/array.py", line 238, in flush Jan 17 22:22:29 e005 python2[348]: stream.flush() Jan 17 22:22:29 e005 python2[348]: WARNING:root:FG model not updated for too long. Resetting. Jan 17 22:22:29 e005 python2[348]: WARNING:root:FG model not updated for too long. Resetting. Jan 17 22:22:29 e005 python2[348]: File "/usr/lib/python2.7/site-packages/picamera/mmalobj.py", line 371, in close_stream Jan 17 22:22:29 e005 python2[348]: mo.close_stream(output, opened) Jan 17 22:22:29 e005 python2[348]: File "/usr/lib/python2.7/site-packages/picamera/encoders.py", line 349, in _close_output Jan 17 22:22:29 e005 python2[348]: self._close_output() Jan 17 22:22:29 e005 python2[348]: File "/usr/lib/python2.7/site-packages/picamera/encoders.py", line 419, in stop Jan 17 22:22:29 e005 python2[348]: self.stop() Jan 17 22:22:29 e005 python2[348]: File "/usr/lib/python2.7/site-packages/picamera/encoders.py", line 431, in close Jan 17 22:22:29 e005 python2[348]: encoder.close() Jan 17 22:22:29 e005 python2[348]: File "/usr/lib/python2.7/site-packages/picamera/camera.py", line 1710, in capture_continuous Jan 17 22:22:29 e005 python2[348]: for frame in capture.capture_continuous(raw_capture, format="bgr", use_video_port=True): Jan 17 22:22:29 e005 python2[348]: File "/opt/ethoscope-git/src/ethoscope/hardware/input/cameras.py", line 360, in run Jan 17 22:22:29 e005 python2[348]: self.run() Jan 17 22:22:29 e005 python2[348]: File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap Jan 17 22:22:29 e005 python2[348]: Traceback (most recent call last): Jan 17 22:22:29 e005 python2[348]: Process PiFrameGrabber-1: Jan 17 22:22:29 e005 python2[348]: WARNING:root:Camera Frame grabber stopped acquisition cleanly Jan 17 22:22:29 e005 python2[348]: WARNING:root:Closing frame grabber process Jan 17 22:22:29 e005 ntpd[347]: receive: Unexpected origin timestamp 0xde0a4879.8f11f175 does not match aorg 0000000000.00000000 from server@192.168.123.3 xmt 0xde0a4b25.5c7b358c Jan 17 22:22:29 e005 systemd[1]: Time has been changed -- Reboot -- Jan 17 22:18:43 e005 systemd-journald[226]: Journal stopped Jan 17 22:18:43 e005 wpa_actiond[389]: Terminating wpa_actiond session for interface 'wlan0' Jan 17 22:18:42 e005 dhcpcd[577]: dhcpcd exited Jan 17 22:18:42 e005 dhcpcd[577]: script_runreason: /usr/lib/dhcpcd/dhcpcd-run-hooks: Terminated Jan 17 22:18:42 e005 ntpd[397]: 127.127.1.0 local addr 127.0.0.1 -> Jan 17 22:18:42 e005 ntpd[397]: 192.168.123.4 local addr 192.168.123.12 -> Jan 17 22:18:42 e005 ntpd[397]: 192.168.123.3 local addr 192.168.123.12 -> Jan 17 22:18:42 e005 ntpd[397]: 192.168.123.2 local addr 192.168.123.12 -> Jan 17 22:18:42 e005 ntpd[397]: 192.168.123.1 local addr 192.168.123.12 -> Jan 17 22:18:42 e005 ntpd[397]: ntpd exiting on signal 15 (Terminated) Jan 17 22:18:41 e005 sshd[393]: Received signal 15; terminating. Jan 17 22:18:41 e005 dhcpcd[577]: wlan0: removing interface Jan 17 22:18:42 e005 systemd[1]: Detected architecture arm. Jan 17 22:18:41 e005 dhcpcd[577]: received SIGTERM, stopping Jan 17 22:18:42 e005 systemd[1]: systemd 232 running in system mode. (+PAM -AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECC Jan 17 22:18:41 e005 wpa_actiond[389]: Interface 'wlan0' disconnected from network 'ETHOSCOPE_WIFI' Jan 17 22:18:42 e005 systemd[1]: Reexecuting. Jan 17 22:18:31 e005 watchdog[305]: /usr/lib/sendmail does not exist or is not executable (errno = 2) Jan 17 22:18:42 e005 systemd[1]: logrotate.service: Failed with result 'signal'. Jan 17 22:18:31 e005 watchdog[305]: shutting down the system because of error 255 = 'unconditional reboot requested' Jan 17 22:18:42 e005 systemd[1]: logrotate.service: Unit entered failed state. Jan 17 22:18:31 e005 watchdog[305]: repair child 18387 timed out Jan 17 22:18:42 e005 systemd[1]: Failed to start Rotate log files. Jan 17 22:17:31 e005 watchdog[305]: Retry timed-out at 61 seconds for /etc/watchdog.d/write_test.sh Jan 17 22:18:42 e005 systemd[1]: logrotate.service: Main process exited, code=killed, status=15/TERM Jan 17 22:17:31 e005 watchdog[305]: test-binary /etc/watchdog.d/write_test.sh exceeded time limit 60 Jan 17 22:18:43 e005 ifplugd[299]: Exiting. Jan 17 22:18:43 e005 ifplugd[299]: Program execution failed, return value is 0. Jan 17 22:18:43 e005 ifplugd[299]: Killing child. Jan 17 22:18:43 e005 ifplugd[299]: Executing '/etc/ifplugd/netctl.action wlan0 down'. Jan 17 22:18:43 e005 mysqld[487]: 2018-01-17 22:18:42 1657194400 [Note] Event Scheduler: Purging the queue. 0 events Jan 17 22:18:43 e005 mysqld[487]: 2018-01-17 22:18:42 1657194400 [Note] /usr/bin/mysqld: Normal shutdown Jan 17 22:18:43 e005 ifplugd[300]: Exiting. Jan 17 22:18:43 e005 haveged[295]: haveged starting up Jan 17 22:18:43 e005 haveged[295]: haveged: Stopping due to signal 15 Jan 17 22:18:42 e005 systemd[1]: Received SIGTERM from PID 305 (watchdog). Jan 17 22:16:30 e005 watchdog[305]: test-binary /etc/watchdog.d/write_test.sh exceeded time limit 60 Jan 17 22:18:43 e005 kernel: systemd: 45 output lines suppressed due to ratelimiting Jan 17 22:18:43 e005 kernel: [<8014956c>] (kthread) from [<801078e8>] (ret_from_fork+0x14/0x2c) Jan 17 22:18:43 e005 kernel: [<801445b8>] (worker_thread) from [<8014956c>] (kthread+0xf4/0x118) Jan 17 22:18:43 e005 kernel: [<80143770>] (process_one_work) from [<801445b8>] (worker_thread+0x3c/0x588) Jan 17 22:18:43 e005 kernel: [<80868c34>] (mmc_rescan) from [<80143770>] (process_one_work+0x210/0x500) Jan 17 22:18:43 e005 kernel: [<8086e0e8>] (mmc_sd_detect) from [<80868c34>] (mmc_rescan+0x1ac/0x378) Jan 17 22:18:43 e005 kernel: [<80866094>] (mmc_claim_host) from [<8086e0e8>] (mmc_sd_detect+0x1c/0x70) Jan 17 22:18:42 e005 kernel: [<809c0dac>] (schedule) from [<80866094>] (mmc_claim_host+0xb0/0x1e0) Jan 17 22:18:42 e005 kernel: [<809c07e8>] (__schedule) from [<809c0dac>] (schedule+0x48/0xa0) Jan 17 22:18:42 e005 kernel: Workqueue: events_freezable mmc_rescan Jan 17 22:18:42 e005 kernel: kworker/3:0 D 0 17098 2 0x00000000 Jan 17 22:18:42 e005 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 17 22:18:12 e005 kernel: Not tainted 4.9.26-1-ARCH #1 Jan 17 22:17:08 e005 kernel: INFO: task kworker/3:0:17098 blocked for more than 120 seconds. Jan 17 22:15:44 e005 kernel: TCP: request_sock_TCP: Possible SYN flooding on port 9000. Sending cookies. Check SNMP counters. Jan 17 22:14:14 e005 python2[399]: WARNING:root:DAM file writer skipping a tick. No data for more than one period! Jan 17 22:14:10 e005 python2[399]: WARNING:root:FG model not updated for too long. Resetting. Jan 17 22:11:15 e005 watchdog[305]: test-binary /etc/watchdog.d/write_test.sh exceeded time limit 60 -- Reboot --

pepelisu commented 6 years ago

Maybe error is originated to Wlan connexion lost?

pepelisu commented 6 years ago

Further investigation points to logrotate like the culprit. It somehow interferes with the watchdog log and/or requires a lot of RAM memory and during tracking the machine this causes a hang. In the best of the situations the process is killed by the operating system. Disabling logrotate.service and logrotate.timer seems to help to prevent this hangs. However, when the machine is rebooted logrotate.timer is started again. We should find the problem with logrotate, or reduce the journald file size, or stop logrotate to automatically start with every boot.

pepelisu commented 6 years ago

Another update to this, and most likely related:

Jun 07 17:24:54 e005 python2[352]: ERROR:root:Could not get any frame from the camera
Jun 07 17:24:54 e005 python2[352]: WARNING:root:Stopping stop queue
Jun 07 17:24:54 e005 python2[352]: WARNING:root:Stopping queue
Jun 07 17:24:54 e005 python2[352]: WARNING:root:Joining process
Jun 07 17:25:02 e005 python2[352]: WARNING:root:<picamera.camera.PiCamera object at 0x65603e68>
Jun 07 17:25:03 e005 python2[352]: /usr/lib/python2.7/site-packages/picamera/array.py:175: PiCameraDeprecated: This method changes the position of the stream to the truncated length; this is deprecated functionality and you should not rely on it (seek before or after truncate to ensure position is consistent)
Jun 07 17:25:04 e005 python2[352]: WARNING:root:Process joined
Jun 07 17:25:03 e005 python2[352]:   'This method changes the position of the stream to the '
Jun 07 17:25:04 e005 python2[352]: ERROR:root:Could not load previous state for unexpected reason:
Jun 07 17:25:04 e005 python2[352]: ERROR:root:Monitor closed with an error:
Jun 07 17:25:04 e005 python2[352]: ERROR:root:Traceback (most recent call last):
Jun 07 17:25:04 e005 python2[352]:   File "/opt/ethoscope-git/src/ethoscope/web_utils/control_thread.py", line 408, in run
Jun 07 17:25:04 e005 python2[352]:     raise e
Jun 07 17:25:04 e005 python2[352]: Empty
Jun 07 17:25:20 e005 mysqld[438]: 2018-06-07 17:25:20 1988993840 [ERROR] Error in Log_event::read_log_event(): 'Event truncated', data_len: 825505847, event_type: 40
Jun 07 17:25:20 e005 mysqld[438]: 2018-06-07 17:25:20 1988993840 [Note] Starting crash recovery...
Jun 07 17:25:22 e005 mysqld[438]: 2018-06-07 17:25:20 768da330  InnoDB: Starting recovery for XA transactions...
Jun 07 17:25:22 e005 mysqld[438]: 2018-06-07 17:25:20 768da330  InnoDB: Transaction 23348840 in prepared state after recovery
ggilestro commented 4 years ago

Fixed by c2e8a7f656611cc10379c8e93ff4205220c8807a