what is the output from:
systemctl status influxdb
/bin/df
grep -v \# /etc/influxdb/influxdb.conf |uniq
sudo /usr/bin/du -d 2 /var/lib/influxdb |sort -n
influx -precision rfc3339 -database ruuvi -execute "select last(temperature),mac from ruuvi_measurements
curl --get 'http://localhost:8086/query?pretty=true' --data-urlencode "db=ruuvi" --data-urlencode "q=SELECT last(rssi),mac FROM ruuvi_measurements "
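(An aside on the third command above: `grep -v \#` drops every line containing a `#`, and `uniq` collapses the runs of identical blank lines left behind, so you get the effective config without comments. A minimal self-contained sketch of the same filter; the sample config fragment below is made up for illustration:)

```shell
# Build a made-up config fragment, then run the exact filter used above.
cat > /tmp/sample.conf <<'EOF'
# global settings
[data]
dir = "/var/lib/influxdb/data"


[meta]
EOF
# grep -v \# removes comment lines; uniq squashes the repeated blank lines
grep -v \# /tmp/sample.conf | uniq
```

Note that `grep -v \#` also drops lines with trailing inline comments, which is why the real command can occasionally hide a live setting.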
Hey, here's the log.
influx -precision rfc3339 -database ruuvi -execute "select last(temperature),mac from ruuvi_measurements
^ Gave nothing; I had to use Ctrl-A + D.
Sorry, I seem to have left out the trailing quote ( " ) after ruuvi_measurements.
Ctrl-C would be better when a bash command like this seems to be hung. No worries.
Your responses:
systemctl status influxdb
influxdb.service - InfluxDB is an open-source, distributed, time series database
Loaded: loaded (/lib/systemd/system/influxdb.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2019-06-19 16:58:07 EEST; 50min ago
Docs: https://docs.influxdata.com/influxdb/
Main PID: 473 (influxd)
CGroup: /system.slice/influxdb.service
└─473 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
Jun 19 17:46:58 raspberrypi influxd[473]: [httpd] 192.168.1.212, 192.168.1.212,::1 - - [19/Jun/2019:17:46:58 +0300] "GET /query?db=r
+++ Unfortunately the above is incomplete (even after scrolling right here) because systemctl truncates log lines to fit the width of the terminal window! +++
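(For what it's worth, systemctl can be told not to truncate; both flags below are standard systemd options, and the unit name is just the one from above:)

```shell
# --full (-l) disables ellipsizing of long log lines,
# --no-pager prints straight to stdout instead of paging
systemctl status influxdb --full --no-pager

# or read the unit's journal directly; journalctl keeps whole lines
journalctl -u influxdb -n 50 --no-pager
```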
/bin/df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 60910488 5650140 52714540 10% /
grep -v \# /etc/influxdb/influxdb.conf |uniq
[meta]
dir = "/var/lib/influxdb/meta"
[data]
dir = "/var/lib/influxdb/data"
wal-dir = "/var/lib/influxdb/wal"
series-id-set-cache-size = 100
sudo /usr/bin/du -d 2 /var/lib/influxdb |sort -n
31044 /var/lib/influxdb/wal/_internal
31060 /var/lib/influxdb/wal
36616 /var/lib/influxdb
curl --get 'http://localhost:8086/query?pretty=true' --data-urlencode "db=ruuvi" --data-urlencode "q=SELECT last(rssi),mac FROM ruuvi_measurements "
{ "results": [ { "statement_id": 0,
"series": [ { "name": "ruuvi_measurements",
"columns": [ "time", "last", "mac" ],
"values": [ [ "2019-06-19T14:26:37.437536735Z", -91, "DFE5676531E9"
This all shows that the InfluxDB database is running and responding to updates from RuuviCollector.
How often does this happen? Predictably?
I have had problems with the entire system stalling for a couple of minutes on a Raspberry Pi Zero W.
To determine if this is a memory/swapping/paging problem we need to start recording statistics. Add this SINGLE line to
/etc/cron.d/atsar
*/10 * * * * root test -x /usr/lib/atsar/atsa1 && (LOGDIR=/var/log/atsar; CURDAY=`date +\%d`;/usr/lib/atsar/atsadc 15 39 $LOGDIR/atsa$CURDAY )
Be sure to scroll to the right to get the whole command, ending with CURDAY)
Be sure the directory /var/log/atsar exists; if it doesn't, then
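(Two details of that cron line worth knowing: `%` is special in a crontab — it marks the start of the job's stdin — so it must be written `\%d`; and `date +%d` expands to the day of month, which gives one stats file per day. A sketch of the expansion, using the same paths as the line above:)

```shell
LOGDIR=/var/log/atsar
CURDAY=$(date +%d)      # day of month, zero-padded: 01..31
# the cron job would append 15-second samples, 39 times, to a per-day file:
echo "/usr/lib/atsar/atsadc 15 39 $LOGDIR/atsa$CURDAY"
```

Because `%d` wraps around each month, old daily files are overwritten automatically, so the log directory never grows without bound.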
sudo mkdir /var/log/atsar
Next time the error occurs we will use a command like
atsar -s 12:25 -e 12:40 -p
to show statistics.
Feels like a Linux or InfluxDB issue.
Here are logs from atsar and others. Grafana will stop showing temps after about 2 hours; yesterday I was able to run it for a good 8 hours, so it seems like it will randomly crash or something. I also re-flashed Raspbian and installed everything again, but it isn't helping at all.
While keeping an eye on Grafana, it pops up red notices in the right corner saying "unauthorized", but it still works after that, if not totally messed up by something. (Not sure why.)
systemctl status shows complete information this time, but not from a time when there was a problem!
The atsar options -s (start) and -e (end) need to cover a period from a few minutes before to a few minutes after the problem event.
What is output from
grep --invert-match '^#' /home/pi/RuuviCollector/ruuvi-collector.properties |uniq
influx -precision rfc3339 -database ruuvi -execute "select mac,last(temperature) from ruuvi_measurements group by mac;"|grep 2019
atsar included start / end just like you mentioned.
pi@raspberrypi:~ $ influx -precision rfc3339 -database ruuvi -execute "select mac,last(temperature) from ruuvi_measurements group by mac;"|grep 2019
2019-06-23T13:18:17.620637744Z 0052010C4003 24.3
2019-06-23T17:43:54.173533189Z DFE5676531E9 19.03
2019-06-23T17:43:51.950908515Z F161992C25B6 24.16
2019-06-23T17:43:49.227313382Z FD0531F180CB 23.83
grep --invert-match '^#' /home/pi/ruuvi-collector.properties |uniq
^ nothing...
I do not see atsar data covering the period of an event?
OK, so you've got 4 active tags, no biggie. You haven't specified any custom ruuvi-collector.properties.
influx
SHOW RETENTION POLICIES on ruuvi
quit
I also have the same issue! I have 3 Ruuvi tags and am running RuuviCollector on an RPi Zero W. I'm getting a lot of [no data] alerts in Grafana and the system crashes within a few days. Anything I can generate to help?
@thestraycat How often does this happen? Predictably? Please be more specific regarding "crashes". Can you provide syslog messages (/var/log)?
systemctl status ruuvicollector
How does it (you?) recover?
+++++++++++++++
To determine if this is a memory/swapping/paging problem we need to start recording statistics. To collect 15-second samples 39 times every 10 minutes, add this SINGLE line to /etc/cron.d/atsar
*/10 * * * * root (LOGDIR=/var/log/atsar; CURDAY=`date +\%d`;/usr/lib/atsar/atsadc 15 39 $LOGDIR/atsa$CURDAY )
Be sure to scroll to the right to get the whole command, ending with CURDAY)
Be sure the directory /var/log/atsar exists; if it doesn't, then
sudo mkdir /var/log/atsar
Next time the error occurs we will use a command SOMETHING LIKE
atsar -s 12:25 -e 12:40 -p
(where -s (start) and -e (end) cover the failing period) to show statistics.
+++++++++++++++
What is output from
grep --invert-match '^#' /home/pi/RuuviCollector/ruuvi-collector.properties |uniq
+++++++++++++
top -d 10 |head
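(`top -d 10` sets a 10-second refresh interval; for a clean one-shot capture in a script, batch mode may be easier — an aside, not from the thread:)

```shell
# -b: batch mode (plain text, no terminal control), -n 1: a single snapshot.
# The first five lines are the load/task/memory summary header.
top -b -n 1 | head -n 5
```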
I have the same issue: fresh install on an RPi 4. RuuviCollector was fine for a couple of days but now it crashes after ~5 h of use. There are no errors, but if I restart the collector it continues to work. I'm wondering whether it has something to do with the bluez utils; mine's version 5.5.
Are you running the latest version of the collector, and does it print anything in the log when it crashes?
I have now set up a 2nd Raspberry pi with the latest Raspbian (10, Buster) to see if I can reproduce this somehow, but without any additional details it's really difficult to pinpoint whether the issue is with the collector or something else
Are you running the latest version of the collector, and does it print anything in the log when it crashes?
Yes, latest version by cloning. To be more specific, I don't think it technically crashes; it just stops writing the data. At some point I see that there's no new data coming into Grafana, but if I check the status of RuuviCollector it seems to be running fine, no errors or anything. When I restart the service the data starts to flow in.
I'll keep it running and see if it produces any errors
Are there any other logs I could check besides the service logs from
journalctl -u RuuviCollector
and systemctl status ruuvi-collector.service
RuuviCollector logs everything to the standard output, which can be seen with journalctl when running the collector as a systemd service. The collector should log all anomalies that prevent data from being saved, as well as any exits (clean or not), so if nothing appears in the logs it sounds like it's stuck somewhere; it could be that hcidump stops outputting data (without exiting), or something else.
if/when this happens again and the collector seems "stuck", could you try taking a thread dump and posting it here?
You can take a thread dump by first finding the PID of the process (for example, ps -ef | grep java
the pid is in the 2nd column, after the username), then issue kill -3 PID
(replace the PID with the PID of the java process) and then Java should print the thread dump to the standard output, which should appear in journalctl -u RuuviCollector
in your case.
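(Those steps, end to end, look like the following. This sketch uses a throwaway sleep process in place of the real java process, since SIGQUIT (signal 3) simply terminates a non-JVM process instead of printing a dump:)

```shell
sleep 300 &                      # stand-in for the collector's java process
PID=$!                           # with the real thing: ps -ef | grep java,
                                 # PID is the 2nd column after the username
kill -3 "$PID"                   # SIGQUIT; a JVM catches this and prints the
                                 # thread dump to its standard output
wait "$PID" 2>/dev/null || true  # the stand-in just dies on SIGQUIT
```

With the real collector, the dump then shows up in `journalctl -u RuuviCollector` because systemd captures the process's standard output.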
After the three "crashes" in a row during one day, the collector ran happily for over a month without any problems. Last Friday it didn't get stuck but actually crashed and closed the service, so I couldn't take the thread dump. However, there's a clear log this time.
The log states that there were some issues with the Bluetooth device, just restarting the service gets it running again.
Given the timeframe in which this can happen, it seems to me that this could be very hard to debug, but I'm pretty sure both problems are related to the Bluetooth device in some way.
What would be the recommended way to just restart the service when something like this occurs?
● ruuvi-collector.service - RuuviCollector
Loaded: loaded (/etc/systemd/system/ruuvi-collector.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2019-10-10 17:31:26 +08; 3 days ago
Process: 307 ExecStart=/usr/bin/java -jar /home/pi/git/ruuvi-collector/target/ruuvi-collector-0.2.jar (code=exited, status=1/FAILURE)
Main PID: 307 (code=exited, status=1/FAILURE)
Oct 10 17:31:15 kosong systemd[1]: Started RuuviCollector.
Oct 10 17:31:21 kosong RuuviCollector[307]: 2019-10-10 17:31:21.058 DEBUG [Main] Starting scan with: [hcitool, lescan, --duplicates, --passive]
Oct 10 17:31:21 kosong RuuviCollector[307]: 2019-10-10 17:31:21.089 DEBUG [Main] Starting dump with: [hcidump, --raw]
Oct 10 17:31:21 kosong RuuviCollector[307]: 2019-10-10 17:31:21.111 INFO [Main] BLE listener started successfully, waiting for data...
Oct 10 17:31:21 kosong RuuviCollector[307]: If you don't get any data, check that you are able to run 'hcitool lescan' and 'hcidump --raw' without issues
Oct 10 17:31:26 kosong RuuviCollector[307]: 2019-10-10 17:31:26.415 ERROR [Main] Can't attach to device hci0. No such device(19): Check that your bluetooth adapter is enabled and working properly
Oct 10 17:31:26 kosong RuuviCollector[307]: 2019-10-10 17:31:26.495 INFO [Main] Unclean exit
Oct 10 17:31:26 kosong systemd[1]: ruuvi-collector.service: Main process exited, code=exited, status=1/FAILURE
Oct 10 17:31:26 kosong systemd[1]: ruuvi-collector.service: Failed with result 'exit-code'.
To the file: /lib/systemd/system/ruuvicollector.service please add
After=network.target
Restart=always
RestartSec=300
The RestartSec delays the restart after a failure, to prevent rapid crash, restart, crash... cycling if something (like a full file system) goes really wrong.
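(For clarity, and as an assumption about your unit layout: Restart= and RestartSec= belong in the [Service] section, while After= goes in [Unit], so the additions would sit roughly like this:)

```ini
[Unit]
After=network.target

[Service]
Restart=always
RestartSec=300
```

followed by `sudo systemctl daemon-reload` so systemd picks up the edited unit.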
Are there any related messages in other /var/log files? What is in your /etc/rsyslog.conf and /etc/rsyslog.d/* ?
The entries you've shown only show startup related to RuuviCollector, none related to failure.
Thanks for the restart config. I found the cause of the crash in the syslog, but after this it has gotten stuck again, so I also managed to get the thread dump.
rsyslog.conf was the default, looks like this
rsyslog.d/ was empty by default
So on the 10th, when going through the syslog, I found that around the time of the drop there's something like this:
Oct 10 17:17:22 kosong dhcpcd[523]: eth0: carrier lost
and it goes up and down a few times, and this seems to somehow cause the whole device to restart. After this restart it manages to get back up and running, but I guess it doesn't bring the RuuviCollector/Bluetooth device back into a working state.
It seems like other people have experienced similar problems caused by the router on the RPi 3 and RPi 4, so maybe this is the case here as well; it seems this is not a problem with the collector but with the Raspberry Pi itself.
So I restarted the service on the 14th around 12:00 and it ran until 18:27, and then the data flow to Grafana stopped. So I checked the logs and status:
-- Logs begin at Thu 2019-02-14 18:11:59 +08, end at Wed 2019-10-16 15:47:18 +08. --
Oct 14 11:49:58 kosong systemd[1]: Started RuuviCollector.
Oct 14 11:49:59 kosong RuuviCollector[6884]: 2019-10-14 11:49:59.635 DEBUG [Main] Starting scan with: [hcitool, lescan, --duplicates, --passive]
Oct 14 11:49:59 kosong RuuviCollector[6884]: 2019-10-14 11:49:59.641 DEBUG [Main] Starting dump with: [hcidump, --raw]
Oct 14 11:49:59 kosong RuuviCollector[6884]: 2019-10-14 11:49:59.643 INFO [Main] BLE listener started successfully, waiting for data...
Oct 14 11:49:59 kosong RuuviCollector[6884]: If you don't get any data, check that you are able to run 'hcitool lescan' and 'hcidump --raw' without issues
Oct 14 11:50:00 kosong RuuviCollector[6884]: 2019-10-14 11:50:00.716 INFO [Main] Successfully reading data from hcidump
root@kosong:~# systemctl status ruuvi-collector.service
● ruuvi-collector.service - RuuviCollector
Loaded: loaded (/etc/systemd/system/ruuvi-collector.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2019-10-14 11:49:58 +08; 2 days ago
Main PID: 6884 (java)
Tasks: 25 (limit: 4915)
Memory: 65.5M
CGroup: /system.slice/ruuvi-collector.service
├─6884 /usr/bin/java -jar /home/pi/git/ruuvi-collector/target/ruuvi-collector-0.2.jar
├─6900 hcitool lescan --duplicates --passive
└─6902 hcidump --raw
Oct 14 11:49:58 kosong systemd[1]: Started RuuviCollector.
Oct 14 11:49:59 kosong RuuviCollector[6884]: 2019-10-14 11:49:59.635 DEBUG [Main] Starting scan with: [hcitool, lescan, --duplicates, --passive]
Oct 14 11:49:59 kosong RuuviCollector[6884]: 2019-10-14 11:49:59.641 DEBUG [Main] Starting dump with: [hcidump, --raw]
Oct 14 11:49:59 kosong RuuviCollector[6884]: 2019-10-14 11:49:59.643 INFO [Main] BLE listener started successfully, waiting for data...
Oct 14 11:49:59 kosong RuuviCollector[6884]: If you don't get any data, check that you are able to run 'hcitool lescan' and 'hcidump --raw' without issues
Oct 14 11:50:00 kosong RuuviCollector[6884]: 2019-10-14 11:50:00.716 INFO [Main] Successfully reading data from hcidump
So basically it's saying it's running fine. I killed the Java process (kill -3) and got this.
After that, the status looks like this:
● ruuvi-collector.service - RuuviCollector
Loaded: loaded (/etc/systemd/system/ruuvi-collector.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2019-10-14 11:49:58 +08; 2 days ago
Main PID: 6884 (java)
Tasks: 25 (limit: 4915)
Memory: 65.7M
CGroup: /system.slice/ruuvi-collector.service
├─6884 /usr/bin/java -jar /home/pi/git/ruuvi-collector/target/ruuvi-collector-0.2.jar
├─6900 hcitool lescan --duplicates --passive
└─6902 hcidump --raw
Oct 16 15:49:47 kosong RuuviCollector[6884]: "G1 Conc#0" os_prio=0 cpu=0.05ms elapsed=187188.73s tid=0xb5f71c00 nid=0x1ae8 runnable
Oct 16 15:49:47 kosong RuuviCollector[6884]: "G1 Refine#0" os_prio=0 cpu=4.13ms elapsed=187188.73s tid=0xb5fd8400 nid=0x1ae9 runnable
Oct 16 15:49:47 kosong RuuviCollector[6884]: "G1 Refine#1" os_prio=0 cpu=0.08ms elapsed=187187.20s tid=0x71e9a800 nid=0x1afa runnable
Oct 16 15:49:47 kosong RuuviCollector[6884]: "G1 Young RemSet Sampling" os_prio=0 cpu=41612.63ms elapsed=187188.73s tid=0xb5fd9800 nid=0x1aea runnable
Oct 16 15:49:47 kosong RuuviCollector[6884]: "VM Periodic Task Thread" os_prio=0 cpu=107240.78ms elapsed=187188.56s tid=0x72b8d800 nid=0x1af2 waiting on condition
Oct 16 15:49:47 kosong RuuviCollector[6884]: JNI global refs: 14, weak refs: 0
Oct 16 15:49:47 kosong RuuviCollector[6884]: Heap
Oct 16 15:49:47 kosong RuuviCollector[6884]: garbage-first heap total 63488K, used 37583K [0x75c00000, 0xb3a00000)
Oct 16 15:49:47 kosong RuuviCollector[6884]: region size 1024K, 34 young (34816K), 1 survivors (1024K)
Oct 16 15:49:47 kosong RuuviCollector[6884]: Metaspace used 10502K, capacity 10626K, committed 10648K, reserved 11568K
So now I've just restarted the service and it's running for now.
Thanks for the thread dump. The relevant part:
Oct 16 15:49:47 kosong RuuviCollector[6884]: "main" #1 prio=5 os_prio=0 cpu=123603.16ms elapsed=187188.73s tid=0xb5f0a400 nid=0x1ae5 runnable [0xb60f3000]
Oct 16 15:49:47 kosong RuuviCollector[6884]: java.lang.Thread.State: RUNNABLE
Oct 16 15:49:47 kosong RuuviCollector[6884]: at java.io.FileInputStream.readBytes(java.base@11.0.3/Native Method)
Oct 16 15:49:47 kosong RuuviCollector[6884]: at java.io.FileInputStream.read(java.base@11.0.3/FileInputStream.java:279)
Oct 16 15:49:47 kosong RuuviCollector[6884]: at java.io.BufferedInputStream.read1(java.base@11.0.3/BufferedInputStream.java:290)
Oct 16 15:49:47 kosong RuuviCollector[6884]: at java.io.BufferedInputStream.read(java.base@11.0.3/BufferedInputStream.java:351)
Oct 16 15:49:47 kosong RuuviCollector[6884]: - locked <0x75debea8> (a java.lang.ProcessImpl$ProcessPipeInputStream)
Oct 16 15:49:47 kosong RuuviCollector[6884]: at sun.nio.cs.StreamDecoder.readBytes(java.base@11.0.3/StreamDecoder.java:284)
Oct 16 15:49:47 kosong RuuviCollector[6884]: at sun.nio.cs.StreamDecoder.implRead(java.base@11.0.3/StreamDecoder.java:326)
Oct 16 15:49:47 kosong RuuviCollector[6884]: at sun.nio.cs.StreamDecoder.read(java.base@11.0.3/StreamDecoder.java:178)
Oct 16 15:49:47 kosong RuuviCollector[6884]: - locked <0x75deff08> (a java.io.InputStreamReader)
Oct 16 15:49:47 kosong RuuviCollector[6884]: at java.io.InputStreamReader.read(java.base@11.0.3/InputStreamReader.java:185)
Oct 16 15:49:47 kosong RuuviCollector[6884]: at java.io.BufferedReader.fill(java.base@11.0.3/BufferedReader.java:161)
Oct 16 15:49:47 kosong RuuviCollector[6884]: at java.io.BufferedReader.readLine(java.base@11.0.3/BufferedReader.java:326)
Oct 16 15:49:47 kosong RuuviCollector[6884]: - locked <0x75deff08> (a java.io.InputStreamReader)
Oct 16 15:49:47 kosong RuuviCollector[6884]: at java.io.BufferedReader.readLine(java.base@11.0.3/BufferedReader.java:392)
Oct 16 15:49:47 kosong RuuviCollector[6884]: at fi.tkgwf.ruuvi.Main.run(Main.java:79)
Oct 16 15:49:47 kosong RuuviCollector[6884]: at fi.tkgwf.ruuvi.Main.run(Main.java:70)
Oct 16 15:49:47 kosong RuuviCollector[6884]: at fi.tkgwf.ruuvi.Main.main(Main.java:30)
indicates that the main thread is still alive and in an expected state; more specifically, it's waiting for more output from the hcidump process, which also means that the hcidump process is still running (as it should be). However, hcidump doesn't seem to be outputting any more data, for some reason that lies somewhere outside RuuviCollector itself, i.e. the Bluetooth device itself may have ended up in some weird state, another process may have taken control over Bluetooth, or something like that.
When this happens, you can try running hcidump yourself with hcidump --raw
and see if you get any output, and whether there's any change in RuuviCollector while doing it; or manually perform a scan with hcitool lescan --duplicates --passive
and see if anything happens (with RuuviCollector still running). There's a small chance that RuuviCollector may start outputting data again when you do this.
Describe the bug
RuuviCollector crashes after a couple of hours of usage with ''connection refused''. Java prints this:
sudo systemctl status ruuvicollector.service
● ruuvicollector.service - RuuviCollector Service
Loaded: loaded (/etc/systemd/system/ruuvicollector.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2019-06-18 15:38:35 EEST; 1h 55min ago
Main PID: 466 (java)
CGroup: /system.slice/ruuvicollector.service
├─466 /usr/bin/java -jar /home/pi/ruuvi-collector-0.2.jar
├─728 hcitool lescan --duplicates --passive
└─744 hcidump --raw
Jun 18 15:38:59 raspberrypi java[466]: Caused by: java.net.ConnectException: Connection refused
Jun 18 15:38:59 raspberrypi java[466]: at java.net.PlainSocketImpl.socketConnect(Native Method)
Jun 18 15:38:59 raspberrypi java[466]: at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
Jun 18 15:38:59 raspberrypi java[466]: at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
Jun 18 15:38:59 raspberrypi java[466]: at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
Jun 18 15:38:59 raspberrypi java[466]: at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
Jun 18 15:38:59 raspberrypi java[466]: at java.net.Socket.connect(Socket.java:589)
Jun 18 15:38:59 raspberrypi java[466]: at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
Jun 18 15:38:59 raspberrypi java[466]: at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:245)
Jun 18 15:38:59 raspberrypi java[466]:
Environment
Raspberry Pi 3, Raspbian
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) Client VM (build 25.65-b01, mixed mode)