NubeIO / rubix-point-server


Try to replicate InfluxDB sync issue #444

Open RaiBnod opened 1 year ago

RaiBnod commented 1 year ago

https://dash.nube-iiot.com/precise-air/d/PmXrWQDaa/sap-heartbeats?orgId=1&from=1675953722225&to=1676160086990

https://user-images.githubusercontent.com/6800775/223656813-fbda68d9-bd93-48c6-abcd-c16b145d79cd.mov

The data synced to InfluxDB is blank for a certain period of time. However, the device and point-server have been up for more than 3 months, generating history at 15-minute intervals.

This could possibly happen when:

But there is also no sign that the device went offline during that time period. It has been up for more than 3 months.

So, try to replicate the case where InfluxDB goes down while values are being written, or try to find other possible cases.

RaiBnod commented 1 year ago

The device runs point-server version v2.0.7, which stores 200 rows of history for each point before cleaning them up. At 15-minute intervals, 200 rows cover 15*200/60 = 50 hours. This suggests that the device's internet connection was also fine.
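The arithmetic above can be checked with a small helper. This is only a sketch: `buffer_coverage_hours` is a hypothetical name, not something from point-server, and it assumes the buffer holds a fixed number of rows per point written at a constant interval.

```python
def buffer_coverage_hours(rows_per_point: int, interval_minutes: float) -> float:
    """Hours of history a per-point buffer holds before old rows are cleaned."""
    return rows_per_point * interval_minutes / 60


# 200 rows at 15-minute intervals, as described for v2.0.7
print(buffer_coverage_hours(200, 15))  # 50.0
```

Points that write history more often than every 15 minutes would be covered for a proportionally shorter window, for example 200 rows at 5-minute intervals would cover under 17 hours.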

A couple of crucial points on this are:

  1. The InfluxDB server gets restarted at random periods of time
  2. Points with different history-write frequencies show gaps at the same time.

1. InfluxDB server logs:

root@ubuntu-s-1vcpu-1gb-sgp1-01:~# cat syslog.7 |grep restart
Mar  3 07:00:16 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Service hold-off time over, scheduling restart.
Mar  3 07:00:16 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Scheduled restart job, restart counter is at 6175.
Mar  3 07:12:24 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Service hold-off time over, scheduling restart.
Mar  3 07:12:24 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Scheduled restart job, restart counter is at 6176.
Mar  3 08:00:20 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Service hold-off time over, scheduling restart.
Mar  3 08:00:20 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Scheduled restart job, restart counter is at 6177.
Mar  3 08:12:25 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Service hold-off time over, scheduling restart.
Mar  3 08:12:25 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Scheduled restart job, restart counter is at 6178.
Mar  3 09:00:18 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Service hold-off time over, scheduling restart.
Mar  3 09:00:18 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Scheduled restart job, restart counter is at 6179.
Mar  3 09:12:24 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Service hold-off time over, scheduling restart.
Mar  3 09:12:24 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Scheduled restart job, restart counter is at 6180.
Mar  3 10:00:21 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Service hold-off time over, scheduling restart.
Mar  3 10:00:21 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Scheduled restart job, restart counter is at 6181.
Mar  3 10:12:34 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Service hold-off time over, scheduling restart.
Mar  3 10:12:34 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Scheduled restart job, restart counter is at 6182.
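To compare restart times against the gaps, the log lines above can be parsed into timestamps. The following is a minimal sketch, not part of point-server: `parse_restarts` and `intervals_minutes` are hypothetical helpers, the regular expression only targets the exact message format shown above, and the year is an assumption because syslog lines do not carry one.

```python
import re
from datetime import datetime

# Matches only the "Scheduled restart job" lines in the format shown above.
RESTART_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+systemd\[\d+\]:\s+"
    r"influxdb\.service: Scheduled restart job, restart counter is at (?P<count>\d+)\."
)


def parse_restarts(lines, year=2023):
    """Return (timestamp, restart_counter) for each matching line.

    The year is an assumption supplied by the caller; syslog omits it.
    """
    restarts = []
    for line in lines:
        m = RESTART_RE.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            restarts.append((ts, int(m["count"])))
    return restarts


def intervals_minutes(restarts):
    """Minutes between consecutive restarts."""
    times = [ts for ts, _ in restarts]
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


sample = [
    "Mar  3 07:00:16 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Service hold-off time over, scheduling restart.",
    "Mar  3 07:00:16 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Scheduled restart job, restart counter is at 6175.",
    "Mar  3 07:12:24 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Scheduled restart job, restart counter is at 6176.",
    "Mar  3 08:00:20 ubuntu-s-1vcpu-1gb-sgp1-01 systemd[1]: influxdb.service: Scheduled restart job, restart counter is at 6177.",
]

restarts = parse_restarts(sample)
print([round(m, 1) for m in intervals_minutes(restarts)])  # [12.1, 47.9]
```

The resulting timestamps can then be lined up with the blank periods seen on the dashboard.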

2. Points stored at different frequencies show gaps at the same time:

https://user-images.githubusercontent.com/6800775/224295067-7e813136-c0c6-4948-a2f9-fb72a6a0a4a0.mov
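One way to check this observation without relying on the dashboard is to look for gaps in each point's timestamps and test whether they overlap. This is a rough sketch under stated assumptions: `find_gaps` and `overlapping` are hypothetical helpers, the 1.5x tolerance is an arbitrary choice, and the two series below are synthetic data with an artificial one-hour outage, not values from the affected device.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval, tolerance=1.5):
    """Return (before, after) pairs where consecutive samples are
    further apart than tolerance * expected_interval."""
    ordered = sorted(timestamps)
    limit = expected_interval * tolerance
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


def overlapping(gaps_a, gaps_b):
    """Pairs of gaps, one from each series, that overlap in time."""
    return [
        (ga, gb)
        for ga in gaps_a
        for gb in gaps_b
        if ga[0] < gb[1] and gb[0] < ga[1]
    ]


# Synthetic example: two points with different write frequencies
# that both lose the samples between 02:00 and 03:00.
start = datetime(2023, 3, 3, 0, 0)
outage = (start + timedelta(hours=2), start + timedelta(hours=3))


def series(step, count):
    samples = (start + i * step for i in range(count))
    return [t for t in samples if not (outage[0] <= t < outage[1])]


slow = series(timedelta(minutes=15), 24)  # 15-minute point, 6 hours
fast = series(timedelta(minutes=5), 72)   # 5-minute point, 6 hours

slow_gaps = find_gaps(slow, timedelta(minutes=15))
fast_gaps = find_gaps(fast, timedelta(minutes=5))
print(len(overlapping(slow_gaps, fast_gaps)))  # 1
```

If gaps from points with unrelated write schedules consistently overlap, the cause is more likely on the receiving side than in any single point's history writer.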

RaiBnod commented 1 year ago

One more example to test:

https://github.com/NubeIO/rubix-point-server/assets/6800775/44fd9800-36ee-4431-8f48-e0f85f97a97e

data.txt (rename it to data.db)

RaiBnod commented 1 year ago

From ChatGPT:

In InfluxDB 1.x, what happens if the server gets restarted just after we insert a record successfully? It's not persisting the value.

In InfluxDB 1.x, if you insert a record successfully and the server restarts immediately afterward, the inserted data may not be persisted to disk, and therefore it may appear to be lost. This is because InfluxDB follows a write-ahead log (WAL) mechanism for data ingestion.

Here's how the process generally works:

To minimize data loss in such scenarios, you can configure the following settings in the influxdb.conf configuration file:

Please note that even with these settings, there is always some level of risk of data loss during a restart or crash scenario. Therefore, it is crucial to have appropriate backup and high availability strategies in place to safeguard against data loss in critical environments. In InfluxDB 2.x, there are improvements and changes in the data storage mechanism, and it is recommended to consider using the latest version if possible.