Hackuarium / bioreactor-docker

Dockerization of Hackuarium/nodered-bioreactor-gui
https://github.com/Hackuarium/nodered-bioreactor-gui

influxdb: data corruption #2

Closed opatiny closed 4 years ago

opatiny commented 4 years ago

It happens that the bioreactor's time is not synced with the actual time. This causes trouble with retrieving the data from the bioreactor and logging it into InfluxDB. Indeed, we use the log ID of the last entry in the database to know from which point to fetch new data.
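For context, here is a minimal sketch of that lookup as it could appear in a pair of Node-RED function nodes. The payload shape and the fetchLogs command are assumptions for illustration, not the actual flow:

// Node 1: ask InfluxDB for the log ID of the newest stored entry
// (this is the query quoted at the end of this issue).
msg.query = `select last("id") from "bio_${msg.deviceId}"`;
return msg;

// Node 2 (after the InfluxDB node): use the returned ID as the starting
// point of the next data request to the bioreactor. The payload shape
// depends on the InfluxDB node; this is an assumed example.
const lastId = msg.payload[0] ? msg.payload[0].last : 0;
msg.payload = { command: 'fetchLogs', fromId: lastId + 1 }; // hypothetical command
return msg;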

Scenario 1: bioreactor time in the past

Generally, the bioreactor's time would be in the past, for example because it has not been powered for a while. In that case, when the time is updated, there is an inconsistency in the timestamps, but the logging continues to work. As a result, the data displayed in the charts of the GUI has a gap, but everything still works.

Scenario 2: bioreactor time in the future

If, however, the bioreactor's time happens to be in the future when it is updated, it is more problematic. Indeed, since the entries in the InfluxDB database are sorted by timestamp, the last entry is not the one with the biggest log ID. This causes the GUI to loop, because it keeps asking for the same logs, which are already in the DB.
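To make the failure concrete, here is a made-up two-entry example (the timestamps and IDs are invented):

time                  id
2020-01-01T10:00:05Z  42   (written while the clock was ahead)
2020-01-01T10:00:00Z  43   (written after the clock was corrected)

last("id") returns 42, the value at the latest timestamp, so the GUI keeps requesting logs after ID 42 and keeps re-fetching log 43, which is already in the database. max("id") would return 43.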

Solution

Fetch the maximal log ID in the database instead of the log ID of the last entry. However, I am worried that this would become slow when there is a lot of data, especially since we run this query every 10 seconds. To mitigate this, we could fetch the max value from only the last n logs; I just do not know how big this number should be.
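One way the last-n-logs variant could look, assuming InfluxQL subqueries are available (InfluxDB 1.2+) and with n = 1000 as an arbitrary placeholder:

// Compute max("id") over only the n most recent rows instead of the
// whole series. n = 1000 is a guess; the right value is still open.
const n = 1000;
msg.query = `select max("id") from (select "id" from "bio_${msg.deviceId}" order by time desc limit ${n})`;
return msg;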

opatiny commented 4 years ago

I found another solution to the hypothetical problem of the max() function being slow for large datasets: use the last() function by default and count the number of times the last log ID is the same in a row. That way, if the queries loop over the same logs, we can detect it and fetch the max() log ID once.
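A sketch of that fallback logic in a Node-RED function node, run each time the last("id") result comes back. flow.get()/flow.set() are the standard Node-RED context accessors; the payload shape and the threshold of 3 repeats are assumptions:

// Detect when last("id") keeps returning the same value in a row.
const currentId = msg.payload[0] ? msg.payload[0].last : null; // assumed shape
const previousId = flow.get('lastLogId');
const repeats = currentId === previousId ? (flow.get('sameIdCount') || 0) + 1 : 0;
flow.set('lastLogId', currentId);

if (repeats >= 3) {
    // The cheap last("id") query has looped: fall back to max("id") once.
    flow.set('sameIdCount', 0);
    msg.query = `select max("id") from "bio_${msg.deviceId}"`;
} else {
    flow.set('sameIdCount', repeats);
    msg.query = `select last("id") from "bio_${msg.deviceId}"`;
}
return msg;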

I have not implemented this solution yet and have only switched to max() for now. It should be implemented if we ever notice a problem.

For now, this is the change made to the query.

Old:

msg.query = `select last("id") from "bio_${msg.deviceId}"`;

New:

msg.query = `select max("id") from "bio_${msg.deviceId}"`;
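For reference, in InfluxQL last() selects the field value with the most recent timestamp, while max() selects the greatest field value regardless of timestamp order, which is why this change breaks the loop.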