InfluxCommunity / influxdb-ruby

Ruby client for InfluxDB
MIT License

What’s the deal with durability and error handling? #192

Closed aviflax closed 7 years ago

aviflax commented 7 years ago

Hi, I’m totally new to InfluxDB but considering using it for a project. I scanned your readme but didn’t see anything explicit about durability or error handling.

So:

  1. If my code calls write_point can I expect that if an error occurs, an exception will be raised?
    1. (I hope so.)
  2. If my code calls write_point and it succeeds (however that is determined) can I be reasonably guaranteed that my data will be durably retained in the database from that point forward?
    1. I.e., is it akin to committing a transaction with, say, a properly configured PostgreSQL server, in which case if I send COMMIT and the server replies OK (or however the protocol works), then the data is definitely stored in a durable fashion?
dmke commented 7 years ago

Hi,

I can only speak for this client library, but both the InfluxDB server and this library are relatively robust with regard to errors. This client will either raise exceptions at opportune moments, or try to write again (if possible and configured). Unless the server returns an error, the data will eventually be persisted on disk. Some edge cases involving out-of-memory scenarios exist, though that's also the case for Postgres.
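
The raise-or-retry behavior described above can be sketched in plain Ruby. This is an illustrative pattern only: `write_with_retry` and `WriteError` are made-up names for this sketch, not part of the influxdb-ruby API.

```ruby
# Hypothetical sketch of "raise or retry": retry a failing write a
# bounded number of times, then re-raise so the caller sees the error.
class WriteError < StandardError; end

def write_with_retry(max_retries: 3)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue WriteError
    retry if attempts <= max_retries
    raise  # retries exhausted: surface the failure as an exception
  end
end

# Simulate a write that fails twice before succeeding.
result = write_with_retry do |attempt|
  raise WriteError, "server unavailable" if attempt < 3
  "persisted on attempt #{attempt}"
end

puts result  # => "persisted on attempt 3"
```

The point is that a caller either gets a successful return value or an exception; failures are never silently swallowed once retries are exhausted.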

InfluxDB uses HTTP as its transport layer, so data packets will arrive at the server in the order sent, and we get an acknowledgement of receipt in the form of an HTTP 200 OK response (you can use a UDP socket if you can live without these guarantees).
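
To make the UDP trade-off concrete, here is a stdlib-only sketch of a fire-and-forget write. A local UDP listener stands in for the server, and the line-protocol string is just an example payload; nothing here uses the client library itself.

```ruby
require "socket"

# Stand-in for an InfluxDB UDP listener, bound to a free local port.
server = UDPSocket.new
server.bind("127.0.0.1", 0)
port = server.addr[1]

# InfluxDB line protocol: measurement,tag_set field_set timestamp
line = "cpu,host=web01 usage=0.64 1465839830100400200"

# Fire and forget: no response, no delivery guarantee.
client = UDPSocket.new
client.send(line, 0, "127.0.0.1", port)

payload, _addr = server.recvfrom(512)
puts payload
```

With HTTP you would instead block until the server acknowledges the write; with UDP the send returns immediately and a dropped packet is simply lost.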

Two caveats exist:

  1. In contrast to SQL, the InfluxDB schema does not support auto-incrementing (primary) keys. The concept most similar to SQL is that of a composite primary key: an individual data point is uniquely identified by its timestamp and its tag set (plus the measurement (table) you're writing to). While in SQL you have to declare the composition beforehand in the schema, InfluxDB is schema-less and the tag keys can be completely disjoint from one point to the next.

    The implication is: if you write two points with the exact same tags and timestamp, the second will overwrite the first. This can easily happen if you write multiple points per second from the same source but have reduced the timestamp precision to, say, seconds. There have been multiple occasions where people needed to write data with milli- or nanosecond precision, but overlooked that the default precision for the Ruby client is seconds...

  2. The other caveat is retention policies, which delete data points automatically (the default RP has an "infinite" limit, though). With time series data, you don't want to keep all the data all the time, depending on your use case.

    Retention policies are usually used together with "continuous queries", which sample the data down, as in RRD: you take the average (or min/max/95th-percentile/...) of all the points within a given time frame and reduce them to a single point. Then you write this point into another measurement (possibly with another retention policy as well). The retention policy takes care of discarding the data.
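
The first caveat (silent overwrites from point identity plus coarse timestamp precision) can be simulated in plain Ruby. The `write_point` helper below is an illustrative stand-in, not the client's method: it keys points by tag set + timestamp, the way InfluxDB identifies them.

```ruby
# Simulation of InfluxDB point identity: tag set + timestamp (within a
# measurement) uniquely identify a point, so a colliding write replaces
# the earlier one. Illustrative names only.
def write_point(points, tags, timestamp, fields, precision: :s)
  # At second precision, sub-second detail is truncated away...
  ts = precision == :s ? timestamp.to_i : timestamp
  points[[tags, ts]] = fields  # same key => silent overwrite
end

t = 1465839830.25

points = {}  # stand-in for a measurement, written at second precision
write_point(points, { host: "web01" }, t,       { usage: 0.64 })
write_point(points, { host: "web01" }, t + 0.5, { usage: 0.71 })  # same second!
puts points.size  # only one point survives; the last value wins

# At millisecond precision the two writes stay distinct:
ms = {}
write_point(ms, { host: "web01" }, (t * 1000).to_i,         { usage: 0.64 }, precision: :ms)
write_point(ms, { host: "web01" }, ((t + 0.5) * 1000).to_i, { usage: 0.71 }, precision: :ms)
puts ms.size
```

This is exactly the failure mode described above: nothing errors, the earlier point is just gone.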

I highly recommend reading the server docs at http://docs.influxdata.com/influxdb, in particular the concepts section.
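
The RRD-style downsampling that a continuous query performs can be sketched in a few lines of plain Ruby: bucket raw points into fixed time windows and reduce each bucket to its average. The data and window size are made up for illustration; in InfluxDB this job would be done server-side by a continuous query.

```ruby
# [timestamp, value] pairs, e.g. CPU samples taken over two hours.
raw = [[0, 0.2], [60, 0.4], [120, 0.6], [3600, 1.0], [3660, 0.5]]

window = 3600  # one hour, in seconds

# Group points by the start of their hour, then average each group --
# the same reduction a continuous query with mean() would apply.
downsampled = raw
  .group_by { |ts, _| ts / window * window }
  .map { |bucket, pts| [bucket, pts.sum { |_, v| v } / pts.size] }

p downsampled  # one averaged point per hour
```

Each hourly point would then be written into a coarser measurement with a longer retention policy, while the raw measurement's retention policy discards the originals.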

aviflax commented 7 years ago

@dmke wow. Thank you so so much for that clear, comprehensive, super helpful, and super rapid reply! I am deeply impressed and inspired by your generosity. Thank you!

dmke commented 7 years ago

One is glad to be of service. :-)

Let me know if you have any further questions.