endlessm / azafea

Service to track device activations and usage metrics
Mozilla Public License 2.0

Figure out storing very big integers in PostgreSQL #64

Closed: bochecha closed this issue 3 years ago

bochecha commented 4 years ago

Some of the Endless metrics events have values that are unsigned 64-bit integers.

The biggest integer type PostgreSQL can store, bigint, is a signed 64-bit integer, which is too small to hold those values.

Off the top of my head, we have a few possibilities, which need to be investigated:

  1. pretend they are signed 64-bit integers: we could just store them anyway as is; the value in the DB would be wrong, but it could be cast when stored/retrieved; that would make queries harder?

  2. store them as binary blobs (bytea): we could store the binary representation of the numbers (e.g. 10 would be stored as 1010); that would make queries harder?

  3. store them as strings: we could store the string representation of the numbers (e.g. 10 would be stored as "10"); queries wouldn't be too hard, it would just be frustrating to always remember to use quotes;

  4. store them as numeric(20, 0): this type allows numbers of arbitrary size (up to 1000 digits!) with exact precision (as opposed to floats); however, calculations with such numbers are very slow compared to integers;

There might be other possibilities…
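To make option 1 concrete, the signed/unsigned reinterpretation amounts to a fixed-width cast in two's complement. This is only an illustrative sketch, not code from the project:

```python
import struct

def u64_to_i64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a signed one (option 1)."""
    return struct.unpack('<q', struct.pack('<Q', value))[0]

def i64_to_u64(value: int) -> int:
    """Inverse cast, applied when reading the value back from the database."""
    return struct.unpack('<Q', struct.pack('<q', value))[0]
```

For example, `u64_to_i64(2 ** 64 - 1)` yields `-1`, which fits in a bigint, and `i64_to_u64(-1)` recovers the original value; the catch is that comparisons and aggregations in SQL would then operate on the wrong (cast) values.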

wjt commented 4 years ago

The two events in question are:

"Cache has invalid elements" cbfbcbdb-6af2-f1db-9e11-6cc25846e296

The payload is (tt):

  1. number of valid elements found in the cache
  2. total number of bytes read from the cache

These will never be anywhere near as large as 2 ** 63, since no hard disk exists anywhere near that size. They have type u64 because the disk I/O API is in terms of size_t, as is the number of elements in a GPtrArray. (The curious reader could start at https://github.com/endlessm/eos-event-recorder-daemon/blob/e38312abf3f51247a476d20bd9de2fa3d4bbfbb3/daemon/emer-daemon.c#L783-L792 and follow the API call.) These can just be clamped to the range [0, 2 ** 63 - 1], which fits in an i64, and stored as i64.

(In practice neither one will be anywhere near 2 ** 31, since the on-disk cache is capped at 10 MB by default. However, you could imagine a very-actively-used, mostly-offline system changing that setting to be greater than 2 ** 31, I suppose.)
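The clamping described here is a one-liner; a hypothetical sketch (the function name is invented for illustration):

```python
I64_MAX = 2 ** 63 - 1  # largest value a PostgreSQL bigint can hold

def clamp_u64(value: int) -> int:
    """Clamp an unsigned 64-bit metric value into [0, 2 ** 63 - 1]."""
    return max(0, min(value, I64_MAX))
```

Values already in range pass through unchanged; only an (impossible in practice) value above 2 ** 63 - 1 would be truncated to the maximum.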

"Startup Finished" bf7e8aed-2932-455c-a28e-d407cfd5aaba

The fields of this one come from systemd's StartupFinished() signal.

https://github.com/endlessm/eos-metrics-instrumentation/blob/bf055cc7db61cd2040404ee19bbf1197d6a547e0/src/eos-metrics-instrumentation.c#L361-L364 https://www.freedesktop.org/wiki/Software/systemd/dbus/

StartupFinished() is sent out when startup finished. It carries six usec timespan values each indicating how much boot time has been spent in the firmware (if known), in the boot loader (if known), in the kernel initialization phase, in the initrd (if known), in userspace and in total. These values may also be calculated from the FirmwareTimestampMonotonic, LoaderTimestampMonotonic, InitRDTimestampMonotonic, UserspaceTimestampMonotonic, FinishTimestampMonotonic properties (see below).

So, these are all timespans in microseconds. By my calculation, 2 ** 63 microseconds is roughly 292,271 years. Again, these will never be anywhere near that large, so could be clamped to the range [0, 2 ** 63 - 1] and stored as i64. (2 ** 31 microseconds is about 35.8 minutes, and you'd really, **really** hope that a computer can boot in less than 35 minutes, so one could even get away with storing as i32.)
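The back-of-the-envelope figures can be checked directly (the names here exist only for the calculation):

```python
US_PER_MINUTE = 1_000_000 * 60
US_PER_YEAR = US_PER_MINUTE * 60 * 24 * 365.25  # Julian year, in microseconds

# How long a signed 64-bit microsecond counter can run before overflowing.
max_i64_years = (2 ** 63) / US_PER_YEAR      # about 292,271 years

# How long a signed 32-bit microsecond counter can run.
max_i32_minutes = (2 ** 31) / US_PER_MINUTE  # about 35.8 minutes
```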

liZe commented 4 years ago

> Again, these will never be anywhere near that large, so could be clamped to the range [0, 2 ** 63 - 1] and stored as i64.

(You probably mean [0, 2 ** 31 - 1].)

I think that it’s the best solution for both events. We can clamp the values and store them in PostgreSQL’s bigint columns, keeping fast operations. We’ll lose some information if the values we get are greater than 2 ** 31, but that shouldn’t happen anyway.

wjt commented 4 years ago

I did mean 2 ** 63 - 1, the largest positive 64-bit signed integer, which is the largest value that fits into a bigint (the widest integer type postgres supports according to https://www.postgresql.org/docs/9.1/datatype-numeric.html).

It's probably true that in both cases even 2 ** 31 - 1 (the largest positive 4-byte integer) is good enough. If someone has more than 2 GB of buffered metrics events and they're all corrupt, we're in a really bad place.

liZe commented 4 years ago

> I did mean 2 ** 63 - 1, the largest positive 64-bit signed integer, which is the largest value that fits into a bigint (the widest integer type postgres supports according to https://www.postgresql.org/docs/9.1/datatype-numeric.html).

Of course, it’s not 2 ** 31, it’s 2 ** 63, you’re right. The problem is then even more improbable than I thought.

Using integers instead of bigints would not gain us much (the tables shouldn’t be that big), so I think that storing clamped bigints is the best solution.
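A minimal end-to-end sketch of that approach, using SQLite in place of PostgreSQL (SQLite's INTEGER is also a signed 64-bit type, so the same limit applies; the table and column names are invented for illustration):

```python
import sqlite3

I64_MAX = 2 ** 63 - 1

def clamp_u64(value: int) -> int:
    """Clamp an unsigned 64-bit payload into the signed 64-bit range."""
    return max(0, min(value, I64_MAX))

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE startup_finished (total_usec INTEGER NOT NULL)')

# An out-of-range u64 value is clamped before insertion, so it always fits.
conn.execute('INSERT INTO startup_finished VALUES (?)',
             (clamp_u64(2 ** 64 - 1),))
stored, = conn.execute('SELECT total_usec FROM startup_finished').fetchone()
```

After the insert, `stored` is 2 ** 63 - 1: the value was silently clamped rather than rejected, which is the trade-off accepted above.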

Should I add the code to store these events?

wjt commented 4 years ago

Sounds good!