Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org
Other
7.33k stars 1.06k forks source link

GELF spec 1.2 #668

Open bernd opened 10 years ago

bernd commented 10 years ago

The GELF spec states the following about the timestamp field.

UNIX microsecond timestamp; SHOULD be set by client library. Will be set to NOW by server if absent.

The GELF parser in graylog2-server parses the timestamp field of incoming messages like this:

double messageTimestamp = doubleValue(json, "timestamp");
DateTime timestamp;
if (messageTimestamp <= 0) {
    timestamp = Tools.iso8601(); // current time
} else {
    // we treat this as a unix timestamp
    timestamp = Tools.dateTimeFromDouble(messageTimestamp);
}

...

/*
 * The double representation of a UNIX timestamp with milliseconds is a strange, human readable format.
 *
 * This sucks and no format should use the double representation. Change GELF to use long. (zomg)
 */
public static DateTime dateTimeFromDouble(double x) {
    return new DateTime(Math.round(x*1000), DateTimeZone.UTC);
}

All of the client libraries we checked use something like this for the timestamp:

double timestamp = logEvent.getTimeStamp() / 1000.0;

So in reality the timestamp field is a UNIX timestamp in seconds with optional milliseconds after the decimal point.

That means that either the GELF 1.1 specification is wrong or everyone (including ourselves) implemented it incorrectly.

Two suggestions:

  1. Adjust the timestamp field specification to match reality: "Seconds since UNIX epoch; SHOULD be set by client library. Will be set to the current system time by server if absent.".
  2. Add an optional timestamp_ms field which holds a millisecond timestamp. If timestamp_ms exists in an incoming message, that one should be used as timestamp.

This ensures backwards compatibility with existing clients but also allows fine resolution timestamps to be sent.

Any comments? Any other changes for a version 1.2 of the GELF spec?

joschi commented 10 years ago

Regarding the timestamp and the proposed timestamp_ms or in general any numerical fields we have to keep in mind, that a JavaScript Number doesn't have the same precision as a Java long or double.

Related docs:

An alternative to the proposed schema with timestamp and the optional timestamp_ms would be to store the pre-decimal point positions and the decimal places separately but as integers.

This would enable a pretty high precision, although it's questionable if this is really required. IMHO a higher precision than milliseconds currently doesn't make much sense since most systems do not offer a more granular system timer out of the box (citation needed).

joschi commented 10 years ago

As most or even all GELF libraries are using seconds when setting the timestamp, it would maybe suffice to correct the description of timestamp in the GELF 1.1 spec in place instead of releasing another version of the spec right now.

joschi commented 10 years ago

As a non-functional requirement it would be great if each section of the GELF specification could be linked to, unlike the current specs at http://graylog2.org/gelf#specs.

joschi commented 10 years ago

Additional compression algorithms, specifically Snappy would be nice. Currently only gzip and deflate are being supported.

gb6881 commented 8 years ago

Compression of GELF Messages does not work atm for TCP Connections.

Maybe a stream Compression would be more efficient here, i opened a pull request for this (closed it in the meantime, as i wanted to first check it against GELF-MessagePerMessage Compression, which didn't work :-( )

https://github.com/Graylog2/graylog2-server/pull/1364

gb6881 commented 7 years ago

and: what about supporting Arrays and Objects?

i have a metric i'd like to send in

we have a function call, where different listeners are called. and i want to know which listener is taking how long.

so normaly i would do it this way:

"listener_duration": [{ listener: "a", "duration": 17}, { listener: "b", "duration": 20}]

i need to do it at them moment kind of

"listener_duration_a": 17 "listener_duration_b": 17 generating a new field for each listener (where i don't know how many there will be), which i'll then would have to add manually to graphs etc ...

ypid-geberit commented 5 years ago

Have you thought about using the timestamp format as specified by RFC 5424 (section 6.2.3)? It has the advantage that it is human readable, can/should include the timezone and there is no issue with higher precision because the field would be a string.

This would require a new spec release, even a major release because this would be a breaking change but I would say it is worth it.

Related standards: ISO 8601, RFC 3339. Related (workaround): https://github.com/severb/graypy/issues/99