Open damianharouff opened 2 weeks ago
It also seems to be only "large" negative numbers, e.g. it's not a few ms negative, it's minimum ~ -10000 ms:
I just checked my environment and i also have negative values.
Probably unrelated, i see that gl2_processing_duration_ms
is showing 2 different types:
@damianharouff @drewmiranda-gl Can you post the full message (ideally from OpenSearch) for an affected message so we can see all the timestamp and timing-related fields?
@bernd absolutely, here's a few, let me know if you need more:
https://paste.cekkent.net/92504773-3595-4a7a-ace4-e863f6d6a340/JKmdJEU1 https://paste.cekkent.net/945a30ab-c805-426c-92df-35c434f0d9c4/4bA2VmGn https://paste.cekkent.net/6ba7587e-d311-4459-99e0-b2bbc1a812d9/vFVYzflR https://paste.cekkent.net/87f52147-e46e-48a6-ac60-1606953674f9/RWYvXGyu https://paste.cekkent.net/5ed6011a-7741-4ad0-8f27-2c8edac00c70/N8DHPfdY
@damianharouff Thank you! :pray: That helped!
Looking at the first message. :eyes: The problem is that the receive timestamp is after the processing timestamp. So we need to figure out how that happens.
"gl2_receive_timestamp": "2024-06-26 13:25:30.462",
"gl2_processing_timestamp": "2024-06-26 13:25:18.192",
"gl2_processing_duration_ms": -12270,
For what its worth, in my environment this only happened on a specific day and never before and never since.
Sanitized messages:
{
"gl2_accounted_message_size": 405,
"gl2_receive_timestamp": "2024-06-17 01:36:00.129",
"gl2_remote_ip": "192.168.0.1",
"gl2_remote_port": 55834,
"streams": [
"63068dc98a735a37e8d535f9"
],
"gl2_message_id": "01J0HXCE41012S9S3M40133D5X",
"gl2_source_input": "6328e4593521fe2ae2ee110c",
"gl2_processing_timestamp": "2024-06-17 01:36:00.128",
"gl2_source_node": "7a72f5ea-ccb3-4b6f-858b-bab2d1aaf7b5",
"gl2_forwarder_input": "62bb56a525abac169687b4ab",
"gl2_processing_duration_ms": -1,
"timestamp": "2024-06-17T01:36:00.129Z"
}
{
"gl2_accounted_message_size": 819,
"gl2_receive_timestamp": "2024-06-15 17:50:53.277",
"gl2_remote_ip": "192.168.0.108",
"gl2_remote_port": 45175,
"streams": [
"000000000000000000000001"
],
"gl2_message_id": "01J0EGC28V000GFMPF9M1X5KG0",
"gl2_source_input": "63112596964e770359618042",
"gl2_processing_timestamp": "2024-06-15 17:50:53.246",
"gl2_source_node": "7a72f5ea-ccb3-4b6f-858b-bab2d1aaf7b5",
"gl2_forwarder_input": "62bb56a525abac169687b4ab",
"gl2_processing_duration_ms": -31,
"timestamp": "2024-06-15T17:50:53.211Z"
}
If it matters, all of my ingest comes from forwarders.
I was wondering about that, and thought it was maybe due to incorrect time on one of the nodes, but GL and OSearch nodes are all showing a plausible time:
damian@Damians-MBP ansible % ansible -i hosts graylog -m shell -a "date"
192.168.69.201 | CHANGED | rc=0 >>
Wed Jun 26 02:14:15 PM UTC 2024
192.168.69.211 | CHANGED | rc=0 >>
Wed Jun 26 02:14:15 PM UTC 2024
192.168.69.210 | CHANGED | rc=0 >>
Wed Jun 26 02:14:15 PM UTC 2024
192.168.69.200 | CHANGED | rc=0 >>
Wed Jun 26 02:14:15 PM UTC 2024
192.168.69.202 | CHANGED | rc=0 >>
Wed Jun 26 02:14:16 PM UTC 2024
192.168.69.213 | CHANGED | rc=0 >>
Wed Jun 26 02:14:18 PM UTC 2024
192.168.69.212 | CHANGED | rc=0 >>
Wed Jun 26 02:14:18 PM UTC 2024
damian@Damians-MBP ansible %
Slight seconds skew is due to ansible running serially.
If it matters, all of my ingest comes from forwarders.
Yeah, I think it does. Damian's messages are also ingested via the Forwarder.
Thanks, folks! I think we provided enough context and info so the team can start debugging.
Posting some time skew info since is suspect this is contributing if not the cause
Negative processing time as seen in graylog:
Time skew for graylog server:
Time skew for applicable forwarder:
The statement about Forwarders prompted me to look at my forwarder box, and....
...turns out my forwarder box was 12 seconds in the future. I installed and activated ntp on it, and no more messages with a negative processing duration timestamp. It does make me wonder why only some of my messages were negative and not all, since all of my ingest is through this one forwarder.
While reviewing
gl2_processing_duration_ms
, it was noted that some messages have a negative processing time associated with them:Doesn't seem to have any specific case or reason, also happens across multiple sources, some processed by Illuminate, some not:
Happy to give Graylogger access to my cluster for further troubleshooting if not able to replicate in a dev environment.
Your Environment