elastic / logstash-forwarder

An experiment to cut logs in preparation for processing elsewhere. Replaced by Filebeat: https://github.com/elastic/beats/tree/master/filebeat
Other
1.79k stars 416 forks source link

Lumberjack Protocol - Sequence Number Wraparound #199

Open ghost opened 10 years ago

ghost commented 10 years ago

I was studying the Lumberjack Protocol down in logstash-forwarder/lib/lumberjack/client.rb, specifically the use of the @sequence number. In the inc function, when sequence reaches Lumberjack::SEQUENCE_MAX, it wraps back around to 0. But in the write_hash and ack functions, the wraparound is not accounted for. For example, if window_size is 4, SEQUENCE_MAX, sequence and last_ack are all at a billion. The sequence increments and wraps around back to 1 and increments up to 4 (window_size). The calculation is "ack if (4 - (billion + 1)) >= 4". Since that is false, we don't wait for the next ack. And I think we stop waiting for acks altogether.

ghost commented 10 years ago

As a secondary issue in client.rb. The value triggering the wraparound is SEQUENCE_MAX and is calculated as 2**62-1 (ints are 64 bits on my system). However, the sequence number is being packed in a 32 bit field within the lumberjack protocol. So SEQUENCE_MAX also needs to be less than 32 bits.

This still seems to allow about 2.8 petabytes to get transferred on a new connection before the acks fail to get properly exchanged. But, since the acks are not yet processed by the code, log traffic still gets transferred even after the ack-wraparound fails.

jordansissel commented 10 years ago

Agree, maybe to work around this there should be a new frame type to indicate a restart of sequence counter.

driskell commented 10 years ago

Since sequence number is only relevant within a frame, and only for partial ack, it should reset after each frame. That's how I solved this in Log Courier.

On a side note it needs a spooler. The client.rb sends each event one by one. After I did that in courier it sent events a fair bit quicker as full packets were utilised.

jordansissel commented 9 years ago

@driskell yeah, I agree re: sequence reset after each ack.