aiven / kafka

Mirror of Apache Kafka
Apache License 2.0
2 stars 1 forks source link

fetchLogSegment is called more frequently than it should with default consumer settings #28

Closed ivanyu closed 2 months ago

ivanyu commented 1 year ago

In the normal consumption from the remote tier, there's this chain of call (in the 3.3 current implementation on this fork):

  1. KafkaApis.handleFetchRequest
  2. ReplicaManager.fetchMessages
  3. RemoteLogManager.asyncRead
  4. [asynchronously] RemoteLogReader.execute
  5. RemoteLogManager.read
  6. RemoteStorageManager.fetchLogSegment

If we create a consumer with default settings, particularly with the default fetch.max.wait.ms=500, and try do one poll, fetchLogSegment will be called multiple times, thus initiating fetching from the remote tier multiple times. This is not great. If fetch.max.wait.ms is increased to e.g. 5000 (5 seconds), fetchLogSegment is executed only once.

Most likely, this happens because it takes longer than 500 ms to fetch the remote data (which is a separate problem, sounds too slow). The server follows the fetch.max.wait.ms setting and responds to the client with an empty result. The client sends another fetch request to continue reading from IncrementalFetchContext on the broker side. This happens again and again until eventually a request is responded with some data.

Following fetch.max.wait.ms by the broker is correct (i.e. responding with empty response and awaiting continuation from the client) is correct. However, subsequent incremental fetches should not call fetchLogSegment, i.e. should not initiate fetching from the remote tier.

mdedetrich commented 1 year ago

Most likely, this happens because it takes longer than 500 ms to fetch the remote data (which is a separate problem, sounds too slow). The server follows the fetch.max.wait.ms setting and responds to the client with an empty result. The client sends another fetch request to continue reading from IncrementalFetchContext on the broker side. This happens again and again until eventually a request is responded with some data.

So I thought about this on the weekend and how to solve this. It seems like we have to keep state when a blank response is made so that when the next request comes though rather then re-initiating new request we wait on the older request that is still running.

There are also some counter examples on doing this, i.e. we might be masking (as you hinted) a more serious problem. If its taking more than 500 millis to get responses from tiered storage then that should looked into that