Closed ivanyu closed 1 month ago
Most likely, this happens because it takes longer than 500 ms to fetch the remote data (which is a separate problem, sounds too slow). The server follows the fetch.max.wait.ms setting and responds to the client with an empty result. The client sends another fetch request to continue reading from IncrementalFetchContext on the broker side. This happens again and again until eventually a request is responded with some data.
So I thought about this on the weekend and how to solve this. It seems like we have to keep state when a blank response is made so that when the next request comes though rather then re-initiating new request we wait on the older request that is still running.
There are also some counter examples on doing this, i.e. we might be masking (as you hinted) a more serious problem. If its taking more than 500 millis to get responses from tiered storage then that should looked into that
In the normal consumption from the remote tier, there's this chain of call (in the 3.3 current implementation on this fork):
KafkaApis.handleFetchRequest
ReplicaManager.fetchMessages
RemoteLogManager.asyncRead
RemoteLogReader.execute
RemoteLogManager.read
RemoteStorageManager.fetchLogSegment
If we create a consumer with default settings, particularly with the default
fetch.max.wait.ms=500
, and try do one poll,fetchLogSegment
will be called multiple times, thus initiating fetching from the remote tier multiple times. This is not great. Iffetch.max.wait.ms
is increased to e.g.5000
(5 seconds),fetchLogSegment
is executed only once.Most likely, this happens because it takes longer than 500 ms to fetch the remote data (which is a separate problem, sounds too slow). The server follows the
fetch.max.wait.ms
setting and responds to the client with an empty result. The client sends another fetch request to continue reading fromIncrementalFetchContext
on the broker side. This happens again and again until eventually a request is responded with some data.Following
fetch.max.wait.ms
by the broker is correct (i.e. responding with empty response and awaiting continuation from the client) is correct. However, subsequent incremental fetches should not callfetchLogSegment
, i.e. should not initiate fetching from the remote tier.