Open eaugene opened 6 months ago
The exact issue is described below (all of these are confirmed via logs):
Current Theory: When ZK disconnection happens, the PartitionConsumer thread is still alive and holding the semaphore, and so when Helix sends OFFLINE to CONSUMING transition again for that segment, the Segment Data Manager fails to init.
I am low on time right now so can't dig deeper. Wondering if anyone can hint at some potential solutions.
I had also seen this somewhat related issue from a few years ago: https://github.com/apache/pinot/issues/7874
cc: @Jackie-Jiang
Took a deep-dive today and found the root-cause. Here's the sequence of events which triggers this:
addSegment
call and mark it succeeded anyways. RealtimeSegmentDataManager.java#L387So the RealtimeSegmentDataManager
is never destroyed and the _partitionGroupConsumerSemaphore
is never released.
As more and more segments for this partition are committed on other servers, we receive many OFFLINE to CONSUMING transitions, which pile up.
When there's another big GC, and say Helix reconnects, then the HelixTaskExecutor
will be shutdown, leading to interruption of all the semaphore acquire calls and the original error message above. (note onBecomeConsumingFromOffline
in the stack trace).
@Jackie-Jiang : this is an additional case where RealtimeTableDataManager#addSegment
can be called. Do you have any recommendations on how to handle this?
@ankitsultana Can you check why the server received OFFLINE
to ONLINE
state transition instead of CONSUMING
to ONLINE
state transition? Our current state model doesn't consider this scenario, so we should try to find out why Helix sent this state transition when the segment is actually in CONSUMING
state. Did it receive a CONSUMING
to OFFLINE
state transition before OFFLINE
to ONLINE
?
Ah my bad, I wasn't clear in the description above. Soon after the segment receives a DISCARD from the controller, there's a Helix disconnection. And a few seconds later the server reconnects and is back online.
My understanding is that after Helix reconnects, the segment will show up as OFFLINE and hence trigger the OFFLINE to ONLINE transition.
Sampled some logs which show this:
{"@timestamp":"2024-02-12T11:17:02.203+00:00","message":"Controller response {\"buildTimeSec\":-1,\"isSplitCommitType\":false,\"streamPartitionMsgOffset\":null,\"offset\":-1,\"status\":\"DISCARD\"} for http://redacted:1234/segmentConsumed?
reason=timeLimit&streamPartitionMsgOffset=368526984&instance=redacted&offset=-1&name=redacted__44__3696__20240212T1016Z&rowCount=48937&memoryUsedBytes=80637706","logger_name":"org.apache.pinot.server.realtime
.ServerSegmentCompletionProtocolHandler","thread_name":"redacted__44__3696__20240212T1016Z","level":"INFO"}
....
{"@timestamp":"2024-02-12T11:17:03.801+00:00","message":"Unable to reconnect to ZooKeeper service, session 0x500e8e168947a76 has expired","logger_name":"org.apache.zookeeper.ClientCnxn","thread_name":"main-SendThread(redacted:2181)","level":"WARN"}
Can you check the log and see if server started from OFFLINE state after re-connecting to Helix? I think it is very possible because when re-connecting to Helix, server will generate a new hash, and start from a new current state. If that is the case, that explains why we observed server not able to consume after re-connecting to Helix. We can probably add a test to reproduce this
I am not well versed with Helix, but I am kinda sure that the server started from OFFLINE state.
There's this log coming from HelixStateMachineEngine
: Resetting HelixStateMachineEngine
.
This is logged in the reset
method, which does this:
loopStateModelFactories(stateModel -> {
stateModel.reset();
String initialState = _stateModelParser.getInitialState(stateModel.getClass());
stateModel.updateState(initialState);
});
The initial state is taken from the annotation of the class. Initial state for all Pinot state models is OFFLINE.
Moreover, after this is logged, there's a ton of messages with format: Operating on table_name:segment_name
(one for each segment).
@Jackie-Jiang : lmk if this is sufficient.
This is great finding! Let me create an issue to track this
We are encountering the following issue when OFFLINE-> ONLINE transition of the segment
Pinot Version : 0.12
cc : @ankitsultana