apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Segment Reload Unable to Fix Segment in Error State #10342

Open ankitsultana opened 1 year ago

ankitsultana commented 1 year ago

I have seen this behavior quite often in our systems: a segment goes into ERROR state on one of the servers, and a reload doesn't fix it. However, if we restart the server, the segment becomes healthy again. When I check the logs, I often see something like this:

INFO  org.apache.helix.messaging.handling.HelixTaskExecutor  - Scheduling message db272de0-949c-4d9f-87dd-f4f912f19a38: my_table_REALTIME:my_table__79__0__20230224T2228Z, null->null
2023-02-26 07:04:32.598 [HelixTaskExecutor-message_handle_thread] INFO  my_table_REALTIME-SegmentReloadMessageHandler  - Handling message: ZnRecord=db272de0-949c-4d9f-87dd-f4f912f19a38, {CREATE_TIMESTAMP=1677395071480, EXECUTE_START_TIMESTAMP=1677395072598, MSG_ID=db272de0-949c-4d9f-87dd-f4f912f19a38, MSG_STATE=new, MSG_SUBTYPE=RELOAD_SEGMENT, MSG_TYPE=USER_DEFINE_MSG, PARTITION_NAME=my_table__79__0__20230224T2228Z, RESOURCE_NAME=my_table_REALTIME, RETRY_COUNT=0, SRC_CLUSTER=..., SRC_INSTANCE_TYPE=PARTICIPANT, SRC_NAME=Controller_.., TGT_NAME=61f.., TGT_SESSION_ID=.., TIMEOUT=-1, forceDownload=false}{}{}, Stat=Stat {_version=0, _creationTime=1677395072613, _modifiedTime=1677395072613, _ephemeralOwner=0}
2023-02-26 07:04:32.598 [HelixTaskExecutor-message_handle_thread] INFO  my_table_REALTIME-SegmentReloadMessageHandler  - Waiting for lock to refresh : my_table__79__0__20230224T2228Z, queue-length: 0
2023-02-26 07:04:32.598 [HelixTaskExecutor-message_handle_thread] INFO  my_table_REALTIME-SegmentReloadMessageHandler  - Acquired lock to refresh segment: my_table__79__0__20230224T2228Z (lock-time=0ms, queue-length=0)
2023-02-26 07:04:32.598 [HelixTaskExecutor-message_handle_thread] INFO  o.apache.pinot.server.starter.helix.HelixInstanceDataManager  - Reloading single segment: my_table__79__0__20230224T2228Z in table: my_table_REALTIME
2023-02-26 07:04:32.598 [HelixTaskExecutor-message_handle_thread] INFO  o.apache.pinot.server.starter.helix.HelixInstanceDataManager  - Segment metadata is null. Skip reloading segment: my_table__79__0__20230224T2228Z in table: my_table_REALTIME
2023-02-26 07:04:32.598 [HelixTaskExecutor-message_handle_thread] INFO  org.apache.helix.messaging.handling.HelixTask  - Message db272de0-949c-4d9f-87dd-f4f912f19a38 completed.
2023-02-26 07:04:32.600 [HelixTaskExecutor-message_handle_thread] INFO  org.apache.helix.messaging.handling.HelixTask  - Delete message db272de0-949c-4d9f-87dd-f4f912f19a38 from zk!
2023-02-26 07:04:32.600 [HelixTaskExecutor-message_handle_thread] INFO  org.apache.helix.messaging.handling.HelixTaskExecutor  - message finished: db272de0-949c-4d9f-87dd-f4f912f19a38, took 2

This is the corresponding code:

https://github.com/apache/pinot/blob/3772b55dc4c35673762a182b2ee650469560aa97/pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java#L277

I was wondering: if we can't find the segment metadata locally, can we fetch it from ZK? Also, is there a way for the server to auto-recover from such a situation?
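The log above shows the reload stopping at "Segment metadata is null. Skip reloading segment". The sketch below illustrates the proposed fallback: when local metadata is missing, check ZK before skipping. Every type and method in it (`LocalMetadataLookup`, `ZkMetadataLookup`, `decideReload`) is a hypothetical stand-in, not Pinot's API. It only decides which path the reload would take. A later comment in this thread points out that a reload driven by a custom message would still not restore the segment's Helix state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: every type and method here is illustrative, not Pinot's API.
class ReloadWithZkFallback {

  // Stand-in for segment metadata; only what the sketch needs.
  record SegmentMetadata(String segmentName, String crc) {}

  // Hypothetical lookup of metadata for segments the server has loaded locally.
  interface LocalMetadataLookup {
    Optional<SegmentMetadata> get(String segmentName);
  }

  // Hypothetical lookup of the segment ZK metadata in the property store.
  interface ZkMetadataLookup {
    Optional<SegmentMetadata> get(String tableNameWithType, String segmentName);
  }

  enum ReloadOutcome { RELOAD_FROM_LOCAL, RELOAD_FROM_ZK, SKIPPED_NO_METADATA }

  static ReloadOutcome decideReload(String tableNameWithType, String segmentName,
      LocalMetadataLookup local, ZkMetadataLookup zk) {
    // Today's behavior (per the log): no local metadata means the reload is skipped.
    if (local.get(segmentName).isPresent()) {
      return ReloadOutcome.RELOAD_FROM_LOCAL;
    }
    // Proposed fallback: consult ZK before giving up. A real implementation would then
    // download and load the segment using this metadata.
    if (zk.get(tableNameWithType, segmentName).isPresent()) {
      return ReloadOutcome.RELOAD_FROM_ZK;
    }
    return ReloadOutcome.SKIPPED_NO_METADATA;
  }

  public static void main(String[] args) {
    String segment = "my_table__79__0__20230224T2228Z";
    Map<String, SegmentMetadata> zkStore = new HashMap<>();
    zkStore.put(segment, new SegmentMetadata(segment, "hypothetical-crc"));

    ReloadOutcome outcome = decideReload("my_table_REALTIME", segment,
        name -> Optional.empty(),                                // missing locally (ERROR state)
        (table, name) -> Optional.ofNullable(zkStore.get(name)));
    System.out.println(outcome); // RELOAD_FROM_ZK
  }
}
```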

One case where I have seen this happen is when the server restarts while an in-flight onBecomeConsumingFromOffline transition is running and gets killed. When the server comes back up, the only thing it logs is that this segment is in ERROR state in ServiceStatus.

suddendust commented 1 year ago

@ankitsultana Have you tried resetting such segments?

https://docs.pinot.apache.org/basics/getting-started/frequent-questions/operations-faq#:~:text=RESET%3A%20this%20gets,in%20error%20states.

ankitsultana commented 1 year ago

I haven't tested that yet, but I think reset will take the segment OFFLINE. That may cause queries to skip the segment altogether or to fail (segment unavailable exception).

Usually this issue affects only one replica of the segment for us, so it doesn't impact in-flight queries.

saurabhd336 commented 1 year ago

Reset can also target a single server: set the targetInstance parameter of the /segments/{tableNameWithType}/{segmentName}/reset API.
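Below is a sketch of a single-server reset call using the JDK's `java.net.http` client. It makes several assumptions: the controller listens at `http://localhost:9000`, the endpoint accepts `POST`, and no authentication headers are required. The instance name `Server_host_8098` is hypothetical. The request is only built and printed unless `--send` is passed.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class SegmentResetRequest {

  // Builds (but does not send) a reset request scoped to one server via targetInstance.
  static HttpRequest buildResetRequest(String controllerUrl, String tableNameWithType,
      String segmentName, String targetInstance) {
    String path = "/segments/" + encode(tableNameWithType) + "/" + encode(segmentName)
        + "/reset?targetInstance=" + encode(targetInstance);
    return HttpRequest.newBuilder(URI.create(controllerUrl + path))
        .POST(HttpRequest.BodyPublishers.noBody()) // assumption: the endpoint expects POST
        .build();
  }

  // URLEncoder does form encoding (space -> '+'); fine for typical Pinot names,
  // which use letters, digits and underscores.
  private static String encode(String value) {
    return URLEncoder.encode(value, StandardCharsets.UTF_8);
  }

  static HttpResponse<String> send(HttpRequest request)
      throws IOException, InterruptedException {
    return HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
  }

  public static void main(String[] args) throws Exception {
    HttpRequest request = buildResetRequest("http://localhost:9000", "my_table_REALTIME",
        "my_table__79__0__20230224T2228Z", "Server_host_8098"); // hypothetical instance name
    System.out.println(request.method() + " " + request.uri());
    if (args.length > 0 && args[0].equals("--send")) {
      HttpResponse<String> response = send(request);
      System.out.println(response.statusCode() + " " + response.body());
    }
  }
}
```

Because only the replica on `targetInstance` is reset, the other replicas keep serving the segment. This matches the concern above about queries skipping or failing on an OFFLINE segment.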

ankitsultana commented 1 year ago

I see, thanks. We can try it next time (I also need to read up on this a bit).

Regardless though, I think we should try to fix the state transition as well.

Jackie-Jiang commented 1 year ago

Here are the differences between these similar terms: https://docs.pinot.apache.org/basics/getting-started/frequent-questions/operations-faq#whats-the-difference-to-reset-refresh-or-reload-a-segment

Reload is not performed through a state transition. We could consider adding a controller periodic task that automatically resets segments in ERROR state. IIRC, we don't always do the ERROR -> OFFLINE reset because a bad segment could end up being reloaded in an infinite loop.
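The periodic auto-reset idea, with a guard against reloading a bad segment forever, can be sketched as a per-replica retry cap. This is a hypothetical illustration, not an existing Pinot periodic task. The replica key format `table/segment/instance` and the cap value are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an auto-reset periodic task; not an existing Pinot task.
class ErrorSegmentAutoReset {

  private final int maxResetsPerReplica;
  // Key: "tableNameWithType/segmentName/instance"; value: number of resets issued so far.
  private final Map<String, Integer> resetsIssued = new HashMap<>();

  ErrorSegmentAutoReset(int maxResetsPerReplica) {
    this.maxResetsPerReplica = maxResetsPerReplica;
  }

  // One run of the task. errorReplicas: replicas the external view reports in ERROR.
  // Returns the replicas to reset (e.g. via the reset API with targetInstance).
  List<String> runOnce(List<String> errorReplicas) {
    List<String> toReset = new ArrayList<>();
    for (String replica : errorReplicas) {
      int used = resetsIssued.getOrDefault(replica, 0);
      if (used < maxResetsPerReplica) {
        resetsIssued.put(replica, used + 1);
        toReset.add(replica);
      }
      // Past the cap: leave it in ERROR for an operator; it is likely a bad segment
      // that would otherwise be reloaded forever.
    }
    return toReset;
  }

  public static void main(String[] args) {
    ErrorSegmentAutoReset task = new ErrorSegmentAutoReset(2);
    List<String> errors =
        List.of("my_table_REALTIME/my_table__79__0__20230224T2228Z/Server_host_8098");
    for (int run = 1; run <= 3; run++) {
      System.out.println("run " + run + " resets: " + task.runOnce(errors));
    }
  }
}
```

The counter is never cleared here. As a result, a replica that recovers and fails again much later will not be retried. A real task would likely expire entries, for example by timestamp, without resetting the budget just because a replica was briefly seen mid-transition.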

ankitsultana commented 1 year ago

@Jackie-Jiang: Any concerns about making the ZK call? We already make a ZK call to get the metadata in reloadSegmentWithMetadata when the segment is not mutable.

Jackie-Jiang commented 1 year ago


@ankitsultana Since reload doesn't follow the regular state transition (it is a custom message), re-downloading the segment won't bring it back to ONLINE state. It would create an inconsistency between the server's current state and the segment status.
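Below is a toy model of the argument above. It is not Helix or Pinot code. The server's reported state (standing in for the Helix current state) changes only through state transitions. A custom-message handler can bring the data back, but the reported state stays ERROR. A reset, which goes through ERROR -> OFFLINE -> target transitions, fixes both. All names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the argument above; not Helix or Pinot code.
class CurrentStateModel {

  enum State { OFFLINE, CONSUMING, ONLINE, ERROR }

  // What the server reports to the cluster (stands in for the Helix current state).
  final Map<String, State> currentState = new HashMap<>();
  // Whether the server actually has the segment loaded.
  final Map<String, Boolean> loaded = new HashMap<>();

  // A state transition that fails (e.g. an in-flight OFFLINE -> CONSUMING that is killed).
  void failTransition(String segment) {
    loaded.put(segment, false);
    currentState.put(segment, State.ERROR);
  }

  // A successful state transition updates both the data and the reported state.
  void transition(String segment, State to) {
    loaded.put(segment, to != State.OFFLINE);
    currentState.put(segment, to);
  }

  // A custom message (like reload) can bring the data back, but nothing in this
  // path touches currentState, so the cluster still sees ERROR.
  void redownloadViaCustomMessage(String segment) {
    loaded.put(segment, true);
  }

  // Reset goes through regular transitions: ERROR -> OFFLINE -> target.
  void reset(String segment, State target) {
    transition(segment, State.OFFLINE);
    transition(segment, target);
  }

  public static void main(String[] args) {
    CurrentStateModel server = new CurrentStateModel();
    String segment = "my_table__79__0__20230224T2228Z";
    server.failTransition(segment);
    server.redownloadViaCustomMessage(segment);
    System.out.println(server.loaded.get(segment) + " " + server.currentState.get(segment));
    server.reset(segment, State.CONSUMING);
    System.out.println(server.loaded.get(segment) + " " + server.currentState.get(segment));
  }
}
```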