apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Concurrent Offline Table Segment Uploads Can Lead to Error State #11636

Open ankitsultana opened 10 months ago

ankitsultana commented 10 months ago

For two of our use cases, we recently started seeing segments end up in the error state. On debugging, we found the cause: uploading offline table segments concurrently across different controllers is not safe.

I won't go into the full root-cause but will add some notes:

This exception was seen in the server:

Caught exception in state transition from OFFLINE -> ONLINE for resource: <table-name>, partition: <segment-name>"}
java.lang.NullPointerException: null
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:882)
        at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addOrReplaceSegment(HelixInstanceDataManager.java:401)
        at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:163)

And this was seen in the controller:

java.lang.RuntimeException: Caught exception while updating ideal state for resource: <table-name>
        at org.apache.pinot.common.utils.helix.HelixHelper.updateIdealState(HelixHelper.java:169)
        at org.apache.pinot.common.utils.helix.HelixHelper.updateIdealState(HelixHelper.java:193)
        at org.apache.pinot.controller.helix.core.PinotHelixResourceManager.assignTableSegment(PinotHelixResourceManager.java:2137)
        at org.apache.pinot.controller.api.upload.ZKOperator.processNewSegment(ZKOperator.java:294)
        at org.apache.pinot.controller.api.upload.ZKOperator.completeSegmentOperations(ZKOperator.java:82)
        at org.apache.pinot.controller.api.resources.PinotSegmentUploadDownloadRestletResource.uploadSegment(PinotSegmentUploadDownloadRestletResource.java:360)
        at org.apache.pinot.controller.api.resources.PinotSegmentUploadDownloadRestletResource.uploadSegmentAsJson(PinotSegmentUploadDownloadRestletResource.java:481)
        at jdk.internal.reflect.GeneratedMethodAccessor343.invoke(Unknown Source)
        ...
Caused by: org.apache.pinot.spi.utils.retry.AttemptsExceededException: Operation failed after 20 attempts
        at org.apache.pinot.spi.utils.retry.BaseRetryPolicy.attempt(BaseRetryPolicy.java:65)
        at org.apache.pinot.common.utils.helix.HelixHelper.updateIdealState(HelixHelper.java:98)

The easiest workaround is to send all concurrent uploads through a single controller, or to upload sequentially in the offline ingestion pipeline, which is what we will be doing. I'm creating this ticket in case someone is interested in a native fix.
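For anyone who wants the sequential variant of this workaround, here is a minimal sketch of one way to do it: funnel every upload through a single worker thread so that at most one upload is in flight at a time. This is not Pinot code. `SequentialUploadSketch`, `uploadSequentially`, and the `uploadOne` callback are hypothetical names, and `uploadOne` stands in for whatever your pipeline uses to push one segment to one fixed controller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class SequentialUploadSketch {
    /**
     * Runs uploadOne for each segment on a single worker thread, in list order,
     * so uploads never overlap. uploadOne is a hypothetical callback that would
     * push one segment to one fixed controller.
     */
    static void uploadSequentially(List<String> segments, Consumer<String> uploadOne) {
        ExecutorService single = Executors.newSingleThreadExecutor();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String segment : segments) {
                pending.add(single.submit(() -> uploadOne.accept(segment)));
            }
            for (Future<?> upload : pending) {
                upload.get(); // blocks until done; surfaces the first failed upload
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while uploading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("upload failed", e.getCause());
        } finally {
            single.shutdown();
        }
    }

    public static void main(String[] args) {
        // Only the single worker thread writes to this list; main reads it after get().
        List<String> uploaded = new ArrayList<>();
        uploadSequentially(List.of("seg_0", "seg_1", "seg_2"), uploaded::add);
        System.out.println("uploaded in order: " + uploaded);
    }
}
```

Segment generation can still run in parallel upstream. Only the final push is serialized, which trades some upload throughput for avoiding contention between uploads.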

Jackie-Jiang commented 10 months ago

The root cause is that the controller was not able to update the ideal state within 20 attempts, so it left the ideal state inconsistent with the ZK metadata. We need to know why it kept failing.
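To make that failure mode concrete, here is a small single-threaded simulation of a bounded read-modify-write loop against a versioned store. It is a sketch under assumptions, not the actual `HelixHelper.updateIdealState` logic: `IdealStateRetrySketch`, `VersionedStore`, and `addSegmentWithRetry` are hypothetical names, the version check is a simplified stand-in for a versioned ZooKeeper write, and the ordering (segment metadata first, ideal state second) is inferred from the comment above.

```java
import java.util.HashSet;
import java.util.Set;

final class IdealStateRetrySketch {
    /** Hypothetical stand-in for a versioned ZooKeeper node holding the ideal state. */
    static final class VersionedStore {
        private Set<String> segments = new HashSet<>();
        private int version = 0;

        synchronized int version() { return version; }
        synchronized Set<String> segments() { return new HashSet<>(segments); }

        /** Writes only if nobody else has written since expectedVersion was read. */
        synchronized boolean writeIfVersion(int expectedVersion, Set<String> updated) {
            if (version != expectedVersion) {
                return false;
            }
            segments = new HashSet<>(updated);
            version++;
            return true;
        }
    }

    /**
     * Read-modify-write with a bounded number of attempts. betweenReadAndWrite
     * simulates what other writers do in the window between our read and our write.
     */
    static boolean addSegmentWithRetry(VersionedStore idealState, String segment,
                                       int maxAttempts, Runnable betweenReadAndWrite) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int seenVersion = idealState.version();
            Set<String> updated = idealState.segments();
            betweenReadAndWrite.run();
            updated.add(segment);
            if (idealState.writeIfVersion(seenVersion, updated)) {
                return true;
            }
        }
        return false; // attempts exhausted, nothing was written
    }

    public static void main(String[] args) {
        VersionedStore idealState = new VersionedStore();
        Set<String> segmentMetadata = new HashSet<>();
        // A rival writer that always gets its write in first (worst-case contention).
        Runnable rival = () -> idealState.writeIfVersion(idealState.version(), idealState.segments());

        segmentMetadata.add("seg_0"); // assumed step 1: metadata is written first
        boolean assigned = addSegmentWithRetry(idealState, "seg_0", 20, rival);
        System.out.println("assigned=" + assigned
            + ", inMetadata=" + segmentMetadata.contains("seg_0")
            + ", inIdealState=" + idealState.segments().contains("seg_0"));
    }
}
```

Under this worst-case rival, all 20 attempts lose the version check, so the segment exists in the metadata set but never reaches the ideal state. Whether real contention between controllers is heavy enough to produce this is exactly the open question.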