apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Segment stuck in bad state until server restart when download from deepstore failed while adding new segment #14571

Open chrajeshbabu opened 3 days ago

chrajeshbabu commented 3 days ago

A segment gets stuck in a bad state when the download from deep store fails with an EOFException while the segment is being added as a new segment. The server reports the following state:

{
  "segmentName": <segment_name>,
  "serverState": {
    "Server_<host>_<port>": {
      "idealState": "ONLINE",
      "externalView": "ERROR",
      "segmentSize": "0 bytes",
      "consumerInfo": null,
      "errorInfo": {
        "timestamp": "2024-11-28 19:24:50 GMT",
        "errorMessage": "Caught exception while adding ONLINE segment",
        "stackTrace": "java.io.EOFException\n\tat org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.read(GzipCompressorInputStream.java:316)\n\tat org.apache.commons.compress.archivers.tar.TarArchiveInputStream.read(TarArchiveInputStream.java:634)\n\tat java.base/java.io.FilterInputStream.read(FilterInputStream.java:106)\n\tat org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1483)\n\tat org.apache.commons.io.IOUtils.copy(IOUtils.java:1107)\n\tat org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1456)\n\tat org.apache.commons.io.IOUtils.copy(IOUtils.java:1085)\n\tat org.apache.pinot.common.utils.TarGzCompressionUtils.untarWithRateLimiter(TarGzCompressionUtils.java:202)\n\tat org.apache.pinot.common.utils.TarGzCompressionUtils.untar(TarGzCompressionUtils.java:148)\n\tat org.apache.pinot.common.utils.TarGzCompressionUtils.untar(TarGzCompressionUtils.java:138)\n\tat org.apache.pinot.core.data.manager.BaseTableDataManager.untarSegment(BaseTableDataManager.java:835)\n\tat org.apache.pinot.core.data.manager.BaseTableDataManager.downloadSegmentFromDeepStore(BaseTableDataManager.java:783)\n\tat org.apache.pinot.core.data.manager.BaseTableDataManager.downloadSegment(BaseTableDataManager.java:730)\n\tat org.apache.pinot.core.data.manager.BaseTableDataManager.downloadAndLoadSegment(BaseTableDataManager.java:389)\n\tat org.apache.pinot.core.data.manager.BaseTableDataManager.addNewOnlineSegment(BaseTableDataManager.java:360)\n\tat org.apache.pinot.core.data.manager.offline.OfflineTableDataManager.doAddOnlineSegment(OfflineTableDataManager.java:54)\n\tat org.apache.pinot.core.data.manager.BaseTableDataManager.addOnlineSegment(BaseTableDataManager.java:313)\n\tat org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addOnlineSegment(HelixInstanceDataManager.java:275)\n\tat 
org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:131)\n\tat jdk.internal.reflect.GeneratedMethodAccessor147.invoke(Unknown Source)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:569)\n\tat org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:350)\n\tat org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:278)\n\tat org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97)\n\tat org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.base/java.lang.Thread.run(Thread.java:840)\n"
      }
    }
  }
}

I tried to reload the segment through the REST API, but the reload does not load the segment: the segment was never registered, so _segmentDataManagerMap has no entry for it and the reload is skipped.

2024/11/30 00:21:11.702 WARN [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread_54] Failed to get segment data manager for segments: [<segment_name>] of table: org.apache.pinot.core.data.manager.offline.OfflineTableDataManager@52a09a91, skipping reloading them

New segment addition flow (the segment is registered via addSegment only after downloadSegment succeeds):

  public void downloadAndLoadSegment(SegmentZKMetadata zkMetadata, IndexLoadingConfig indexLoadingConfig)
      throws Exception {
    String segmentName = zkMetadata.getSegmentName();
    _logger.info("Downloading and loading segment: {}", segmentName);
    File indexDir = downloadSegment(zkMetadata);
    addSegment(ImmutableSegmentLoader.load(indexDir, indexLoadingConfig));
    _logger.info("Downloaded and loaded segment: {} with CRC: {} on tier: {}", segmentName, zkMetadata.getCrc(),
        TierConfigUtils.normalizeTierName(zkMetadata.getTier()));
  }

chrajeshbabu commented 3 days ago

@xiangfu0 @Jackie-Jiang would it be better to add the segment as a new segment when the segment data manager has no entry for it during the reload?
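
To make the proposal concrete, here is a minimal, self-contained sketch of the failure mode and the suggested fallback. It is a toy model and not Pinot code: the class ReloadFallbackSketch, its methods, and the string-valued map are all hypothetical stand-ins, with segmentDataManagerMap playing the role of _segmentDataManagerMap and download() standing in for the deep store download.

```java
import java.io.EOFException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified model of a table data manager; not Pinot's real classes. */
public class ReloadFallbackSketch {
  // Stand-in for _segmentDataManagerMap: segment name -> loaded index directory.
  private final Map<String, String> segmentDataManagerMap = new ConcurrentHashMap<>();
  // When true, download() simulates a truncated tarball from deep store.
  private boolean downloadFails;

  public void setDownloadFails(boolean downloadFails) {
    this.downloadFails = downloadFails;
  }

  public boolean isLoaded(String segmentName) {
    return segmentDataManagerMap.containsKey(segmentName);
  }

  // Simulated deep store download; throws when the failure flag is set.
  private String download(String segmentName) {
    if (downloadFails) {
      throw new UncheckedIOException(new EOFException("truncated tarball for " + segmentName));
    }
    return "indexDir/" + segmentName;
  }

  /** Same shape as downloadAndLoadSegment: the segment is registered only after a successful download. */
  public void addNewSegment(String segmentName) {
    String indexDir = download(segmentName);
    segmentDataManagerMap.put(segmentName, indexDir);
  }

  /** Behavior described in this issue: reload is skipped when the map has no entry. */
  public boolean reloadSkippingUnknown(String segmentName) {
    if (!segmentDataManagerMap.containsKey(segmentName)) {
      return false; // corresponds to the "skipping reloading them" warning
    }
    segmentDataManagerMap.put(segmentName, download(segmentName));
    return true;
  }

  /** Proposed behavior (an assumption, not an existing API): treat an unknown segment as a new one. */
  public boolean reloadWithFallback(String segmentName) {
    if (!segmentDataManagerMap.containsKey(segmentName)) {
      addNewSegment(segmentName);
      return true;
    }
    return reloadSkippingUnknown(segmentName);
  }
}
```

In this model, a failed addNewSegment leaves the map empty, reloadSkippingUnknown then returns false no matter how often it is called, and reloadWithFallback recovers once the download succeeds. Whether the real fix belongs in the reload path or in the state transition handling is an open question for the maintainers.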