StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0
9.2k stars 1.83k forks source link

FE OOM error cause com.starrocks.sql.optimizer.statistics.StatisticsCalculator.computeHMSTableScanNode does not perform predicate pushdown. #49911

Open yohengyang opened 3 months ago

yohengyang commented 3 months ago

FE OOM Describe

When using StarRocks 3.3.0 to query a Hive table with a total of 500,000 partitions and 200 columns, the FE node experiences an OOM error, even if the query conditions limit the partitions. sql like select * from big_hive_table where dt='2024-08-12' limit 10;

Here are the dump analysis details: image image

stackTrace:

starrocks-mysql-nio-pool-41
  at org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardSchemeFactory.getScheme()Lorg/apache/hadoop/hive/metastore/api/FieldSchema$FieldSchemaStandardScheme; (FieldSchema.java:476)
  at org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardSchemeFactory.getScheme()Lorg/apache/thrift/scheme/IScheme; (FieldSchema.java:474)
  at org.apache.hadoop.hive.metastore.api.FieldSchema.read(Lorg/apache/thrift/protocol/TProtocol;)V (FieldSchema.java:414)
  at org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.read(Lorg/apache/thrift/protocol/TProtocol;Lorg/apache/hadoop/hive/metastore/api/StorageDescriptor;)V (StorageDescriptor.java:1299)
  at org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.read(Lorg/apache/thrift/protocol/TProtocol;Lorg/apache/thrift/TBase;)V (StorageDescriptor.java:1278)
  at org.apache.hadoop.hive.metastore.api.StorageDescriptor.read(Lorg/apache/thrift/protocol/TProtocol;)V (StorageDescriptor.java:1140)
  at org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.read(Lorg/apache/thrift/protocol/TProtocol;Lorg/apache/hadoop/hive/metastore/api/Partition;)V (Partition.java:1143)
  at org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.read(Lorg/apache/thrift/protocol/TProtocol;Lorg/apache/thrift/TBase;)V (Partition.java:1078)
  at org.apache.hadoop.hive.metastore.api.Partition.read(Lorg/apache/thrift/protocol/TProtocol;)V (Partition.java:954)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_names_result$get_partitions_by_names_resultStandardScheme.read(Lorg/apache/thrift/protocol/TProtocol;Lorg/apache/hadoop/hive/metastore/api/ThriftHiveMetastore$get_partitions_by_names_result;)V (Unknown Source)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_names_result$get_partitions_by_names_resultStandardScheme.read(Lorg/apache/thrift/protocol/TProtocol;Lorg/apache/thrift/TBase;)V (Unknown Source)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_by_names_result.read(Lorg/apache/thrift/protocol/TProtocol;)V (Unknown Source)
  at org.apache.thrift.TServiceClient.receiveBase(Lorg/apache/thrift/TBase;Ljava/lang/String;)V (TServiceClient.java:93)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partitions_by_names()Ljava/util/List; (ThriftHiveMetastore.java:3380)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions_by_names(Ljava/lang/String;Ljava/lang/String;Ljava/util/List;)Ljava/util/List; (ThriftHiveMetastore.java:3365)
  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsByNames(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/util/List;)Ljava/util/List; (HiveMetaStoreClient.java:723)
  at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionsByNames(Ljava/lang/String;Ljava/lang/String;Ljava/util/List;)Ljava/util/List; (HiveMetaStoreClient.java:717)
  at jdk.internal.reflect.GeneratedMethodAccessor20.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (Method.java:566)
  at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object; (RetryingMetaStoreClient.java:208)
  at com.sun.proxy.$Proxy38.getPartitionsByNames(Ljava/lang/String;Ljava/lang/String;Ljava/util/List;)Ljava/util/List; (Unknown Source)
  at com.starrocks.connector.hive.HiveMetaClient.getPartitionsByNames(Ljava/lang/String;Ljava/lang/String;Ljava/util/List;)Ljava/util/List; (HiveMetaClient.java:309)
  at com.starrocks.connector.hive.HiveMetastore.getPartitionsByNames(Ljava/lang/String;Ljava/lang/String;Ljava/util/List;)Ljava/util/Map; (HiveMetastore.java:196)
  at com.starrocks.connector.hive.CachingHiveMetastore.loadPartitionsByNames(Ljava/lang/Iterable;)Ljava/util/Map; (CachingHiveMetastore.java:327)
  at com.starrocks.connector.hive.CachingHiveMetastore$1.loadAll(Ljava/lang/Iterable;)Ljava/util/Map; (CachingHiveMetastore.java:121)
  at com.google.common.cache.CacheLoader$1.loadAll(Ljava/lang/Iterable;)Ljava/util/Map; (CacheLoader.java:205)
  at com.google.common.cache.LocalCache.loadAll(Ljava/util/Set;Lcom/google/common/cache/CacheLoader;)Ljava/util/Map; (LocalCache.java:4118)
  at com.google.common.cache.LocalCache.getAll(Ljava/lang/Iterable;)Lcom/google/common/collect/ImmutableMap; (LocalCache.java:4081)
  at com.google.common.cache.LocalCache$LocalLoadingCache.getAll(Ljava/lang/Iterable;)Lcom/google/common/collect/ImmutableMap; (LocalCache.java:5025)
  at com.starrocks.connector.hive.CachingHiveMetastore.getAll(Lcom/google/common/cache/LoadingCache;Ljava/lang/Iterable;)Ljava/util/Map; (CachingHiveMetastore.java:596)
  at com.starrocks.connector.hive.CachingHiveMetastore.getPartitionsByNames(Ljava/lang/String;Ljava/lang/String;Ljava/util/List;)Ljava/util/Map; (CachingHiveMetastore.java:315)
  at com.starrocks.connector.hive.CachingHiveMetastore.loadPartitionsByNames(Ljava/lang/Iterable;)Ljava/util/Map; (CachingHiveMetastore.java:327)
  at com.starrocks.connector.hive.CachingHiveMetastore$1.loadAll(Ljava/lang/Iterable;)Ljava/util/Map; (CachingHiveMetastore.java:121)
  at com.google.common.cache.CacheLoader$1.loadAll(Ljava/lang/Iterable;)Ljava/util/Map; (CacheLoader.java:205)
  at com.google.common.cache.LocalCache.loadAll(Ljava/util/Set;Lcom/google/common/cache/CacheLoader;)Ljava/util/Map; (LocalCache.java:4118)
  at com.google.common.cache.LocalCache.getAll(Ljava/lang/Iterable;)Lcom/google/common/collect/ImmutableMap; (LocalCache.java:4081)
  at com.google.common.cache.LocalCache$LocalLoadingCache.getAll(Ljava/lang/Iterable;)Lcom/google/common/collect/ImmutableMap; (LocalCache.java:5025)
  at com.starrocks.connector.hive.CachingHiveMetastore.getAll(Lcom/google/common/cache/LoadingCache;Ljava/lang/Iterable;)Ljava/util/Map; (CachingHiveMetastore.java:596)
  at com.starrocks.connector.hive.CachingHiveMetastore.getPartitionsByNames(Ljava/lang/String;Ljava/lang/String;Ljava/util/List;)Ljava/util/Map; (CachingHiveMetastore.java:315)
  at com.starrocks.connector.hive.HiveMetastoreOperations.getPartitionByPartitionKeys(Lcom/starrocks/catalog/Table;Ljava/util/List;)Ljava/util/Map; (HiveMetastoreOperations.java:277)
  at com.starrocks.connector.hive.HiveStatisticsProvider.getEstimatedRowCount(Lcom/starrocks/catalog/Table;Ljava/util/List;)J (HiveStatisticsProvider.java:149)
  at com.starrocks.connector.hive.HiveStatisticsProvider.createUnknownStatistics(Lcom/starrocks/catalog/Table;Ljava/util/List;Ljava/util/List;D)Lcom/starrocks/sql/optimizer/statistics/Statistics; (HiveStatisticsProvider.java:186)
  at com.starrocks.connector.hive.HiveMetadata.getTableStatistics(Lcom/starrocks/sql/optimizer/OptimizerContext;Lcom/starrocks/catalog/Table;Ljava/util/Map;Ljava/util/List;Lcom/starrocks/sql/optimizer/operator/scalar/ScalarOperator;J)Lcom/starrocks/sql/optimizer/statistics/Statistics; (HiveMetadata.java:281)
  at com.starrocks.connector.CatalogConnectorMetadata.getTableStatistics(Lcom/starrocks/sql/optimizer/OptimizerContext;Lcom/starrocks/catalog/Table;Ljava/util/Map;Ljava/util/List;Lcom/starrocks/sql/optimizer/operator/scalar/ScalarOperator;J)Lcom/starrocks/sql/optimizer/statistics/Statistics; (CatalogConnectorMetadata.java:174)
  at com.starrocks.server.MetadataMgr.lambda$getTableStatistics$11(Lcom/starrocks/sql/optimizer/OptimizerContext;Lcom/starrocks/catalog/Table;Ljava/util/Map;Ljava/util/List;Lcom/starrocks/sql/optimizer/operator/scalar/ScalarOperator;JLcom/starrocks/connector/ConnectorMetadata;)Lcom/starrocks/sql/optimizer/statistics/Statistics; (MetadataMgr.java:713)
  at com.starrocks.server.MetadataMgr$$Lambda$924.apply(Ljava/lang/Object;)Ljava/lang/Object; (Unknown Source)
  at java.util.Optional.map(Ljava/util/function/Function;)Ljava/util/Optional; (Optional.java:265)
  at com.starrocks.server.MetadataMgr.getTableStatistics(Lcom/starrocks/sql/optimizer/OptimizerContext;Ljava/lang/String;Lcom/starrocks/catalog/Table;Ljava/util/Map;Ljava/util/List;Lcom/starrocks/sql/optimizer/operator/scalar/ScalarOperator;J)Lcom/starrocks/sql/optimizer/statistics/Statistics; (MetadataMgr.java:713)
  at com.starrocks.server.MetadataMgr.getTableStatistics(Lcom/starrocks/sql/optimizer/OptimizerContext;Ljava/lang/String;Lcom/starrocks/catalog/Table;Ljava/util/Map;Ljava/util/List;Lcom/starrocks/sql/optimizer/operator/scalar/ScalarOperator;)Lcom/starrocks/sql/optimizer/statistics/Statistics; (MetadataMgr.java:748)
  at com.starrocks.sql.optimizer.statistics.StatisticsCalculator.computeHMSTableScanNode(Lcom/starrocks/sql/optimizer/operator/Operator;Lcom/starrocks/sql/optimizer/ExpressionContext;Lcom/starrocks/catalog/Table;Ljava/util/Map;)Ljava/lang/Void; (StatisticsCalculator.java:591)
  at com.starrocks.sql.optimizer.statistics.StatisticsCalculator.visitLogicalHiveScan(Lcom/starrocks/sql/optimizer/operator/logical/LogicalHiveScanOperator;Lcom/starrocks/sql/optimizer/ExpressionContext;)Ljava/lang/Void; (StatisticsCalculator.java:566)
  at com.starrocks.sql.optimizer.statistics.StatisticsCalculator.visitLogicalHiveScan(Lcom/starrocks/sql/optimizer/operator/logical/LogicalHiveScanOperator;Ljava/lang/Object;)Ljava/lang/Object; (StatisticsCalculator.java:178)
  at com.starrocks.sql.optimizer.operator.logical.LogicalHiveScanOperator.accept(Lcom/starrocks/sql/optimizer/operator/OperatorVisitor;Ljava/lang/Object;)Ljava/lang/Object; (LogicalHiveScanOperator.java:78)
  at com.starrocks.sql.optimizer.statistics.StatisticsCalculator.estimatorStats()V (StatisticsCalculator.java:194)
  at com.starrocks.sql.optimizer.Utils.calculateStatistics(Lcom/starrocks/sql/optimizer/OptExpression;Lcom/starrocks/sql/optimizer/OptimizerContext;)V (Utils.java:810)
  at com.starrocks.sql.optimizer.Utils.calculateStatistics(Lcom/starrocks/sql/optimizer/OptExpression;Lcom/starrocks/sql/optimizer/OptimizerContext;)V (Utils.java:799)
  at com.starrocks.sql.optimizer.Utils.calculateStatistics(Lcom/starrocks/sql/optimizer/OptExpression;Lcom/starrocks/sql/optimizer/OptimizerContext;)V (Utils.java:799)
  at com.starrocks.sql.optimizer.Utils.calculateStatistics(Lcom/starrocks/sql/optimizer/OptExpression;Lcom/starrocks/sql/optimizer/OptimizerContext;)V (Utils.java:799)
  at com.starrocks.sql.optimizer.Utils.calculateStatistics(Lcom/starrocks/sql/optimizer/OptExpression;Lcom/starrocks/sql/optimizer/OptimizerContext;)V (Utils.java:799)
  at com.starrocks.sql.optimizer.Utils.calculateStatistics(Lcom/starrocks/sql/optimizer/OptExpression;Lcom/starrocks/sql/optimizer/OptimizerContext;)V (Utils.java:799)
  at com.starrocks.sql.optimizer.Utils.calculateStatistics(Lcom/starrocks/sql/optimizer/OptExpression;Lcom/starrocks/sql/optimizer/OptimizerContext;)V (Utils.java:799)
  at com.starrocks.sql.optimizer.Utils.calculateStatistics(Lcom/starrocks/sql/optimizer/OptExpression;Lcom/starrocks/sql/optimizer/OptimizerContext;)V (Utils.java:799)
  at com.starrocks.sql.optimizer.Utils.calculateStatistics(Lcom/starrocks/sql/optimizer/OptExpression;Lcom/starrocks/sql/optimizer/OptimizerContext;)V (Utils.java:799)
  at com.starrocks.sql.optimizer.Utils.calculateStatistics(Lcom/starrocks/sql/optimizer/OptExpression;Lcom/starrocks/sql/optimizer/OptimizerContext;)V (Utils.java:799)
  at com.starrocks.sql.optimizer.Utils.calculateStatistics(Lcom/starrocks/sql/optimizer/OptExpression;Lcom/starrocks/sql/optimizer/OptimizerContext;)V (Utils.java:799)
  at com.starrocks.sql.optimizer.Optimizer.skewJoinOptimize(Lcom/starrocks/sql/optimizer/OptExpression;Lcom/starrocks/sql/optimizer/task/TaskContext;)V (Optimizer.java:722)
  at com.starrocks.sql.optimizer.Optimizer.logicalRuleRewrite(Lcom/starrocks/sql/optimizer/OptExpression;Lcom/starrocks/sql/optimizer/task/TaskContext;)Lcom/starrocks/sql/optimizer/OptExpression; (Optimizer.java:505)
  at com.starrocks.sql.optimizer.Optimizer.rewriteAndValidatePlan(Lcom/starrocks/sql/optimizer/OptExpression;Lcom/starrocks/sql/optimizer/task/TaskContext;)Lcom/starrocks/sql/optimizer/OptExpression; (Optimizer.java:676)
  at com.starrocks.sql.optimizer.Optimizer.optimizeByCost(Lcom/starrocks/qe/ConnectContext;Lcom/starrocks/sql/optimizer/OptExpression;Lcom/starrocks/sql/optimizer/base/PhysicalPropertySet;Lcom/starrocks/sql/optimizer/base/ColumnRefSet;)Lcom/starrocks/sql/optimizer/OptExpression; (Optimizer.java:238)
  at com.starrocks.sql.optimizer.Optimizer.optimize(Lcom/starrocks/qe/ConnectContext;Lcom/starrocks/sql/optimizer/OptExpression;Ljava/util/Map;Lcom/starrocks/sql/ast/StatementBase;Lcom/starrocks/sql/optimizer/base/PhysicalPropertySet;Lcom/starrocks/sql/optimizer/base/ColumnRefSet;Lcom/starrocks/sql/optimizer/base/ColumnRefFactory;)Lcom/starrocks/sql/optimizer/OptExpression; (Optimizer.java:186)
  at com.starrocks.sql.optimizer.Optimizer.optimize(Lcom/starrocks/qe/ConnectContext;Lcom/starrocks/sql/optimizer/OptExpression;Lcom/starrocks/sql/optimizer/base/PhysicalPropertySet;Lcom/starrocks/sql/optimizer/base/ColumnRefSet;Lcom/starrocks/sql/optimizer/base/ColumnRefFactory;)Lcom/starrocks/sql/optimizer/OptExpression; (Optimizer.java:162)
  at com.starrocks.sql.InsertPlanner.buildExecPlan(Lcom/starrocks/sql/ast/InsertStmt;Lcom/starrocks/qe/ConnectContext;Ljava/util/List;Lcom/starrocks/sql/optimizer/transformer/LogicalPlan;Lcom/starrocks/sql/optimizer/base/ColumnRefFactory;Lcom/starrocks/sql/ast/QueryRelation;Lcom/starrocks/catalog/Table;)Lcom/starrocks/sql/plan/ExecPlan; (InsertPlanner.java:514)
  at com.starrocks.sql.InsertPlanner.plan(Lcom/starrocks/sql/ast/InsertStmt;Lcom/starrocks/qe/ConnectContext;)Lcom/starrocks/sql/plan/ExecPlan; (InsertPlanner.java:315)
  at com.starrocks.sql.StatementPlanner.planInsertStmt(Lcom/starrocks/sql/analyzer/PlannerMetaLocker;Lcom/starrocks/sql/ast/InsertStmt;Lcom/starrocks/qe/ConnectContext;)Lcom/starrocks/sql/plan/ExecPlan; (StatementPlanner.java:166)
  at com.starrocks.sql.StatementPlanner.plan(Lcom/starrocks/sql/ast/StatementBase;Lcom/starrocks/qe/ConnectContext;Lcom/starrocks/thrift/TResultSinkType;)Lcom/starrocks/sql/plan/ExecPlan; (StatementPlanner.java:140)
  at com.starrocks.sql.StatementPlanner.plan(Lcom/starrocks/sql/ast/StatementBase;Lcom/starrocks/qe/ConnectContext;)Lcom/starrocks/sql/plan/ExecPlan; (StatementPlanner.java:92)
  at com.starrocks.qe.StmtExecutor.handleCreateTableAsSelectStmt(J)V (StmtExecutor.java:865)
  at com.starrocks.qe.StmtExecutor.execute()V (StmtExecutor.java:664)
  at com.starrocks.qe.ConnectProcessor.handleQuery()V (ConnectProcessor.java:355)
  at com.starrocks.qe.ConnectProcessor.dispatch()V (ConnectProcessor.java:550)
  at com.starrocks.qe.ConnectProcessor.processOnce()V (ConnectProcessor.java:884)
  at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0()V (ReadListener.java:69)
  at com.starrocks.mysql.nio.ReadListener$$Lambda$624.run()V (Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (ThreadPoolExecutor.java:1128)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (ThreadPoolExecutor.java:628)
  at java.lang.Thread.run()V (Thread.java:829)

Detailed Problem Description

Why ScanOperatorPredicates predicates is empty

Through debugging, it can be observed that although the query has partition constraints, the predicates remain empty. image

https://github.com/StarRocks/starrocks/blob/e703a61015ddfd9e5461710931c2e2084079ad31/fe/fe-core/src/main/java/com/starrocks/sql/optimizer/statistics/StatisticsCalculator.java#L583-L619 Since predicates is null, the line: List<PartitionKey> partitionKeys = predicates.hasPrunedPartition() ? predicates.getSelectedPartitionKeys() : PartitionUtil.getPartitionKeys(table); retrieves all partition keys (partitionKeys). Consequently, in the subsequent call: com.starrocks.connector.hive.HiveStatisticsProvider.getEstimatedRowCountit queries all partitions from hiev metastore

I noticed that predicates is only assigned a value in one place, https://github.com/StarRocks/starrocks/blob/e703a61015ddfd9e5461710931c2e2084079ad31/fe/fe-core/src/main/java/com/starrocks/sql/optimizer/rule/transformation/ExternalScanPartitionPruneRule.java#L72 ,but ruleRewriteIterative(tree, rootTaskContext, RuleSetType.PUSH_DOWN_PREDICATE) happens after the https://github.com/StarRocks/starrocks/blob/e703a61015ddfd9e5461710931c2e2084079ad31/fe/fe-core/src/main/java/com/starrocks/sql/optimizer/Optimizer.java#L549

a potential solution could be to adjust the order of skewJoinOptimize(tree, rootTaskContext) and ruleRewriteIterative(tree, rootTaskContext, RuleSetType.PUSH_DOWN_PREDICATE) ?

org.apache.hadoop.hive.metastore.api.FieldSchema seems unsless

https://github.com/StarRocks/starrocks/blob/a34d0ed18edad2e7f260f37243b261633ca0fd17/fe/fe-core/src/main/java/com/starrocks/connector/hive/HiveMetastoreApiConverter.java#L352-L366 Fetching org.apache.hadoop.hive.metastore.api.Partition from the metastore retrieves the schema for each partition. This is not utilized at all in StarRocks. Perhaps a new interface could be added to the metastore to only fetch the necessary information.

chenminghua8 commented 3 months ago

Can someone assign this task to me?

LiShuMing commented 3 months ago

@chenminghua8 It's fine if you have found the root cause and pull a request.