Qbeast-io / qbeast-spark

Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
https://qbeast.io/qbeast-our-tech/
Apache License 2.0

Error when using CREATE EXTERNAL TABLE with Relative Path #392

Open. osopardo1 opened this issue 3 weeks ago

osopardo1 commented 3 weeks ago

What went wrong?

Creating an external table over an existing qbeast dataset with a relative LOCATION throws an AnalysisException, even though reading and writing the same relative path through the DataFrame API works.
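
Judging from the stack trace below, a plausible (but unconfirmed) explanation is that the catalog resolves the relative LOCATION against a different base directory than the DataFrame writer did, so the existing qbeast metadata, and with it the indexed columns, is never found; selectColumnsToIndex then falls back to the auto-indexing error. A minimal sketch of the suspected mismatch, runnable in the same spark-shell (the two resolution bases are assumptions, not verified against the code):

import org.apache.hadoop.fs.Path

// Base the DataFrame writer used for the relative path "test_qbeast".
val cwd = new Path(System.getProperty("user.dir"))
// Base the catalog may be using instead, e.g. the warehouse directory.
val warehouse = new Path(spark.conf.get("spark.sql.warehouse.dir"))

println(new Path(cwd, "test_qbeast"))       // where the data actually lives
println(new Path(warehouse, "test_qbeast")) // where the catalog may look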

How to reproduce?

Steps to reproduce the problem:

1. Code that triggered the bug, or steps to reproduce (a hedged workaround sketch follows the stack trace below):

Seq(1, 2, 3).toDF("id")
  .write.format("qbeast")
  .option("columnsToIndex", "id")
  .mode("overwrite")
  .save("test_qbeast")

spark.read.format("qbeast").load("test_qbeast").show()

spark.sql("CREATE EXTERNAL TABLE t3(id INT) USING qbeast LOCATION './test_qbeast'")

org.apache.spark.sql.AnalysisException: Auto indexing is disabled. Please specify the columns to index in a comma separated way as .option(columnsToIndex, ...) or enable auto indexing with spark.qbeast.index.autoIndexingEnabled=true

2. Branch and commit id:

main at a01bd15

3. Spark version (spark.version on the spark-shell):

3.5.0

4. Hadoop version (org.apache.hadoop.util.VersionInfo.getVersion() on the spark-shell):

3.3.4

5. How are you running Spark?

Local

6. Stack trace:


org.apache.spark.sql.AnalysisException: Auto indexing is disabled. Please specify the columns to index in a comma separated way as .option(columnsToIndex, ...) or enable auto indexing with spark.qbeast.index.autoIndexingEnabled=true
  at org.apache.spark.sql.AnalysisExceptionFactory$.create(AnalysisExceptionFactory.scala:55)
  at io.qbeast.spark.table.IndexedTableImpl.selectColumnsToIndex(IndexedTable.scala:304)
  at io.qbeast.spark.internal.sources.catalog.QbeastCatalogUtils$.createQbeastTable(QbeastCatalogUtils.scala:264)
  at io.qbeast.spark.internal.sources.catalog.QbeastCatalog.createTable(QbeastCatalog.scala:137)
  at io.qbeast.spark.internal.sources.catalog.QbeastCatalog.createTable(QbeastCatalog.scala:118)
  at org.apache.spark.sql.execution.datasources.v2.CreateTableExec.run(CreateTableExec.scala:44)
  at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
  at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
  at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
  at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
  at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98)
  at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85)
  at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
  at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:713)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:744)
  ... 47 elided
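
A hedged workaround sketch for step 1 above. Neither variant is confirmed by the issue: the first assumes the SQL OPTIONS clause is honored the same way as the DataFrame .option(columnsToIndex, ...), as the error message itself suggests; the second assumes an absolute LOCATION sidesteps the relative-path resolution entirely (the table name t4 is only illustrative):

// Variant 1 (assumption): declare the indexed columns explicitly.
spark.sql("""
  CREATE EXTERNAL TABLE t3 (id INT)
  USING qbeast
  OPTIONS ('columnsToIndex' = 'id')
  LOCATION './test_qbeast'
""")

// Variant 2 (assumption): use an absolute path so no relative resolution occurs.
val abs = new java.io.File("test_qbeast").getAbsolutePath
spark.sql(s"CREATE EXTERNAL TABLE t4(id INT) USING qbeast LOCATION '$abs'")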
osopardo1 commented 3 weeks ago

Not reproducible from the tests, unfortunately. I will investigate whether it is a versioning problem.
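
One hedged way to chase this outside the test suite is a bare ScalaTest case with its own local session. This is a hypothetical sketch, not one of the repo's tests, and it deliberately avoids the project's real test base classes, since a suite's working directory and warehouse settings may differ from a spark-shell's:

import org.apache.spark.sql.SparkSession
import org.scalatest.flatspec.AnyFlatSpec

// Hypothetical regression test; names and setup are illustrative only.
class RelativePathSpec extends AnyFlatSpec {

  "CREATE EXTERNAL TABLE" should "accept a relative qbeast LOCATION" in {
    val spark = SparkSession.builder().master("local[2]").getOrCreate()
    import spark.implicits._

    // Same repro as the report: write a qbeast table at a relative path.
    Seq(1, 2, 3).toDF("id").write
      .format("qbeast")
      .option("columnsToIndex", "id")
      .mode("overwrite")
      .save("test_qbeast")

    // Fails with the AnalysisException above when run from a spark-shell.
    spark.sql("CREATE EXTERNAL TABLE t3(id INT) USING qbeast LOCATION './test_qbeast'")
    spark.stop()
  }
}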