Closed: arunkindra closed this issue 3 years ago.
The backend (Storage Read API) service does not support reading from external tables at this time.
Hi @emkornfield, is there any plan to add this support in the near future?
It is something we are considering on our roadmap. To help track demand for it, you can open a feature request in the BQ issue tracker.
Duplicate of #255
Hello @davidrabinowitz, I'm facing a different error while trying to query an external BQ table. Are there any plans to support this in the future?
```
Caused by: java.lang.UnsupportedOperationException
	at com.google.cloud.spark.bigquery.ArrowSchemaConverter$ArrowVectorAccessor.getUTF8String(ArrowSchemaConverter.java:313)
	at com.google.cloud.spark.bigquery.ArrowSchemaConverter.getUTF8String(ArrowSchemaConverter.java:120)
	at org.apache.spark.sql.execution.vectorized.MutableColumnarRow.getUTF8String(MutableColumnarRow.java:135)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$2.hasNext(WholeStageCodegenExec.scala:636)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:414)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:417)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```
Is the table part of a BigLake?
No David, this is a regular BQ external table over ORC files on GCS. I'm just curious whether support is planned; otherwise we'd probably read the ORC files directly, or go through a Hive external table, with Spark.
By the way, thanks for the mention of BigLake; excited to see Google heading toward the lakehouse!
The API only supports BigLake tables at this point; support for regular external tables is not something we are likely to add soon.
I am trying to read an external table using this connector and am getting the issue below. May I know if there is any plan to support this in the near future?
Dependency used: