apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[VL] Native write to hdfs error : UnsupportedOperationException #7441

Open wenfang6 opened 2 days ago

wenfang6 commented 2 days ago

Backend

VL (Velox)

Bug description

Running the SQL `insert overwrite table xx partition (ds = 'xx') select * from xx` produces the following error:

org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:500)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:321)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$16(FileFormatWriter.scala:229)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1490)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsupportedOperationException
    at org.apache.spark.sql.execution.datasources.FakeRow.isNullAt(FakeRow.scala:36)
    at org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:154)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:175)
    at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:85)
    at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:92)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:304)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1524)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:311)
    ... 9 more

The table format is Parquet. I would like to know whether native write is currently supported for InsertIntoHiveTable.

Spark version

Spark-3.2.x

Spark configurations

spark.gluten.sql.native.writer.enabled=true
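
For reference, a minimal spark-submit sketch for enabling the native writer might look like the following. The plugin class name, jar path, and job file are placeholders and depend on your Gluten build and version:

```shell
# Hedged sketch: enabling Gluten's native writer on Spark 3.2.x.
# The jar path, plugin class, and job file below are placeholders;
# check your Gluten bundle for the exact names.
spark-submit \
  --jars /path/to/gluten-velox-bundle.jar \
  --conf spark.plugins=org.apache.gluten.GlutenPlugin \
  --conf spark.gluten.sql.native.writer.enabled=true \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=4g \
  your_job.py
```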

System information

No response

Relevant logs

No response

JkSelf commented 2 days ago

@wenfang6 We insert a fake row to support native write, but it falls back to the vanilla Spark writer here. It seems that isNativeApplicable is not set correctly. Does your code include this patch? And can you provide the SQL to reproduce it? Thanks.

wenfang6 commented 2 days ago

> @wenfang6 We insert a fake row to support native write, but it falls back to the vanilla Spark writer here. It seems that isNativeApplicable is not set correctly. Does your code include this patch? And can you provide the SQL to reproduce it? Thanks.

A simple SQL statement also triggers this error, e.g.:

insert overwrite table wen_test_par1 partition (ds = '2024-10-09')
select * from wen_test;

The Gluten plan:

== Fallback Summary ==
No fallback nodes

== Physical Plan ==
Execute InsertIntoHiveTable (4)
+- FakeRowAdaptor (3)
   +- ^ NativeScan hive dap_dev.wen_test (1)

we use spark 3.2.1

JkSelf commented 1 day ago

@wenfang6 The Gluten native writer in Spark 3.2.1 overrides the vanilla Spark HiveFileFormat class. Therefore, you must ensure that the Gluten jar is loaded before the vanilla Spark jar. You can refer to this document for the configuration. Thanks.
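
One way to get that load order is via Spark's extraClassPath settings, which prepend entries to the driver and executor classpaths; a hedged sketch, with the jar path as a placeholder:

```shell
# Hedged sketch: prepend the Gluten jar to the driver and executor
# classpaths so Gluten's HiveFileFormat override is picked up before
# the vanilla Spark class. The jar path is a placeholder.
spark-submit \
  --conf spark.driver.extraClassPath=/path/to/gluten-velox-bundle.jar \
  --conf spark.executor.extraClassPath=/path/to/gluten-velox-bundle.jar \
  ...
```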

wenfang6 commented 1 day ago

> @wenfang6 The Gluten native writer in Spark 3.2.1 overrides the vanilla Spark HiveFileFormat class. Therefore, you must ensure that the Gluten jar is loaded before the vanilla Spark jar. You can refer to this document for the configuration. Thanks.

I tried it, but it still doesn't use native write. The plan looks like this:

== Fallback Summary ==
No fallback nodes

== Physical Plan ==
CommandResult (1)
   +- Execute InsertIntoHiveTable (5)
      +- VeloxColumnarToRowExec (4)
         +- ^ NativeScan hive dap_dev.wen_test (2)

JkSelf commented 1 day ago

@wenfang6 Is the above issue fixed based on this document? Also, native write doesn't support complex types. Does your SQL contain complex types?

wenfang6 commented 1 day ago

> @wenfang6 Is the above issue fixed based on this document? Also, native write doesn't support complex types. Does your SQL contain complex types?

Yeah, the above issue is fixed, but it still doesn't use native write. The SQL doesn't contain complex types.

JkSelf commented 1 day ago

@wenfang6 Is the config spark.gluten.sql.native.writer.enabled enabled in your env? The default value is false.

wenfang6 commented 1 day ago

> @wenfang6 Is the config spark.gluten.sql.native.writer.enabled enabled in your env? The default value is false.

I set the conf spark.gluten.sql.native.hive.writer.enabled=true
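
Since the thread mentions two different keys (spark.gluten.sql.native.writer.enabled and spark.gluten.sql.native.hive.writer.enabled), it may help to print what the session actually resolves; a sketch using the Spark SQL CLI's SET command:

```shell
# Hedged sketch: show the effective values of both writer flags in the
# current session. Which key Gluten actually reads depends on the version.
spark-sql -e "SET spark.gluten.sql.native.writer.enabled; SET spark.gluten.sql.native.hive.writer.enabled;"
```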

JkSelf commented 17 hours ago

@wenfang6 Can you add some logging info here to determine why this line is not being executed?