NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

[BUG] databricks orc test_read_nested_pruning hanging #11547

Open abellina opened 1 month ago

abellina commented 1 month ago

We have seen some CI jobs failing over the weekend with this issue:

08:34:04  ../../src/main/python/orc_test.py::test_read_nested_pruning[false-orc-{'spark.rapids.sql.format.orc.reader.type': 'MULTITHREADED', 'spark.rapids.sql.reader.multithreaded.combine.sizeBytes': '64m', 'spark.rapids.sql.reader.multithreaded.read.keepOrder': True, 'spark.rapids.sql.reader.chunked': True, 'spark.rapids.sql.reader.chunked.limitMemoryUsage': False}-[['ar', Array(Struct(['str_1', String],['str_2', String]))]]-[['ar', Array(Struct(['str_2', String]))]]][DATAGEN_SEED=1727612435, TZ=UTC, INJECT_OOM] client_loop: send disconnect: Broken pipe
08:36:25  ssh: connect to host 54.191.207.123 port 2200: Connection timed out

We should investigate. This is with the 24.10 snapshot.

abellina commented 1 month ago

I'll take a look at reproducing this.

abellina commented 1 month ago

Note: I can't repro this locally (RTX5000) with the provided datagen seed against Spark 3.3.0. I will also try it on Databricks (A10).

abellina commented 1 month ago

I was able to run this against Databricks 12.2 on an A10, but I could not repro the issue by running:

export DATAGEN_SEED=1727612435
./jenkins/databricks/test.sh

I also modified test.sh so that it only runs -k test_read_nested_pruning. Result:

==== 540 passed, 55 warnings in 174.20s (0:02:54) ====
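
For anyone else attempting a repro, here is a minimal sketch of pulling the settings out of the bracketed tail of the failing CI log line, so the same values can be exported before running the tests. The helper name `parse_repro_settings` is hypothetical (not part of the repo), the sample line is abbreviated, and the sketch assumes the tail always starts with `DATAGEN_SEED=`, as it does in the log above.

```python
import re

# Assumption: the CI log appends a "[DATAGEN_SEED=..., KEY=VALUE, FLAG]" group
# after the test ID, with DATAGEN_SEED first, as in the log line in this issue.
_TAIL = re.compile(r"\[(DATAGEN_SEED=[^\]]*)\]")


def parse_repro_settings(log_line):
    """Return (env, flags) parsed from the bracketed tail of a CI log line.

    env holds KEY=VALUE items; flags holds bare items such as INJECT_OOM.
    Hypothetical helper for illustration only.
    """
    match = _TAIL.search(log_line)
    if match is None:
        return {}, []
    env, flags = {}, []
    for item in match.group(1).split(","):
        item = item.strip()
        if "=" in item:
            key, value = item.split("=", 1)
            env[key] = value
        elif item:
            flags.append(item)
    return env, flags


# Abbreviated copy of the failing line; the test parameters are elided as "...".
line = (
    "08:34:04  ../../src/main/python/orc_test.py::test_read_nested_pruning"
    "[...][DATAGEN_SEED=1727612435, TZ=UTC, INJECT_OOM] "
    "client_loop: send disconnect: Broken pipe"
)

env, flags = parse_repro_settings(line)
for key, value in env.items():
    print(f"export {key}={value}")
print("flags seen in the failing run:", flags)
```

Note that the failing run's tail also lists `TZ=UTC` and `INJECT_OOM`, while the repro attempt above only exported `DATAGEN_SEED`. Whether those settings matter for the hang is not established here.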