NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

[BUG] Integration tests failing in join_test.py #11485

Open nartal1 opened 10 hours ago

nartal1 commented 10 hours ago

Nightly integration tests in join_test.py are failing for left semi and left anti joins because the CPU and GPU results have different row counts:


[2024-09-19T13:45:37.529Z] E           AssertionError: CPU and GPU list have different lengths at [] CPU: 260 GPU: 28

[2024-09-19T13:45:37.529Z] 

[2024-09-19T13:45:37.529Z] ../../src/main/python/asserts.py:41: AssertionError

[2024-09-19T13:45:37.529Z] ---------------------------- Captured stderr setup -----------------------------

[2024-09-19T13:45:37.529Z] 2024-09-19 13:00:13 INFO     Running test 'src/main/python/join_test.py::test_sortmerge_join_with_condition_ast[LeftAnti-Timestamp][DATAGEN_SEED=1726749170, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})]'

[2024-09-19T13:45:37.529Z] ------------------------------ Captured log setup ------------------------------

[2024-09-19T13:45:37.529Z] INFO     __pytest_worker_logger__:spark_init_internal.py:256 Running test 'src/main/python/join_test.py::test_sortmerge_join_with_condition_ast[LeftAnti-Timestamp][DATAGEN_SEED=1726749170, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})]'

[2024-09-19T13:45:37.529Z] ----------------------------- 
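For readers less familiar with the two join types involved, here is a minimal pure-Python sketch of conditional left semi and left anti join semantics. It is not the plugin's or cuDF's implementation; the rows, the `cond` predicate, and the helper names `left_semi` and `left_anti` are hypothetical and exist only for illustration. It also ignores SQL null handling.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, int]  # hypothetical (key, value) row


def left_semi(left: List[Row], right: List[Row],
              cond: Callable[[Row, Row], bool]) -> List[Row]:
    """Keep each left row that has at least one right row satisfying cond."""
    return [l for l in left if any(cond(l, r) for r in right)]


def left_anti(left: List[Row], right: List[Row],
              cond: Callable[[Row, Row], bool]) -> List[Row]:
    """Keep each left row that has no right row satisfying cond."""
    return [l for l in left if not any(cond(l, r) for r in right)]


# Illustrative data only; not taken from the failing tests.
left = [(1, 10), (2, 20), (3, 30), (4, 40)]
right = [(1, 5), (2, 25), (2, 15), (5, 50)]


def cond(l: Row, r: Row) -> bool:
    """Equality on the key plus an extra non-equi condition on the value."""
    return l[0] == r[0] and l[1] > r[1]


semi = left_semi(left, right, cond)
anti = left_anti(left, right, cond)

# Every left row lands in exactly one of the two results.
assert len(semi) + len(anti) == len(left)
```

Because each left row is emitted at most once and falls into exactly one of the two outputs, a row-count mismatch like the one above means one side is classifying left rows differently.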


All the failing tests:
[2024-09-19T13:45:37.543Z] =========================== short test summary info ============================

[2024-09-19T13:45:37.543Z] FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftSemi-Boolean][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 457 GPU:...

[2024-09-19T13:45:37.543Z] FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftSemi-Byte][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 298 GPU: 87

[2024-09-19T13:45:37.543Z] FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftSemi-Short][DATAGEN_SEED=1726749170, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 239 GPU: 68

[2024-09-19T13:45:37.543Z] FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftSemi-Integer][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 236 GPU: 66

[2024-09-19T13:45:37.543Z] Starting with datagen test seed: 1726749170 (Automatically set). Set env variable DATAGEN_SEED to override.

[2024-09-19T13:45:37.543Z] Starting with OOM injection seed: 1726749170. Set env variable SPARK_RAPIDS_TEST_INJECT_OOM_SEED to override.

[2024-09-19T13:45:37.543Z] 2024-09-19 12:32:51 INFO     Executing global initialization tasks before test launches

[2024-09-19T13:45:37.543Z] 2024-09-19 12:32:51 INFO     Creating directory /home/ubuntu/spark-rapids/integration_tests/target/run_dir-20240919123250-U9Li/hive with permissions 0o777

[2024-09-19T13:45:37.543Z] 2024-09-19 12:32:51 INFO     Skipping findspark init because on xdist master

[2024-09-19T13:45:37.543Z] FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftSemi-Long][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 235 GPU:...

[2024-09-19T13:45:37.543Z] FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftSemi-Date][DATAGEN_SEED=1726749170, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 253 GPU:...

[2024-09-19T13:45:37.543Z] FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftSemi-Timestamp][DATAGEN_SEED=1726749170, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 240 GPU:...

[2024-09-19T13:45:37.543Z] FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftSemi-String][DATAGEN_SEED=1726749170, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 230 GPU:...

[2024-09-19T13:45:37.543Z] FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftAnti-Boolean][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 43 GPU: 239

[2024-09-19T13:45:37.543Z] FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftAnti-Byte][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 202 GPU:...

[2024-09-19T13:45:37.543Z] FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftAnti-Short][DATAGEN_SEED=1726749170, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 261 GPU:...

[2024-09-19T13:45:37.543Z] FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftAnti-Integer][DATAGEN_SEED=1726749170, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 264 GPU:...

[2024-09-19T13:45:37.543Z] FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftAnti-Long][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 265 GPU:...

[2024-09-19T13:45:37.543Z] FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftAnti-Date][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 247 GPU:...

[2024-09-19T13:45:37.543Z] FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftAnti-Timestamp][DATAGEN_SEED=1726749170, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 260 GPU:...

[2024-09-19T13:45:37.543Z] FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftAnti-String][DATAGEN_SEED=1726749170, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 270 GPU:...

[2024-09-19T13:45:37.543Z] FAILED ../../src/main/python/join_test.py::test_sortmerge_join_with_condition_ast[LeftSemi-Boolean][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 457 GPU:...

[2024-09-19T13:45:37.544Z] FAILED ../../src/main/python/join_test.py::test_sortmerge_join_with_condition_ast[LeftSemi-Byte][DATAGEN_SEED=1726749170, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 298 GPU:...

[2024-09-19T13:45:37.544Z] FAILED ../../src/main/python/join_test.py::test_sortmerge_join_with_condition_ast[LeftSemi-Short][DATAGEN_SEED=1726749170, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 239 GPU:...

[2024-09-19T13:45:37.544Z] FAILED ../../src/main/python/join_test.py::test_sortmerge_join_with_condition_ast[LeftSemi-Integer][DATAGEN_SEED=1726749170, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 236 GPU:...

[2024-09-19T13:45:37.544Z] FAILED ../../src/main/python/join_test.py::test_sortmerge_join_with_condition_ast[LeftSemi-Long][DATAGEN_SEED=1726749170, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 235 GPU:...

[2024-09-19T13:45:37.544Z] FAILED ../../src/main/python/join_test.py::test_sortmerge_join_with_condition_ast[LeftSemi-Date][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 253 GPU:...

[2024-09-19T13:45:37.544Z] FAILED ../../src/main/python/join_test.py::test_sortmerge_join_with_condition_ast[LeftSemi-Timestamp][DATAGEN_SEED=1726749170, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 240 GPU:...

[2024-09-19T13:45:37.544Z] FAILED ../../src/main/python/join_test.py::test_sortmerge_join_with_condition_ast[LeftSemi-String][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 230 GPU:...

[2024-09-19T13:45:37.544Z] FAILED ../../src/main/python/join_test.py::test_sortmerge_join_with_condition_ast[LeftAnti-Boolean][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 43 GPU: 296

[2024-09-19T13:45:37.544Z] FAILED ../../src/main/python/join_test.py::test_sortmerge_join_with_condition_ast[LeftAnti-Byte][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 202 GPU:...

[2024-09-19T13:45:37.544Z] FAILED ../../src/main/python/join_test.py::test_sortmerge_join_with_condition_ast[LeftAnti-Short][DATAGEN_SEED=1726749170, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 261 GPU:...

[2024-09-19T13:45:37.544Z] FAILED ../../src/main/python/join_test.py::test_sortmerge_join_with_condition_ast[LeftAnti-Integer][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 264 GPU:...

[2024-09-19T13:45:37.544Z] FAILED ../../src/main/python/join_test.py::test_sortmerge_join_with_condition_ast[LeftAnti-Long][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 265 GPU:...

[2024-09-19T13:45:37.544Z] FAILED ../../src/main/python/join_test.py::test_sortmerge_join_with_condition_ast[LeftAnti-Date][DATAGEN_SEED=1726749170, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 247 GPU:...

[2024-09-19T13:45:37.544Z] FAILED ../../src/main/python/join_test.py::test_sortmerge_join_with_condition_ast[LeftAnti-Timestamp][DATAGEN_SEED=1726749170, TZ=UTC, INJECT_OOM, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 260 GPU: 28

[2024-09-19T13:45:37.544Z] FAILED ../../src/main/python/join_test.py::test_sortmerge_join_with_condition_ast[LeftAnti-String][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 270 GPU:...

[2024-09-19T13:45:37.544Z] = 32 failed, 30698 passed, 1099 skipped, 1091 xfailed, 723 xpassed, 14293 warnings in 4366.24s (1:12:46) =
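As a triage aid, the summary lines above can be parsed to compare counts per test and data type. The sketch below is an assumption-laden helper, not part of the integration test framework: `FAILED_RE`, `parse_failures`, and `cpu_totals` are hypothetical names, and the sample lines are copied from the summary with timestamps removed. GPU counts that pytest truncated to `...` are kept as `None` rather than guessed.

```python
import re
from collections import defaultdict

# Matches pytest short-summary lines of the form shown in the log above.
FAILED_RE = re.compile(
    r"FAILED \S*::(?P<test>\w+)\[(?P<join>\w+)-(?P<dtype>\w+)\]"
    r".*?CPU: (?P<cpu>\d+) GPU:\s*(?P<gpu>\d+|\.\.\.)"
)


def parse_failures(log_text):
    """Return one dict per FAILED line; a truncated GPU count becomes None."""
    rows = []
    for m in FAILED_RE.finditer(log_text):
        gpu = m.group("gpu")
        rows.append({
            "test": m.group("test"),
            "join": m.group("join"),
            "dtype": m.group("dtype"),
            "cpu": int(m.group("cpu")),
            "gpu": None if gpu == "..." else int(gpu),
        })
    return rows


def cpu_totals(rows):
    """Sum CPU row counts across join types for each (test, dtype) pair."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["test"], row["dtype"])] += row["cpu"]
    return dict(totals)


sample = """
FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftSemi-Boolean][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 457 GPU:...
FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftSemi-Byte][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 298 GPU: 87
FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftAnti-Boolean][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 43 GPU: 239
FAILED ../../src/main/python/join_test.py::test_broadcast_join_with_conditionals[LeftAnti-Byte][DATAGEN_SEED=1726749170, TZ=UTC, IGNORE_ORDER({'local': True})] - AssertionError: CPU and GPU list have different lengths at [] CPU: 202 GPU:...
"""

rows = parse_failures(sample)
totals = cpu_totals(rows)
for key, total in sorted(totals.items()):
    print(key, total)
```

In the summary above, the CPU counts for each LeftSemi/LeftAnti pair add up to 500 (for example, 298 + 202 for Byte), which is what one would expect if both joins ran over the same left-side rows. The GPU counts cannot be checked the same way from this log, since most of them are truncated.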

abellina commented 10 hours ago

I believe this is because of https://github.com/rapidsai/cudf/pull/16230.

sameerz commented 3 hours ago

Waiting on the revert PR https://github.com/rapidsai/cudf/pull/16855 as a short-term fix.

pxLi commented 2 hours ago

https://github.com/rapidsai/cudf/pull/16855 is merged. Triggering the submodule sync-up now, and will rebuild JNI later.