NVIDIA / spark-rapids-examples

A repo for all Spark examples using the RAPIDS Accelerator, including ETL, ML/DL, etc.
Apache License 2.0

spark-rapids-examples nightly failed: micro-benchmarks-gpu.ipynb #411

Closed: pxLi closed this 1 month ago

pxLi commented 1 month ago

Describe the bug

After https://github.com/NVIDIA/spark-rapids-examples/pull/409 was merged, we started seeing nightly test errors in examples_GitHub-notebook runs 826 and 827:

[2024-07-24T09:26:53.359Z] 24/07/24 09:26:52 INFO TorrentBroadcast: Reading broadcast variable 77 took 2 ms
[2024-07-24T09:26:53.359Z] 24/07/24 09:26:52 INFO MemoryStore: Block broadcast_77 stored as values in memory (estimated size 54.4 KiB, free 16.9 GiB)
[2024-07-24T09:26:53.359Z] 24/07/24 09:26:52 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 14, fetching them
[2024-07-24T09:26:53.359Z] 24/07/24 09:26:52 INFO MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@examples-rockylinux8-cuda112-827-t9vbf-t0d94:45927)
[2024-07-24T09:26:53.359Z] 24/07/24 09:26:52 INFO MapOutputTrackerWorker: Got the map output locations
[2024-07-24T09:26:53.360Z] 24/07/24 09:26:52 INFO ShuffleBlockFetcherIterator: Getting 200 (15.6 KiB) non-empty blocks including 200 (15.6 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
[2024-07-24T09:26:53.360Z] 24/07/24 09:26:52 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
[2024-07-24T09:26:53.360Z] 24/07/24 09:26:52 INFO Executor: Finished task 0.0 in stage 63.0 (TID 840). 7435 bytes result sent to driver
[2024-07-24T09:26:53.360Z] 24/07/24 09:26:52 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
[2024-07-24T09:26:53.360Z] 24/07/24 09:26:53 INFO RapidsBufferCatalog: Closing storage
[2024-07-24T09:26:53.360Z] 24/07/24 09:26:53 ERROR CoarseGrainedExecutorBackend: RECEIV+ exit -2

Steps/Code to reproduce bug

Please provide a list of steps or a code sample to reproduce the issue. Avoid posting private or sensitive data.

Expected behavior

The test passes.

Environment details (please complete the following information)

pxLi commented 1 month ago

cc @nvliyuan

NvTimLiu commented 1 month ago

Some files were missed. Two updates are needed:
1. Change the paths to (dataRoot + "/tpcds/store_sales") and (dataRoot + "/tpcds/store_returns").
2. Upload the /tpcds/store_returns data for CI into https://github.com/NVIDIA/spark-rapids-examples/blob/main/datasets/tpcds-small.tar.gz
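A minimal Python sketch of the path change in step 1, assuming the notebook builds its input locations from a base directory named dataRoot (passed in here as the parameter data_root). The helpers tpcds_paths and missing_paths are hypothetical, not part of the repo; the idea is that checking the directories up front would surface a missing store_returns folder before Spark fails with an AnalysisException.

```python
import os


def tpcds_paths(data_root):
    """Build the TPC-DS table paths under data_root (hypothetical helper)."""
    # Mirrors the suggested fix: dataRoot + "/tpcds/<table>"
    return {
        "store_sales": data_root + "/tpcds/store_sales",
        "store_returns": data_root + "/tpcds/store_returns",
    }


def missing_paths(paths):
    """Return, sorted, the table names whose local path does not exist."""
    return sorted(name for name, path in paths.items() if not os.path.exists(path))


if __name__ == "__main__":
    # Hypothetical local root; point it at wherever tpcds-small.tar.gz is extracted.
    paths = tpcds_paths("datasets")
    print(paths)
    print("missing:", missing_paths(paths))
```

The existence check uses os.path.exists, so it only applies to local file: paths like the ones in the CI logs below; remote storage would need a different check.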

[2024-07-24T09:26:53.035Z] AnalysisException: Path does not exist: file:/home/jenkins/agent/workspace/examples_GitHub-notebook/notebook-examples/datasets/store_sales

[2024-07-24T11:48:29.190Z] AnalysisException: Path does not exist: file:/home/jenkins/agent/workspace/examples_GitHub-notebook/notebook-examples/datasets/tpcds/store_returns

https://github.com/NVIDIA/spark-rapids-examples/pull/409/


GaryShen2008 commented 1 month ago

Hi @nvliyuan, can we close it after merging your PR?

nvliyuan commented 1 month ago

Yes, closed.