NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

[BUG] No automated tests for REPLs (pyspark, spark-shell, notebooks) #5704

Open gerashegalov opened 2 years ago

gerashegalov commented 2 years ago

Describe the bug Our codebase contains classloading-sensitive code such as

The classloader architecture in REPLs is different from, and much more complicated than, the one in batch Spark apps launched with spark-submit.

REPLs such as Jupyter and Databricks notebooks are tested manually, late in the dev cycle. As a result, bugs are detected too late in the release cycle (see #3760).

We need to shift detection of breaking changes left by automating the manual notebook/REPL tests.
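As a rough illustration of what such automation could look like, here is a minimal sketch of a REPL smoke test. It assumes the REPL under test reads statements from standard input when it is not attached to a terminal and that a zero exit code plus a known marker in the output means success. The helper name `run_repl_smoke_test` and the marker string are hypothetical and not part of this repository.

```python
import subprocess
import sys


def run_repl_smoke_test(repl_cmd, script, marker, timeout_s=300):
    """Feed `script` to a REPL over stdin; pass if it exits 0 and prints `marker`.

    Hypothetical helper for illustration only.
    """
    try:
        proc = subprocess.run(
            repl_cmd,
            input=script,          # statements are piped in as if typed
            capture_output=True,
            text=True,
            timeout=timeout_s,     # a hung REPL counts as a failure
        )
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0 and marker in proc.stdout


if __name__ == "__main__":
    # Stand-in REPL: the Python interpreter reading its program from stdin.
    # The marker is assembled at runtime so that a REPL echoing its input
    # cannot produce a false positive.
    ok = run_repl_smoke_test(
        [sys.executable, "-"],
        "print('REPL_' + 'OK')\n",
        "REPL_OK",
    )
    print("smoke test passed" if ok else "smoke test failed")
```

For pyspark or spark-shell, `repl_cmd` would be the shell launcher plus whatever `--jars` and `--conf` options the environment needs, and `script` would be a short query that exercises the plugin and prints the marker only when the result is as expected.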

Steps/Code to reproduce bug Various

Expected behavior Catch bugs in REPLs and Notebooks no later than nightly tests

Environment details Databricks, local REPL

Additional context

#5646

### Tasks
- [ ] https://github.com/NVIDIA/spark-rapids/pull/9504
- [ ] Databricks notebook tests
- [ ] spark-sql CLI tests

GaryShen2008 commented 2 years ago

We run some notebook tests on Databricks almost every day. Unless a special test case is required, I think notebooks are already covered. @gerashegalov Which test cases do we want to run with pyspark and spark-shell?

gerashegalov commented 2 years ago

Thanks @GaryShen2008! This is great! We need to do some forensics with @tgravescs to dig out the notebook that was responsible for #3760 so that we can include that test case in our daily notebook testing.

tgravescs commented 2 years ago

I don't know which notebook showed this, so we would have to go back and try to reproduce it.