NVIDIA / spark-rapids-tools

User tools for Spark RAPIDS
Apache License 2.0

[BUG] Exception running on Spark 3.1.x and 3.2.x due to different constructors of StageInfo #1260

Open amahussein opened 2 months ago

amahussein commented 2 months ago

Describe the bug

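This is breaking the CI/CD pipeline. The qualification tool throws `java.lang.NoSuchMethodError` while processing an event log against a Spark 3.1.2 runtime: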

WARNING: An illegal reflective access operation has occurred
02:08:54  WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/workspace/spark-qualification-tool/spark-3.1.2-bin-hadoop3.2/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
02:08:54  WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
02:08:54  WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
02:08:54  WARNING: All illegal access operations will be denied in a future release
02:08:54  24/08/05 07:08:54 ERROR Qualification: Error occurred while processing file: s3a://mydirectory/qualification_testing/app-xyz-0123
02:08:54  java.lang.NoSuchMethodError: 'boolean org.apache.spark.scheduler.StageInfo$.$lessinit$greater$default$12()'
02:08:54    at org.apache.spark.sql.rapids.tool.store.StageModel.initStageInfo(StageModel.scala:44)
02:08:54    at org.apache.spark.sql.rapids.tool.store.StageModel.org$apache$spark$sql$rapids$tool$store$StageModel$$updateInfo(StageModel.scala:70)
02:08:54    at org.apache.spark.sql.rapids.tool.store.StageModel.<init>(StageModel.scala:35)
02:08:54    at org.apache.spark.sql.rapids.tool.store.StageModel$.apply(StageModel.scala:127)
02:08:54    at org.apache.spark.sql.rapids.tool.store.StageModelManager.getOrCreateStage(StageModelManager.scala:75)
02:08:54    at org.apache.spark.sql.rapids.tool.store.StageModelManager.addStageInfo(StageModelManager.scala:122)
02:08:54    at org.apache.spark.sql.rapids.tool.AppBase.getOrCreateStage(AppBase.scala:169)
02:08:54    at org.apache.spark.sql.rapids.tool.EventProcessorBase.doSparkListenerStageSubmitted(EventProcessorBase.scala:455)
02:08:54    at org.apache.spark.sql.rapids.tool.EventProcessorBase.processAnyEvent(EventProcessorBase.scala:80)
02:08:54    at org.apache.spark.sql.rapids.tool.qualification.QualificationAppInfo.processEvent(QualificationAppInfo.scala:87)
02:08:54    at org.apache.spark.sql.rapids.tool.AppBase.$anonfun$processEventsInternal$6(AppBase.scala:268)
02:08:54    at org.apache.spark.sql.rapids.tool.AppBase.$anonfun$processEventsInternal$6$adapted(AppBase.scala:263)
02:08:54    at scala.collection.Iterator.find(Iterator.scala:993)
02:08:54    at scala.collection.Iterator.find$(Iterator.scala:990)
02:08:54    at scala.collection.AbstractIterator.find(Iterator.scala:1429)
02:08:54    at org.apache.spark.sql.rapids.tool.AppBase.$anonfun$processEventsInternal$5(AppBase.scala:263)
02:08:54    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
02:08:54    at org.apache.spark.sql.rapids.tool.AppBase.$anonfun$processEventsInternal$3(AppBase.scala:262)
02:08:54    at scala.collection.immutable.List.foreach(List.scala:392)
02:08:54    at org.apache.spark.sql.rapids.tool.AppBase.processEventsInternal(AppBase.scala:261)
02:08:54    at org.apache.spark.sql.rapids.tool.AppBase.processEvents(AppBase.scala:405)
02:08:54    at org.apache.spark.sql.rapids.tool.qualification.QualificationAppInfo.<init>(QualificationAppInfo.scala:84)
02:08:54    at org.apache.spark.sql.rapids.tool.qualification.QualificationAppInfo$.createApp(QualificationAppInfo.scala:1101)
02:08:54    at com.nvidia.spark.rapids.tool.qualification.Qualification.com$nvidia$spark$rapids$tool$qualification$Qualification$$qualifyApp(Qualification.scala:148)
02:08:54    at com.nvidia.spark.rapids.tool.qualification.Qualification$QualifyThread.run(Qualification.scala:49)
02:08:54    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
02:08:54    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
02:08:54    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
02:08:54    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
02:08:54    at java.base/java.lang.Thread.run(Thread.java:829)
02:08:54  24/08/05 07:08:54 INFO MetricsSystemImpl: Stopping s3a-file-system metrics system...
02:08:54  24/08/05 07:08:54 INFO MetricsSystemImpl: s3a-file-system metrics system stopped.
02:08:54  24/08/05 07:08:54 INFO MetricsSystemImpl: s3a-file-system metrics system shutdown complete.
### Tasks
- [ ] https://github.com/NVIDIA/spark-rapids-tools/pull/1261
- [ ] Use implicits to reduce the memory footprint of stageInfo

amahussein commented 2 months ago

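The jar published on Maven is built against Spark 3.5, but the runtime may be an older Spark version (3.1.x or 3.2.x here). Because the `StageInfo` constructor differs between those versions, the call compiled against 3.5 references the default-argument accessor `$lessinit$greater$default$12`, which does not exist in the older runtime, hence the `NoSuchMethodError`. @parthosa FYI, we might need to consider this case in our testing.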
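To illustrate the build-versus-runtime mismatch described above, here is a minimal sketch of one general way to tolerate a constructor whose parameter list grew between versions: look the constructor up reflectively at runtime instead of binding to one signature at compile time. This is not the fix from the linked PR. `OldStageInfo`, `NewStageInfo`, and `construct` are hypothetical stand-ins invented for this example, and the sketch assumes newer versions only append parameters at the end of the list.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;

class StageInfoCompat {
    // Hypothetical stand-in for the class as it looks in an older runtime.
    public static class OldStageInfo {
        public final int stageId;
        public final String name;
        public OldStageInfo(int stageId, String name) {
            this.stageId = stageId;
            this.name = name;
        }
    }

    // Hypothetical stand-in for the same class after a parameter was appended.
    public static class NewStageInfo {
        public final int stageId;
        public final String name;
        public final boolean extraFlag;
        public NewStageInfo(int stageId, String name, boolean extraFlag) {
            this.stageId = stageId;
            this.name = name;
            this.extraFlag = extraFlag;
        }
    }

    // Picks the public constructor with the most parameters that still fits
    // the supplied argument list, then passes only the leading arguments.
    // Assumption: newer versions only append parameters at the end.
    static Object construct(Class<?> cls, Object... fullArgs) {
        Constructor<?> best = null;
        for (Constructor<?> c : cls.getConstructors()) {
            int n = c.getParameterCount();
            if (n <= fullArgs.length
                    && (best == null || n > best.getParameterCount())) {
                best = c;
            }
        }
        if (best == null) {
            throw new IllegalStateException(
                "No compatible constructor on " + cls.getName());
        }
        try {
            Object[] args = Arrays.copyOf(fullArgs, best.getParameterCount());
            return best.newInstance(args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The same call site works against either "runtime" class.
        Object oldOne = construct(OldStageInfo.class, 7, "stage-7", false);
        Object newOne = construct(NewStageInfo.class, 7, "stage-7", false);
        System.out.println(oldOne.getClass().getSimpleName());
        System.out.println(newOne.getClass().getSimpleName());
    }
}
```

The trade-off is that reflection moves the failure from link time to an explicit runtime check, so a missing constructor surfaces as a clear error message rather than a `NoSuchMethodError` deep in event processing. Testing the published jar against each supported Spark runtime, as suggested in the comment above, would still be needed to catch such mismatches.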