NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs
https://nvidia.github.io/spark-rapids
Apache License 2.0

[BUG] Clean up unused and duplicated 'org/roaringbitmap' folder in the spark3xx shims #11175

Closed: NvTimLiu closed this issue 1 month ago

NvTimLiu commented 1 month ago

Describe the bug

The unused, duplicated 'spark3xx/META-INF/versions/11/org/roaringbitmap/' folder should be removed from the spark3xx shims. The dist jar already contains the same class under the ./spark-shared/com/nvidia/shaded/spark/org/roaringbitmap/ folder:

./spark324/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark321/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark332db/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark330db/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark340/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark321cdh/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark320/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark332cdh/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark332/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark333/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark330cdh/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark342/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark330/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark334/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark341db/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark331/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark323/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark341/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark343/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark322/META-INF/versions/11/org/roaringbitmap/ArraysShim.class
./spark-shared/com/nvidia/shaded/spark/org/roaringbitmap/ArraysShim.class
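
A listing like the one above can be reproduced without unpacking the jar. The following is a minimal Python sketch, not part of the spark-rapids build: the function name `find_versioned_classes` and the shim-prefix regex are assumptions inferred from the directory names in the listing.

```python
import re
import zipfile

# Matches entries such as
# "spark324/META-INF/versions/11/org/roaringbitmap/ArraysShim.class".
# The "spark" + digits + optional suffix prefix is an assumption based on
# the shim directory names listed in this issue.
VERSIONED_ENTRY = re.compile(
    r"^(?P<shim>spark\d+\w*)/META-INF/versions/(?P<release>\d+)/(?P<path>.+\.class)$"
)


def find_versioned_classes(jar_file):
    """Return sorted (shim, release, class path) tuples for every class
    stored under a shim's META-INF/versions/<release>/ directory."""
    with zipfile.ZipFile(jar_file) as jar:
        matches = (VERSIONED_ENTRY.match(name) for name in jar.namelist())
        return sorted(m.group("shim", "release", "path") for m in matches if m)
```

Called with the path to the dist jar, it returns one tuple per versioned class file; the copy under spark-shared is not reported because it does not live under META-INF/versions/.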

In addition, the duplicated files under roaringbitmap/ cause JaCoCo to fail with "Can't add different class with same name" in the premerge job: https://github.com/NVIDIA/spark-rapids/blob/branch-24.08/jenkins/Jenkinsfile-blossom.premerge#L196-L200

13:05:04  java.lang.IllegalStateException: Can't add different class with same name: com/nvidia/shaded/spark/org/roaringbitmap/ArraysShim
13:05:04    at org.jacoco.core.analysis.CoverageBuilder.visitCoverage(CoverageBuilder.java:106)
13:05:04    at org.jacoco.core.analysis.Analyzer$1.visitEnd(Analyzer.java:99)
13:05:04    at org.objectweb.asm.ClassVisitor.visitEnd(ClassVisitor.java:377)
13:05:04    at org.jacoco.core.internal.flow.ClassProbesAdapter.visitEnd(ClassProbesAdapter.java:100)
13:05:04    at org.objectweb.asm.ClassReader.accept(ClassReader.java:725)
13:05:04    at org.objectweb.asm.ClassReader.accept(ClassReader.java:401)
13:05:04    at org.jacoco.core.analysis.Analyzer.analyzeClass(Analyzer.java:116)
13:05:04    at org.jacoco.core.analysis.Analyzer.analyzeClass(Analyzer.java:132)
13:05:04  Caused: java.io.IOException: Error while analyzing /var/jenkins/jobs/rapids_premerge-github/builds/9737/jacoco/classes/org/roaringbitmap/ArraysShim.class.
13:05:04    at org.jacoco.core.analysis.Analyzer.analyzerError(Analyzer.java:162)

pxLi commented 1 month ago

cc @liurenjie1024 to help, thanks.

gerashegalov commented 1 month ago

This looks like the first occurrence of a multi-release jar among our dependencies. binary-dedupe.sh does not handle it yet.
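
To illustrate what handling the multi-release layout could involve, here is a hedged Python sketch. It is not how binary-dedupe.sh is implemented; the names `split_shim` and `dedupe_candidates` are hypothetical. It treats the full path below the shim directory, including any META-INF/versions/<release>/ prefix, as the comparison key and reports the paths whose bytes are identical in every shim that contains them.

```python
import hashlib
from collections import defaultdict


def split_shim(entry):
    """Split 'spark324/META-INF/...' into ('spark324', 'META-INF/...')."""
    shim, _, rel_path = entry.partition("/")
    return shim, rel_path


def dedupe_candidates(entries):
    """entries maps a jar entry name to its bytes.

    Returns the sorted relative paths that appear in at least two shims
    with byte-identical content, so a single shared copy would suffice.
    """
    digests = defaultdict(dict)
    for name, data in entries.items():
        shim, rel_path = split_shim(name)
        digests[rel_path][shim] = hashlib.sha1(data).hexdigest()
    return sorted(
        rel_path
        for rel_path, by_shim in digests.items()
        if len(by_shim) > 1 and len(set(by_shim.values())) == 1
    )
```

Paths whose content differs between shims are left out on purpose, since those would have to stay shim-specific.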