apache / incubator-uniffle

Uniffle is a high performance, general purpose Remote Shuffle Service.
https://uniffle.apache.org/
Apache License 2.0

[Flaky Test] org.apache.uniffle.test.RepartitionWithHadoopHybridStorageRssTest #2178

Closed by maobaolong 1 month ago

maobaolong commented 1 month ago

Describe the flaky test

Error:  Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 33.57 s <<< FAILURE! - in org.apache.uniffle.test.RepartitionWithHadoopHybridStorageRssTest
Error:  resultCompareTest  Time elapsed: 30.338 s  <<< ERROR!
org.apache.spark.SparkException: 
Job aborted due to stage failure: Task 2 in stage 2.0 failed 1 times, most recent failure: Lost task 2.0 in stage 2.0 (TID 11, localhost, executor driver): org.apache.uniffle.common.exception.RssFetchFailedException: Failed to read shuffle data from HOT handler
    at org.apache.uniffle.storage.handler.impl.ComposedClientReadHandler.readShuffleData(ComposedClientReadHandler.java:124)
    at org.apache.uniffle.client.impl.ShuffleReadClientImpl.read(ShuffleReadClientImpl.java:308)
    at org.apache.uniffle.client.impl.ShuffleReadClientImpl.readShuffleBlockData(ShuffleReadClientImpl.java:216)
    at org.apache.spark.shuffle.reader.RssShuffleDataIterator.hasNext(RssShuffleDataIterator.java:116)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:153)
    at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:50)
    at org.apache.spark.shuffle.reader.RssShuffleReader.read(RssShuffleReader.java:228)
    at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:105)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
    at io.netty.util.internal.ReferenceCountUpdater.toLiveRealRefCnt(ReferenceCountUpdater.java:83)
    at io.netty.util.internal.ReferenceCountUpdater.release(ReferenceCountUpdater.java:148)
    at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:101)
    at org.apache.uniffle.common.netty.buffer.NettyManagedBuffer.release(NettyManagedBuffer.java:59)
    at org.apache.uniffle.common.ShuffleIndexResult.release(ShuffleIndexResult.java:77)
    at org.apache.uniffle.storage.handler.impl.DataSkippableReadHandler.readShuffleData(DataSkippableReadHandler.java:82)
    at org.apache.uniffle.storage.handler.impl.ComposedClientReadHandler.readShuffleData(ComposedClientReadHandler.java:113)
    ... 21 more
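
The `Caused by` section shows `release()` being called on a buffer whose reference count was already 0 (`refCnt: 0, decrement: 1`). The trace does not say why the count had already dropped, so the following is only a minimal sketch of that failure mode and of one common guard against it. `CountedBuffer` and `GuardedResult` are hypothetical stand-ins written for illustration; they are not Netty's `ByteBuf` or Uniffle's `ShuffleIndexResult`, and this is not a proposed fix for the flaky test.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountSketch {

    /** Hypothetical stand-in for a reference-counted buffer (not Netty's ByteBuf). */
    static final class CountedBuffer {
        // A freshly created buffer starts with exactly one reference.
        private final AtomicInteger refCnt = new AtomicInteger(1);

        int refCnt() {
            return refCnt.get();
        }

        /** Decrements the count; throws if the buffer was already fully released. */
        boolean release() {
            while (true) {
                int current = refCnt.get();
                if (current == 0) {
                    // Mirrors the message seen in the trace above.
                    throw new IllegalStateException("refCnt: 0, decrement: 1");
                }
                if (refCnt.compareAndSet(current, current - 1)) {
                    return current == 1; // true when this call freed the buffer
                }
            }
        }
    }

    /** Hypothetical owner that makes release() idempotent (an assumed design, not Uniffle's). */
    static final class GuardedResult {
        private final CountedBuffer buffer;
        private final AtomicBoolean released = new AtomicBoolean(false);

        GuardedResult(CountedBuffer buffer) {
            this.buffer = buffer;
        }

        void release() {
            // Only the first caller reaches the underlying buffer.
            if (released.compareAndSet(false, true)) {
                buffer.release();
            }
        }
    }

    public static void main(String[] args) {
        CountedBuffer raw = new CountedBuffer();
        raw.release();
        try {
            raw.release(); // second release on a count of 0
        } catch (IllegalStateException e) {
            System.out.println("second release failed: " + e.getMessage());
        }

        GuardedResult guarded = new GuardedResult(new CountedBuffer());
        guarded.release();
        guarded.release(); // no-op thanks to the guard
        System.out.println("guarded double release is a no-op");
    }
}
```

A guard like this only hides a double release by one owner; if two different owners release the same underlying buffer, the count still needs an explicit retain for each owner.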

Actions URL

https://github.com/apache/incubator-uniffle/actions/runs/11327963829/job/31500140879?pr=2176

rickyma commented 1 month ago

Duplicate of https://github.com/apache/incubator-uniffle/issues/1628

maobaolong commented 1 month ago

@rickyma Thank you for the reminder. Closing this as a duplicate.