Open tobegit3hub opened 5 years ago
This seems to be an issue with LZ4 codec: https://issues.apache.org/jira/browse/SPARK-18105
Can you please try to set ( lzf OR snappy OR zstd) to spark.io.compression.code
Thanks @petro-rudenko . We have ty lz4 codec in this scenario and it works with SparkRDMA.
By the way, we got another error after shuffle read when compression was disable. So it is definitely not the issue of the compression implementation.
What's the error you got?
@petro-rudenko The error message for disable compression was in the description of this issue. The third code snippet of https://github.com/Mellanox/SparkRDMA/issues/34#issue-465161133 .
We have communicated with the exports of FPGA and they said that the initialization of each FPGA process will request and reset some memory which may affect the data of shuffle read. We have seen that the data was corrupt with many bytes were 0 which looks like a fresh memory.
I'm not sure if it is related yet. But the RDMA Shuffle Manager may check or print the data of shuffle read. If it is not corrupted in the internal of RDMA Shuffle Manager but corrupted in the Spark executor's memory which may be initialized by FPGA code, it could be the issue of the client-size memory management.
We have seem the official Spark shuffle manager has the steam check for each shuffle, is it possible to add the similiar code in RDMA shuffle manager? @petro-rudenko
@tobegit3hub yes, i saw this feature. We're in the process of migrating SparkRDMA to more performant network backend based on ucx library. There we woudn't reimplement default spark iterator, rather all logic would be in the ShuffleClient. I'll check how easy would be to add corruption checker here. cc @yosefe
Thanks @petro-rudenko . That may help and we can test for the new client in our environment.
We have setup RDMA environment and run the Spark jobs with RDMA Shuffle Manager. Here is the Spark command to submit the job.
This script may work at times. But sometimes it may cause the fail task although the task may re-schedule and complete finally. Here is the error log of the fail task. Since we have set
spark.shuffle.compress=true
and the default compressor Lz4Codec will throw exception ofStream is corrupted
.If we set
spark.shuffle.compress=false
, the error will be throw byreadFully
when try to unserialized the steam to row object.The SparkApp is simple. We read the data from HDFS and do some transformation before calling
saveAsTextFile
to save the data in HDFS. The fail task always happen in the final stage.By the way, this issue may be not easy to reproduce. But in our environment, we have use the custom codec to save the RDD which may raise the probability about this issue. The codec can not get the stream or break the stream before RDMA Shuffle Manager doing the shuffle read, so we still have no clue about the root cause of this issue.