Akshay-Venkatesh closed 5 years ago
So you need to copy libdisni.so from the release tarball to some directory on all Spark executors. If this directory is not in java.library.path on every Spark Master and Worker (usually /usr/lib), then you need to add the following Spark configuration:
spark.executor.extraJavaOptions -Djava.library.path=/hpc/scrap/users/swat/jenkins/disni/
spark.driver.extraJavaOptions -Djava.library.path=/hpc/scrap/users/swat/jenkins/disni/
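To make the rule above concrete, here is a minimal sketch in Python of the decision "is the directory already on java.library.path, and if not, which lines go into the Spark configuration". The helper names (extra_java_options, has_native_library) are made up for illustration and are not part of Spark or DiSNI; the sketch assumes you pass in the current java.library.path value yourself.

```python
import os

LIB_NAME = "libdisni.so"  # native library shipped in the release tarball


def has_native_library(lib_dir):
    """Return True if libdisni.so is present in lib_dir on this host."""
    return os.path.isfile(os.path.join(lib_dir, LIB_NAME))


def extra_java_options(lib_dir, java_library_path):
    """Return the Spark configuration lines needed for lib_dir (hypothetical helper).

    java_library_path is the current java.library.path value, i.e. a string of
    directories joined by os.pathsep. If lib_dir is already listed there,
    no extra configuration is needed and an empty list is returned.
    """
    wanted = os.path.normpath(lib_dir)
    entries = [os.path.normpath(p) for p in java_library_path.split(os.pathsep) if p]
    if wanted in entries:
        return []
    option = "-Djava.library.path=" + lib_dir
    return [
        "spark.executor.extraJavaOptions " + option,
        "spark.driver.extraJavaOptions " + option,
    ]


if __name__ == "__main__":
    current = os.pathsep.join(["/usr/lib", "/usr/lib64"])  # example value only
    for line in extra_java_options("/hpc/scrap/users/swat/jenkins/disni/", current):
        print(line)
```

With the example path from this thread, the script prints the same two lines as above. has_native_library can be run on each node to confirm the file was actually copied there before starting the job.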
Thanks a lot! Your latter suggestion helped.
Sorry, reopening this because there may be a related issue. I see this at the end of the run:
2019-02-28 15:25:49 ERROR RdmaNode:384 - Failed to stop RdmaChannel during 50 ms
This is OK. To save time at the end of the job, it forcefully stops the RDMA channel if it has not stopped within the timeout.
I'm seeing the error below when running Spark on 2 nodes (1 master and 2 workers). I'm not a frequent user of Java, but any thoughts on why I'd be seeing an initialization error here?