Angel-ML / PyTorch-On-Angel

PyTorch On Angel, arming PyTorch with a powerful Parameter Server, which enables PyTorch to train very big models.

PyTorch On Angel fails when running on YARN: User class threw exception: java.lang.UnsatisfiedLinkError: /data/hadoop/yarn/local/usercache/hdfs/filecache/5907/libtorch_angel.so: libtorchscatter.so #133

Closed FreshOne-qx closed 4 months ago

FreshOne-qx commented 4 months ago

PyTorch On Angel branch: 0.3.0. Angel version: 3.2.0. Submit command:

${SPARK_HOME}/bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --queue default \
    --conf spark.ps.instances=1 \
    --conf spark.ps.cores=1 \
    --conf spark.ps.jars=$SONA_ANGEL_JARS \
    --conf spark.ps.memory=4g \
    --conf spark.ps.log.level=INFO \
    --conf spark.driver.extraJavaOptions=-Djava.library.path=$JAVA_LIBRARY_PATH:. \
    --conf spark.executor.extraJavaOptions=-Djava.library.path=$JAVA_LIBRARY_PATH:. \
    --conf spark.executor.extraLibraryPath=. \
    --conf spark.driver.extraLibraryPath=. \
    --conf spark.executorEnv.OMP_NUM_THREADS=2 \
    --conf spark.executorEnv.MKL_NUM_THREADS=2 \
    --name "deepfm for torch on angel" \
    --jars $SONA_SPARK_JARS \
    --files deepfm.pt,$torchlib \
    --driver-memory 4g \
    --num-executors 1 \
    --executor-cores 1 \
    --executor-memory 4g \
    --class com.tencent.angel.pytorch.examples.supervised.RecommendationExample pytorch-on-angel-0.3.0.jar \
    trainInput:$input batchSize:128 torchModelPath:deepfm.pt \
    stepSize:0.001 numEpoch:10 testRatio:0.1 \
    angelModelOutputPath:$output
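The error in the title names libtorchscatter.so while loading libtorch_angel.so, which may mean the dynamic linker could not resolve that dependency on the YARN node (the message is cut off, so this is an assumption). A minimal sketch for checking this: the helper below, `missing_libs`, is a hypothetical name and not part of PyTorch On Angel. It filters `ldd` output down to the libraries reported as "not found".

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PyTorch On Angel):
# reads `ldd`-style output on stdin and prints the name of every
# shared library that the dynamic linker reports as "not found".
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Usage sketch. The path is an assumption: run this in a directory
# that holds the same native libraries shipped via --files.
#   ldd ./libtorch_angel.so | missing_libs
```

If libtorchscatter.so appears in the output, the directory searched by the linker probably does not contain it. In that case it may be worth checking that every library in `$torchlib` is actually listed in `--files`, since a comma-separated list of individual files is what `--files` expects.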

wangcaihua commented 4 months ago

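This is an automatic vacation reply from QQ Mail. Hello, I am currently on vacation and cannot reply to your email in person. I will reply as soon as possible after my vacation ends.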

FreshOne-qx commented 4 months ago

![Uploading image.png…]()