Angel-ML / PyTorch-On-Angel

PyTorch On Angel, arming PyTorch with a powerful Parameter Server, which enables PyTorch to train very big models.

Error when running the example program with the new version #102

Closed yinhang-e5b0b9e888aa closed 3 years ago

yinhang-e5b0b9e888aa commented 3 years ago

I did not get this error with version 0.2 before, and I don't know what change caused it.

    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1168)
    at com.tencent.angel.pytorch.graph.gcn.GCN.makeGraph(GCN.scala:60)
    at com.tencent.angel.pytorch.graph.gcn.GNN.initialize(GNN.scala:99)
    at com.tencent.angel.pytorch.examples.supervised.cluster.GraphSageExample$.main(GraphSageExample.scala:150)
    at com.tencent.angel.pytorch.examples.supervised.cluster.GraphSageExample.main(GraphSageExample.scala)
Caused by: com.tencent.angel.exception.AngelException: com.tencent.angel.exception.AngelException: node id is not in range [0, 10
    at com.tencent.angel.psagent.matrix.MatrixClientImpl.get(MatrixClientImpl.java:732)
    at com.tencent.angel.spark.models.impl.PSVectorImpl.psfGet(PSVectorImpl.scala:78)
    at com.tencent.angel.pytorch.graph.gcn.GNNPSModel.readLabels2(GNNPSModel.scala:71)
    at com.tencent.angel.pytorch.graph.gcn.GraphAdjPartition.splitTrainTest(GraphPartition.scala:62)
    at com.tencent.angel.pytorch.graph.gcn.GraphAdjPartition.toSemiGCNPartition(GraphPartition.scala:49)
    at com.tencent.angel.pytorch.graph.gcn.GCN$$anonfun$4.apply(GCN.scala:54)
    at com.tencent.angel.pytorch.graph.gcn.GCN$$anonfun$4.apply(GCN.scala:54)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)

ouyangwen-it commented 3 years ago

Which version of Angel are you using?

yinhang-e5b0b9e888aa commented 3 years ago

Which version of Angel are you using?

3.2.0 release

rachelsunrh commented 3 years ago

Is this the entire error log? Please post the complete error log.

rachelsunrh commented 3 years ago

Use the code from the latest master branch of the Angel project.

yinhang-e5b0b9e888aa commented 3 years ago

With the code from Angel's master branch, it runs successfully now.