Angel-ML / PyTorch-On-Angel

PyTorch On Angel arms PyTorch with a powerful Parameter Server, which enables PyTorch to train very large models.

DGI training fails with an error when using the parameters from the official documentation #23

Open xiaopqr opened 3 years ago

xiaopqr commented 3 years ago

20/11/12 15:21:17 INFO BlockManager: Found block rdd_27_3 locally
20/11/12 15:21:17 INFO BlockManager: Found block rdd_27_0 locally
20/11/12 15:21:17 INFO BlockManager: Found block rdd_27_1 locally
20/11/12 15:21:17 INFO BlockManager: Found block rdd_27_2 locally
terminate called after throwing an instance of 'c10::Error'
what(): forward() is missing value for argument 'second_edge_index'. Declaration: forward(ClassType self, Tensor pos_x, Tensor neg_x, Tensor first_edge_index, Tensor second_edge_index) -> ((Tensor, Tensor, Tensor)) (checkAndNormalizeInputs at /pytorch/aten/src/ATen/core/function_schema_inl.h:270)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f4953ac9813 in libc10.so)
frame #1: + 0x323bdea (0x7f4956f29dea in libtorch.so)
frame #2: torch::jit::Function::operator()(std::vector<c10::IValue, std::allocator >, std::unordered_map<std::string, c10::IValue, std::hash, std::equal_to, std::allocator<std::pair<std::string const, c10::IValue> > > const&) + 0x36 (0x7f4956f280a6 in libtorch.so)
frame #3: torch::jit::script::Method::operator()(std::vector<c10::IValue, std::allocator >, std::unordered_map<std::string, c10::IValue, std::hash, std::equal_to, std::allocator<std::pair<std::string const, c10::IValue> > > const&) + 0xc9 (0x7f4956ee6709 in libtorch.so)
frame #4: angel::TorchModel::forward(std::vector<c10::IValue, std::allocator >) + 0xcc (0x7f4960c3c744 in /data/data3/yarn/nm2/usercache/service/filecache/39/libtorch_angel.so)
frame #5: angel::TorchModel::backward(std::vector<c10::IValue, std::allocator >, at::Tensor) + 0x67 (0x7f4960c3cb77 in /data/data3/yarn/nm2/usercache/service/filecache/39/libtorch_angel.so)
frame #6: Java_com_tencent_angel_pytorch_Torch_gcnBackward + 0x33c (0x7f4960c31d70 in /data/data3/yarn/nm2/usercache/service/filecache/39/libtorch_angel.so)
frame #7: [0x7f49c49bd6c7]

rachelsunrh commented 3 years ago

You are using the second-order DGI Python script, so you need to set second=true in the submit script.
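
To illustrate why the `second` setting matters, here is a minimal plain-Python sketch of the mismatch the error message describes: a `forward()` that declares four tensor arguments being called with only three. This is not Angel's or PyTorch's actual code. `SecondOrderDGI` and `build_forward_args` are hypothetical names, and the assumption that the caller only passes `second_edge_index` when `second` is enabled is inferred from the error and the reply above.

```python
# Hypothetical stand-in for a second-order DGI model: forward() needs
# both a first-order and a second-order edge index.
class SecondOrderDGI:
    def forward(self, pos_x, neg_x, first_edge_index, second_edge_index):
        # Toy computation; lists stand in for tensors.
        return (pos_x, neg_x, first_edge_index + second_edge_index)


def build_forward_args(pos_x, neg_x, first_edge_index, second_edge_index, second):
    """Assumed caller behaviour: pass the second edge index only if `second` is set."""
    args = [pos_x, neg_x, first_edge_index]
    if second:
        args.append(second_edge_index)
    return args


model = SecondOrderDGI()

# second=False: three arguments reach a four-argument forward(), so the call fails.
try:
    model.forward(*build_forward_args([1], [2], [3], [4], second=False))
    first_order_call_failed = False
except TypeError as err:
    first_order_call_failed = True
    print("call failed:", err)

# second=True: all four arguments are supplied and the call succeeds.
result = model.forward(*build_forward_args([1], [2], [3], [4], second=True))
print(result)
```

In the real job the same mismatch surfaces as the `c10::Error` from libtorch instead of a Python `TypeError`, because the scripted model is invoked from native code.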