FederatedAI / FATE

An Industrial Grade Federated Learning Framework
Apache License 2.0

Error when running graph classification with the homo_nn examples #1390

Closed: novaxiaohui closed this issue 1 week ago.

novaxiaohui commented 4 years ago

Describe the bug: I followed the instructions in this article, https://mp.weixin.qq.com/s?__biz=MzAwNzUyNzI5Mw==&mid=2730791187&idx=1&sn=6f08e9caf5121f287cbc5bdd11d63758&chksm=bc4cff018b3b76179c84db28ef929516cf5b01227534b8458377398316949a51bc6d1dbd36bc&mpshare=1&scene=1&srcid=&sharer_sharetime=1589872108878&sharer_shareid=31f75cb8a2c5da0f6ba35b885019d710&key=bb1d37e7a4261ac582c852244207dfa54c88210a16b72496c86a72717c11765e780e8274481b6071b8ba5679c87bcf4a501f8c44976b9629faf62be32e40ac870d95cfdee4751a11415de0d4f0900944&ascene=1&uin=MTY3MjIxODU1&devicetype=Windows+7+x64&version=62090070&lang=zh_CN&exportkey=AZ%2B7oNXCTLWqAlx4TaWcO4w%3D&pass_ticket=cNS3OKfcejFUFvkCB97SXo0iOYJalVW6NJBm%2FGtGn5A%3D , to test the homo_nn examples with the MNIST dataset. When I submit the task using the following command, the job fails:

$ python fate_flow/fate_flow_client.py -f submit_job -c examples/federatedml-1.x-examples/homo_nn/test_homo_nn_keras_temperate.json -d examples/federatedml-1.x-examples/homo_nn/test_homo_nn_train_then_predict.json

I repeated the job several times. The error appeared at the dataio_0, homo_nn_0, or homo_nn_1 stage, but the error message was always the same, as shown in the screenshots.

I also tried submitting the task on a single machine and got the same result. It looks like something is wrong with the EggRoll services. Has anybody met the same issue?

Expected behavior: I need some advice on how to solve the problem.

Screenshots:

Error messages: [screenshots not preserved]

Configuration: [screenshot not preserved]

Dataset: [screenshot not preserved]


Origin-Draven commented 4 years ago

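Does toy_example run successfully? cd examples/toy_example, then python run_toy_example $partyid_1 $partyid_2 1. When I ran into this problem before, the cause was that the CPU lacked a required instruction set; check with cat /proc/cpuinfo | grep avx2. Reference: https://github.com/FederatedAI/KubeFATE/wiki/KubeFATE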
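A slightly stricter form of the `/proc/cpuinfo` AVX2 check, as a minimal bash sketch. The function name `check_cpu_flag` is hypothetical and not part of FATE or KubeFATE. It inspects only the x86 `flags` lines and matches the flag as a whole word, so `avx` is not mistaken for `avx2`. Run it inside the environment that actually executes the FATE services (for example the guest VM rather than the host), because a hypervisor does not necessarily expose every host CPU flag to the guest.

```shell
# Hypothetical helper (not part of FATE/KubeFATE): report whether a CPU flag
# appears in a cpuinfo-style file.
# Usage: check_cpu_flag [flag] [cpuinfo_path]   (defaults: avx2, /proc/cpuinfo)
check_cpu_flag() {
  local flag="${1:-avx2}"
  local cpuinfo="${2:-/proc/cpuinfo}"
  # Keep only x86 "flags : ..." lines, then match the flag as a whole word (-w),
  # so "avx" does not match inside "avx2" or "avx512f".
  # On non-x86 CPUs (e.g. ARM "Features" lines) this reports "missing".
  if grep -E '^flags[[:space:]]*:' "$cpuinfo" 2>/dev/null | grep -qw -- "$flag"; then
    echo "$flag: present"
  else
    echo "$flag: missing"
    return 1
  fi
}
```

Running `check_cpu_flag avx2` prints `avx2: present` or `avx2: missing` and returns a non-zero status in the second case, so it can also gate a setup script.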

novaxiaohui commented 4 years ago


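toy_example runs fine, and LR also ran without problems: training succeeded and model serving worked normally too. So this is very strange. I'm using a virtual machine, and its cpuinfo does list AVX2.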