FederatedAI / FATE

An Industrial Grade Federated Learning Framework
Apache License 2.0

hetero_pearson error #4705

Closed szshary closed 3 months ago

szshary commented 1 year ago

What deployment mode are you using? Kubernetes

KubeFATE 1.8.1, using eggroll

What happened?

Two parties are running a training job: the guest has 250,000 rows (255 features) and the host has 170,000 rows (4 features).

Training pipeline: [image]
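For context on what the failing component computes: the guest-side traceback further down fails in `hetero_pearson.py` at `self.corr = spdz.dot(x, y, "corr").get() / n`. That expression is consistent with computing the cross-party Pearson block as a dot product of standardized columns divided by the row count. The sketch below shows only that arithmetic in plain, non-secure Python, assuming each party standardizes its own columns first. The function names are hypothetical and this is not FATE code; FATE performs the dot product under SPDZ secret sharing.

```python
from math import sqrt


def standardize(column):
    """Return the column as z-scores (population standard deviation).

    Assumes the column is not constant; a constant column has zero
    standard deviation and would raise ZeroDivisionError here.
    """
    n = len(column)
    mean = sum(column) / n
    std = sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]


def cross_correlation(guest_columns, host_columns):
    """Pearson correlation of every guest column with every host column.

    Hypothetical helper: each party's columns are standardized, then
    corr[i][j] = dot(guest_i, host_j) / n, where n is the shared row count.
    """
    n = len(guest_columns[0])
    guest_z = [standardize(c) for c in guest_columns]
    host_z = [standardize(c) for c in host_columns]
    return [
        [sum(a * b for a, b in zip(g, h)) / n for h in host_z]
        for g in guest_z
    ]


# One guest feature against two host features over four shared rows.
corr = cross_correlation(
    [[1.0, 2.0, 3.0, 4.0]],
    [[2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]],
)
```

With the sizes reported here (255 guest features, 4 host features), the cross-party block would be a 255 x 4 matrix.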

Error message (guest side):

[ERROR] [2023-03-08 02:01:33,105] [202303080047063754180] [349109:140310074283840] - [task_executor.run] [line:243]: ('Failed to call command: CommandURI(_uri=v1/egg-pair/runTask) to endpoint: nodemanager-0:37991, caused by: ', <_Rendezvous of RPC that terminated with: status = StatusCode.UNKNOWN details = "Exception calling application:

==== detail start, at 20230308.020133.068 ====
Traceback (most recent call last):
  File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper
    return func(*args, **kw)
  File "/data/projects/fate/eggroll/python/eggroll/roll_pair/egg_pair.py", line 690, in run_task
    value=self.functor_serdes.serialize(f(task)))
  File "/data/projects/fate/eggroll/python/eggroll/core/serdes/eggroll_serdes.py", line 58, in serialize
    return cloudpickle.dumps(_obj)
  File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 931, in dumps
    cp.dump(obj)
  File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 284, in dump
    return Pickler.dump(self, obj)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 606, in save_reduce
    save(args)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 631, in save_reduce
    self._batch_setitems(dictitems)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 841, in _batch_setitems
    tmp = list(islice(it, self._BATCHSIZE))
RuntimeError: dictionary changed size during iteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper
    return func(*args, **kw)
  File "/data/projects/fate/eggroll/python/eggroll/core/command/command_service.py", line 33, in call
    kwargs=getattr(command_request, '_kwargs'))
  File "/data/projects/fate/eggroll/python/eggroll/core/command/command_router.py", line 94, in dispatch
    raise e
  File "/data/projects/fate/eggroll/python/eggroll/core/command/command_router.py", line 91, in dispatch
    call_result = _method(_instance, *deserialized_args)
  File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 194, in wrapper
    raise RuntimeError(msg)
RuntimeError:

==== detail start, at 20230308.020133.054 ====
Traceback (most recent call last):
  File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper
    return func(*args, **kw)
  File "/data/projects/fate/eggroll/python/eggroll/roll_pair/egg_pair.py", line 690, in run_task
    value=self.functor_serdes.serialize(f(task)))
  File "/data/projects/fate/eggroll/python/eggroll/core/serdes/eggroll_serdes.py", line 58, in serialize
    return cloudpickle.dumps(_obj)
  File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 931, in dumps
    cp.dump(obj)
  File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 284, in dump
    return Pickler.dump(self, obj)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 606, in save_reduce
    save(args)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 631, in save_reduce
    self._batch_setitems(dictitems)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 841, in _batch_setitems
    tmp = list(islice(it, self._BATCHSIZE))
RuntimeError: dictionary changed size during iteration

==== detail end ====

==== detail end ====

" debug_error_string = "{"created":"@1678240893.070591714","description":"Error received from peer ipv4:10.42.0.229:37991","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"Exception calling application: \n\n==== detail start, at 20230308.020133.068 ====\nTraceback (most recent call last):\n File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper\n return func(*args, *kw)\n File "/data/projects/fate/eggroll/python/eggroll/roll_pair/egg_pair.py", line 690, in run_task\n value=self.functor_serdes.serialize(f(task)))\n File "/data/projects/fate/eggroll/python/eggroll/core/serdes/eggroll_serdes.py", line 58, in serialize\n return cloudpickle.dumps(_obj)\n File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 931, in dumps\n cp.dump(obj)\n File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 284, in dump\n return Pickler.dump(self, obj)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 409, in dump\n self.save(obj)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save\n self.save_reduce(obj=obj, rv)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 606, in save_reduce\n save(args)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 476, in save\n f(self, obj) # Call unbound method with explicit self\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 751, in save_tuple\n save(element)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save\n self.save_reduce(obj=obj, rv)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 631, in save_reduce\n self._batch_setitems(dictitems)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 841, in _batch_setitems\n tmp = list(islice(it, self._BATCHSIZE))\nRuntimeError: dictionary changed size during iteration\n\nDuring handling of the above exception, another 
exception occurred:\n\nTraceback (most recent call last):\n File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper\n return func(args, kw)\n File "/data/projects/fate/eggroll/python/eggroll/core/command/command_service.py", line 33, in call\n kwargs=getattr(command_request, '_kwargs'))\n File "/data/projects/fate/eggroll/python/eggroll/core/command/command_router.py", line 94, in dispatch\n raise e\n File "/data/projects/fate/eggroll/python/eggroll/core/command/command_router.py", line 91, in dispatch\n call_result = _method(_instance, deserialized_args)\n File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 194, in wrapper\n raise RuntimeError(msg)\nRuntimeError: \n\n==== detail start, at 20230308.020133.054 ====\nTraceback (most recent call last):\n File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper\n return func(args, kw)\n File "/data/projects/fate/eggroll/python/eggroll/roll_pair/egg_pair.py", line 690, in run_task\n value=self.functor_serdes.serialize(f(task)))\n File "/data/projects/fate/eggroll/python/eggroll/core/serdes/eggroll_serdes.py", line 58, in serialize\n return cloudpickle.dumps(_obj)\n File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 931, in dumps\n cp.dump(obj)\n File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 284, in dump\n return Pickler.dump(self, obj)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 409, in dump\n self.save(obj)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save\n self.save_reduce(obj=obj, rv)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 606, in save_reduce\n save(args)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 476, in save\n f(self, obj) # Call unbound method with explicit self\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 751, in save_tuple\n 
save(element)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save\n self.save_reduce(obj=obj, rv)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 631, in save_reduce\n self._batch_setitems(dictitems)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 841, in _batch_setitems\n tmp = list(islice(it, self._BATCHSIZE))\nRuntimeError: dictionary changed size during iteration\n\n==== detail end ====\n\n\n\n==== detail end ====\n\n","grpc_status":2}"

)
Traceback (most recent call last):
  File "/data/projects/fate/eggroll/python/eggroll/core/client.py", line 84, in sync_send
    response = _command_stub.call(request.to_proto())
  File "/opt/app-root/lib/python3.6/site-packages/grpc/_channel.py", line 604, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/opt/app-root/lib/python3.6/site-packages/grpc/_channel.py", line 506, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with: status = StatusCode.UNKNOWN details = "Exception calling application:

==== detail start, at 20230308.020133.068 ====
Traceback (most recent call last):
  File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper
    return func(*args, **kw)
  File "/data/projects/fate/eggroll/python/eggroll/roll_pair/egg_pair.py", line 690, in run_task
    value=self.functor_serdes.serialize(f(task)))
  File "/data/projects/fate/eggroll/python/eggroll/core/serdes/eggroll_serdes.py", line 58, in serialize
    return cloudpickle.dumps(_obj)
  File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 931, in dumps
    cp.dump(obj)
  File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 284, in dump
    return Pickler.dump(self, obj)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 606, in save_reduce
    save(args)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 631, in save_reduce
    self._batch_setitems(dictitems)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 841, in _batch_setitems
    tmp = list(islice(it, self._BATCHSIZE))
RuntimeError: dictionary changed size during iteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper
    return func(*args, **kw)
  File "/data/projects/fate/eggroll/python/eggroll/core/command/command_service.py", line 33, in call
    kwargs=getattr(command_request, '_kwargs'))
  File "/data/projects/fate/eggroll/python/eggroll/core/command/command_router.py", line 94, in dispatch
    raise e
  File "/data/projects/fate/eggroll/python/eggroll/core/command/command_router.py", line 91, in dispatch
    call_result = _method(_instance, *deserialized_args)
  File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 194, in wrapper
    raise RuntimeError(msg)
RuntimeError:

==== detail start, at 20230308.020133.054 ====
Traceback (most recent call last):
  File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper
    return func(*args, **kw)
  File "/data/projects/fate/eggroll/python/eggroll/roll_pair/egg_pair.py", line 690, in run_task
    value=self.functor_serdes.serialize(f(task)))
  File "/data/projects/fate/eggroll/python/eggroll/core/serdes/eggroll_serdes.py", line 58, in serialize
    return cloudpickle.dumps(_obj)
  File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 931, in dumps
    cp.dump(obj)
  File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 284, in dump
    return Pickler.dump(self, obj)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 606, in save_reduce
    save(args)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 631, in save_reduce
    self._batch_setitems(dictitems)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 841, in _batch_setitems
    tmp = list(islice(it, self._BATCHSIZE))
RuntimeError: dictionary changed size during iteration

==== detail end ====

==== detail end ====

" debug_error_string = "{"created":"@1678240893.070591714","description":"Error received from peer ipv4:10.42.0.229:37991","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"Exception calling application: \n\n==== detail start, at 20230308.020133.068 ====\nTraceback (most recent call last):\n File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper\n return func(*args, *kw)\n File "/data/projects/fate/eggroll/python/eggroll/roll_pair/egg_pair.py", line 690, in run_task\n value=self.functor_serdes.serialize(f(task)))\n File "/data/projects/fate/eggroll/python/eggroll/core/serdes/eggroll_serdes.py", line 58, in serialize\n return cloudpickle.dumps(_obj)\n File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 931, in dumps\n cp.dump(obj)\n File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 284, in dump\n return Pickler.dump(self, obj)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 409, in dump\n self.save(obj)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save\n self.save_reduce(obj=obj, rv)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 606, in save_reduce\n save(args)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 476, in save\n f(self, obj) # Call unbound method with explicit self\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 751, in save_tuple\n save(element)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save\n self.save_reduce(obj=obj, rv)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 631, in save_reduce\n self._batch_setitems(dictitems)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 841, in _batch_setitems\n tmp = list(islice(it, self._BATCHSIZE))\nRuntimeError: dictionary changed size during iteration\n\nDuring handling of the above exception, another 
exception occurred:\n\nTraceback (most recent call last):\n File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper\n return func(args, kw)\n File "/data/projects/fate/eggroll/python/eggroll/core/command/command_service.py", line 33, in call\n kwargs=getattr(command_request, '_kwargs'))\n File "/data/projects/fate/eggroll/python/eggroll/core/command/command_router.py", line 94, in dispatch\n raise e\n File "/data/projects/fate/eggroll/python/eggroll/core/command/command_router.py", line 91, in dispatch\n call_result = _method(_instance, deserialized_args)\n File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 194, in wrapper\n raise RuntimeError(msg)\nRuntimeError: \n\n==== detail start, at 20230308.020133.054 ====\nTraceback (most recent call last):\n File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper\n return func(args, kw)\n File "/data/projects/fate/eggroll/python/eggroll/roll_pair/egg_pair.py", line 690, in run_task\n value=self.functor_serdes.serialize(f(task)))\n File "/data/projects/fate/eggroll/python/eggroll/core/serdes/eggroll_serdes.py", line 58, in serialize\n return cloudpickle.dumps(_obj)\n File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 931, in dumps\n cp.dump(obj)\n File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 284, in dump\n return Pickler.dump(self, obj)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 409, in dump\n self.save(obj)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save\n self.save_reduce(obj=obj, rv)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 606, in save_reduce\n save(args)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 476, in save\n f(self, obj) # Call unbound method with explicit self\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 751, in save_tuple\n 
save(element)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save\n self.save_reduce(obj=obj, rv)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 631, in save_reduce\n self._batch_setitems(dictitems)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 841, in _batch_setitems\n tmp = list(islice(it, self._BATCHSIZE))\nRuntimeError: dictionary changed size during iteration\n\n==== detail end ====\n\n\n\n==== detail end ====\n\n","grpc_status":2}"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/projects/fate/fateflow/python/fate_flow/worker/task_executor.py", line 195, in run
    cpn_output = run_object.run(cpn_input)
  File "/data/projects/fate/fate/python/federatedml/model_base.py", line 236, in run
    self._run(cpn_input=cpn_input)
  File "/data/projects/fate/fate/python/federatedml/model_base.py", line 314, in _run
    this_data_output = func(params)
  File "/data/projects/fate/fate/python/federatedml/statistic/correlation/hetero_pearson.py", line 178, in fit
    self.corr = spdz.dot(x, y, "corr").get() / n
  File "/data/projects/fate/fate/python/federatedml/secureprotol/spdz/spdz.py", line 78, in dot
    return left.dot(right, target_name)
  File "/data/projects/fate/fate/python/federatedml/secureprotol/spdz/tensor/fixedpoint_table.py", line 103, in dot
    y_add_b = (other + b).rescontruct(f"{target_name}_confuse_y")
  File "/data/projects/fate/fate/python/federatedml/secureprotol/spdz/tensor/fixedpoint_table.py", line 196, in rescontruct
    for other_share in spdz.communicator.get_rescontruct_shares(name):
  File "/data/projects/fate/fate/python/federatedml/secureprotol/spdz/communicator/federation.py", line 57, in get_rescontruct_shares
    return self._rescontruct_variable.get_parties(self._other_parties, suffix=(tensor_name,))
  File "/data/projects/fate/fate/python/fate_arch/federation/transfer_variable.py", line 241, in get_parties
    name=name, tag=tag, parties=parties, gc=self._get_gc
  File "/data/projects/fate/fate/python/fate_arch/federation/eggroll/_federation.py", line 56, in get
    raw_result = _get(name, tag, parties, self._rsc, gc)
  File "/data/projects/fate/fate/python/fate_arch/federation/eggroll/_federation.py", line 108, in _get
    v = future.result()
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/data/projects/fate/eggroll/python/eggroll/core/datastructure/threadpool.py", line 51, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/data/projects/fate/eggroll/python/eggroll/roll_site/roll_site.py", line 647, in _pull_one
    raise e
  File "/data/projects/fate/eggroll/python/eggroll/roll_site/roll_site.py", line 607, in _pull_one
    pull_status, all_finished, total_batches, cur_pairs = get_status(self)
  File "/data/projects/fate/eggroll/python/eggroll/roll_site/roll_site.py", line 566, in get_status
    all_status = store.with_stores(get_partition_status, options={"__op": "get_partition_status"})
  File "/data/projects/fate/eggroll/python/eggroll/core/aspects.py", line 30, in wrapper
    result = func(*args, **kwargs)
  File "/data/projects/fate/eggroll/python/eggroll/roll_pair/roll_pair.py", line 1152, in with_stores
    ret_pair = future.result()[0]
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/data/projects/fate/eggroll/python/eggroll/core/datastructure/threadpool.py", line 51, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/data/projects/fate/eggroll/python/eggroll/core/client.py", line 97, in sync_send
    raise CommandCallError(command_uri, endpoint, e)
eggroll.core.client.CommandCallError: ('Failed to call command: CommandURI(_uri=v1/egg-pair/runTask) to endpoint: nodemanager-0:37991, caused by: ', <_Rendezvous of RPC that terminated with: status = StatusCode.UNKNOWN details = "Exception calling application:

==== detail start, at 20230308.020133.068 ====
Traceback (most recent call last):
  File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper
    return func(*args, **kw)
  File "/data/projects/fate/eggroll/python/eggroll/roll_pair/egg_pair.py", line 690, in run_task
    value=self.functor_serdes.serialize(f(task)))
  File "/data/projects/fate/eggroll/python/eggroll/core/serdes/eggroll_serdes.py", line 58, in serialize
    return cloudpickle.dumps(_obj)
  File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 931, in dumps
    cp.dump(obj)
  File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 284, in dump
    return Pickler.dump(self, obj)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 606, in save_reduce
    save(args)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 631, in save_reduce
    self._batch_setitems(dictitems)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 841, in _batch_setitems
    tmp = list(islice(it, self._BATCHSIZE))
RuntimeError: dictionary changed size during iteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper
    return func(*args, **kw)
  File "/data/projects/fate/eggroll/python/eggroll/core/command/command_service.py", line 33, in call
    kwargs=getattr(command_request, '_kwargs'))
  File "/data/projects/fate/eggroll/python/eggroll/core/command/command_router.py", line 94, in dispatch
    raise e
  File "/data/projects/fate/eggroll/python/eggroll/core/command/command_router.py", line 91, in dispatch
    call_result = _method(_instance, *deserialized_args)
  File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 194, in wrapper
    raise RuntimeError(msg)
RuntimeError:

==== detail start, at 20230308.020133.054 ====
Traceback (most recent call last):
  File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper
    return func(*args, **kw)
  File "/data/projects/fate/eggroll/python/eggroll/roll_pair/egg_pair.py", line 690, in run_task
    value=self.functor_serdes.serialize(f(task)))
  File "/data/projects/fate/eggroll/python/eggroll/core/serdes/eggroll_serdes.py", line 58, in serialize
    return cloudpickle.dumps(_obj)
  File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 931, in dumps
    cp.dump(obj)
  File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 284, in dump
    return Pickler.dump(self, obj)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 606, in save_reduce
    save(args)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 631, in save_reduce
    self._batch_setitems(dictitems)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 841, in _batch_setitems
    tmp = list(islice(it, self._BATCHSIZE))
RuntimeError: dictionary changed size during iteration

==== detail end ====

==== detail end ====

" debug_error_string = "{"created":"@1678240893.070591714","description":"Error received from peer ipv4:10.42.0.229:37991","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"Exception calling application: \n\n==== detail start, at 20230308.020133.068 ====\nTraceback (most recent call last):\n File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper\n return func(*args, *kw)\n File "/data/projects/fate/eggroll/python/eggroll/roll_pair/egg_pair.py", line 690, in run_task\n value=self.functor_serdes.serialize(f(task)))\n File "/data/projects/fate/eggroll/python/eggroll/core/serdes/eggroll_serdes.py", line 58, in serialize\n return cloudpickle.dumps(_obj)\n File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 931, in dumps\n cp.dump(obj)\n File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 284, in dump\n return Pickler.dump(self, obj)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 409, in dump\n self.save(obj)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save\n self.save_reduce(obj=obj, rv)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 606, in save_reduce\n save(args)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 476, in save\n f(self, obj) # Call unbound method with explicit self\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 751, in save_tuple\n save(element)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save\n self.save_reduce(obj=obj, rv)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 631, in save_reduce\n self._batch_setitems(dictitems)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 841, in _batch_setitems\n tmp = list(islice(it, self._BATCHSIZE))\nRuntimeError: dictionary changed size during iteration\n\nDuring handling of the above exception, another 
exception occurred:\n\nTraceback (most recent call last):\n File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper\n return func(args, kw)\n File "/data/projects/fate/eggroll/python/eggroll/core/command/command_service.py", line 33, in call\n kwargs=getattr(command_request, '_kwargs'))\n File "/data/projects/fate/eggroll/python/eggroll/core/command/command_router.py", line 94, in dispatch\n raise e\n File "/data/projects/fate/eggroll/python/eggroll/core/command/command_router.py", line 91, in dispatch\n call_result = _method(_instance, deserialized_args)\n File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 194, in wrapper\n raise RuntimeError(msg)\nRuntimeError: \n\n==== detail start, at 20230308.020133.054 ====\nTraceback (most recent call last):\n File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper\n return func(args, kw)\n File "/data/projects/fate/eggroll/python/eggroll/roll_pair/egg_pair.py", line 690, in run_task\n value=self.functor_serdes.serialize(f(task)))\n File "/data/projects/fate/eggroll/python/eggroll/core/serdes/eggroll_serdes.py", line 58, in serialize\n return cloudpickle.dumps(_obj)\n File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 931, in dumps\n cp.dump(obj)\n File "/opt/app-root/lib/python3.6/site-packages/cloudpickle/cloudpickle.py", line 284, in dump\n return Pickler.dump(self, obj)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 409, in dump\n self.save(obj)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save\n self.save_reduce(obj=obj, rv)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 606, in save_reduce\n save(args)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 476, in save\n f(self, obj) # Call unbound method with explicit self\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 751, in save_tuple\n 
save(element)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 521, in save\n self.save_reduce(obj=obj, rv)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 631, in save_reduce\n self._batch_setitems(dictitems)\n File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 841, in _batch_setitems\n tmp = list(islice(it, self._BATCHSIZE))\nRuntimeError: dictionary changed size during iteration\n\n==== detail end ====\n\n\n\n==== detail end ====\n\n","grpc_status":2}"

)
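The innermost error in the log above is Python's `RuntimeError: dictionary changed size during iteration`, raised in `pickle.py` (`_batch_setitems`) while `cloudpickle.dumps` serializes a task result. The log does not show what modified the dictionary. As a minimal illustration of the error itself (not of FATE or eggroll code, and with hypothetical helper names), the sketch below grows a dict while iterating over it to trigger the same message, then shows the common defensive pattern of pickling a snapshot copy rather than the live object:

```python
import pickle


def mutate_while_iterating(d):
    """Grow d while iterating over it; return the resulting error message.

    Hypothetical helper for illustration only. Returns None if no error
    is raised (for example, when d is empty).
    """
    try:
        for key in d:
            # Adding a new key changes the dict's size mid-iteration.
            d[f"{key}_copy"] = d[key]
    except RuntimeError as exc:
        return str(exc)
    return None


def pickle_snapshot(d):
    """Pickle a shallow copy so later changes to d cannot affect the dump."""
    snapshot = dict(d)
    return pickle.dumps(snapshot)


message = mutate_while_iterating({"a": 1})
restored = pickle.loads(pickle_snapshot({"a": 1, "b": 2}))
```

Whether a snapshot is the right fix here depends on where the dictionary is being modified, which this log alone does not establish; concurrent modification from another thread during serialization is one possible cause, stated here only as an assumption.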

owlet42 commented 1 year ago

This seems to be a FATE issue; transferring it to FATE.

github-actions[bot] commented 3 months ago

This issue has been marked as stale because it has been open for 365 days with no activity. If this issue is still relevant or if there is new information, please feel free to update or reopen it.

github-actions[bot] commented 3 months ago

This issue was closed because it has been inactive for 1 day since being marked as stale. If this issue is still relevant or if there is new information, please feel free to update or reopen it.