FederatedAI / FATE

An Industrial Grade Federated Learning Framework
Apache License 2.0

FATE on Spark raises TypeError during the heteroLR training stage #3170

Closed KRCheung closed 2 months ago

KRCheung commented 2 years ago

Bug description: After replacing eggroll with Spark, heteroLR training fails: the hetero_lr_0 component errors out with a TypeError.

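Steps to reproduce: The failure reproduces reliably. The whole workflow uses the REST API provided by fateflow.
1. The guest and the host each upload data to the FATE Flow host. The data is the sample data shipped in the python container, which runs fine when eggroll is the underlying engine. This stage uploads successfully.
2. The guest submits the training job. The conf and dsl are also the default configuration shipped in the python container, which runs fine with eggroll. For Spark, we changed the backend field in the conf file to 1 and left all other fields unchanged. This stage exits with an error while the hetero_lr_0 component is running.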
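The backend switch in step 2 can be sketched as follows. This is a minimal illustration, not FATE code: the helper name `use_spark_backend` and the trimmed `example_conf` are hypothetical, and the location of the field (`job_parameters.common.backend`, or directly under `job_parameters`) is an assumption about the conf layout that should be checked against the actual file.

```python
import copy


def use_spark_backend(conf: dict, backend: int = 1) -> dict:
    """Return a copy of a job conf with its backend field set.

    Assumption: the field sits under job_parameters["common"] when that
    key exists, otherwise directly under job_parameters.
    """
    updated = copy.deepcopy(conf)  # leave the caller's conf untouched
    job_params = updated.setdefault("job_parameters", {})
    target = job_params["common"] if "common" in job_params else job_params
    target["backend"] = backend
    return updated


# Hypothetical, heavily trimmed conf; a real one also carries role,
# component_parameters and other fields, which stay unchanged.
example_conf = {
    "dsl_version": "2",
    "initiator": {"role": "guest", "party_id": 9999},
    "job_parameters": {"common": {"work_mode": 1, "backend": 0}},
}

spark_conf = use_spark_backend(example_conf)
print(spark_conf["job_parameters"]["common"]["backend"])
```

Working on a deep copy keeps the eggroll conf available for comparison, which is useful here because the same conf is reported to work with eggroll.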

Expected behavior: The training stage should finish normally, as it does with eggroll.

Screenshot: [image]
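For triage, the log in this report can be reduced to a per-component timing summary. The sketch below is an illustrative helper, not part of FATE: the regular expressions are inferred only from the line format visible in this log, and the names `parse_line` and `task_durations` are hypothetical.

```python
import re

# Line layout as seen in this report:
# [LEVEL] [timestamp] [pid:tid] - file.py[line:N]: message
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]+)\] "
    r"\[(?P<time>[^\]]+)\] "
    r"\[(?P<pid>\d+):(?P<tid>\d+)\] - "
    r"(?P<file>[\w.]+)\[line:(?P<line>\d+)\]: "
    r"(?P<message>.*)$"
)

# Message layout: task <jobid>_<component> <role> <party> takes <seconds>s
TAKES = re.compile(
    r"task (?P<job>\d+)_(?P<component>\w+) (?P<role>\w+) "
    r"(?P<party>\d+) takes (?P<secs>[\d.]+)s"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def task_durations(lines):
    """Map component name to reported run time in seconds."""
    durations = {}
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # wrapped continuation lines and stray numbers are skipped
        found = TAKES.search(record["message"])
        if found:
            durations[found.group("component")] = float(found.group("secs"))
    return durations


# Two lines copied from the log in this report.
sample = [
    "[INFO] [2021-10-12 09:55:36,818] [30438:140697006597952] - "
    "task_executor.py[line:194]: task 202110120955284937770_reader_0 guest 9999 takes 3.052s",
    "[INFO] [2021-10-12 09:56:17,777] [30905:140409515906880] - "
    "task_executor.py[line:194]: task 202110120955284937770_dataio_0 guest 9999 takes 17.771s",
]
durations = task_durations(sample)
print(durations)
```

A component that logs a `Run` line but never a `takes` line is missing from the result, which is one way to narrow down where a job stopped.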

[INFO] [2021-10-12 09:55:33,872] [30438:140697006597952] - task_executor.py[line:347]: report task 202110120955284937770_reader_0 0 guest 9999 to driver
[INFO] [2021-10-12 09:55:33,872] [30438:140697006597952] - control_client.py[line:42]: request update job 202110120955284937770 task 202110120955284937770_reader_0 0 on guest 9999
[INFO] [2021-10-12 09:55:34,665] [30438:140697006597952] - task_executor.py[line:136]: Run 202110120955284937770 reader_0 202110120955284937770_reader_0 guest 9999 task
[INFO] [2021-10-12 09:55:34,665] [30438:140697006597952] - task_executor.py[line:137]: Component parameters on party {'ReaderParam': {'table': {'name': 'breast_hetero_guest', 'namespace': 'experiment'}}, 'component_parameters': {'common': {'hetero_lr_0': {'alpha': 0.01, 'batch_size': 320, 'init_param': {'init_method': 'random_uniform'}, 'learning_rate': 0.15, 'max_iter': 3, 'optimizer': 'rmsprop', 'penalty': 'L2'}, 'intersection_0': {'intersect_method': 'raw', 'only_output_key': False, 'sync_intersect_ids': True}}, 'role': {'guest': {'0': {'dataio_0': {'label_name': 'y', 'label_type': 'int', 'output_format': 'dense', 'with_label': True}, 'reader_0': {'table': {'name': 'breast_hetero_guest', 'namespace': 'experiment'}}}}, 'host': {'0': {'dataio_0': {'output_format': 'dense', 'with_label': False}, 'reader_0': {'table': {'name': 'breast_hetero_host', 'namespace': 'experiment'}}}}}}, 'dsl_version': '2', 'initiator': {'party_id': 9999, 'role': 'guest'}, 'job_parameters': {'job_type': 'train', 'work_mode': 1, 'backend': 1, 'computing_engine': 'SPARK', 'federation_engine': 'RABBITMQ', 'storage_engine': 'HDFS', 'engines_address': {'computing': {'cores_per_node': 20, 'nodes': 2}, 'federation': {'host': 'rabbitmq', 'mng_port': 15672, 'password': 'fate', 'port': 5672, 'user': 'fate'}, 'storage': {'name_node': 'hdfs://namenode:9000'}}, 'federated_mode': 'MULTIPLE', 'federation_info': {'policy_id': 'phebebzwgw', 'union_name': 'tipr'}, 'task_cores': 4, 'task_parallelism': 2, 'computing_partitions': 8, 'federated_status_collect_type': 'PULL', 'model_id': 'arbiter-10000#guest-9999#host-10000#model', 'model_version': '202110120955284937770', 'eggroll_run': {}, 'spark_run': {'executor-cores': 2, 'num-executors': 2}, 'rabbitmq_run': {}, 'pulsar_run': {}, 'adaptation_parameters': {'if_initiator_baseline': False, 'request_task_cores': 4, 'task_cores_per_node': 2, 'task_memory_per_node': 0, 'task_nodes': 2}}, 'role': {'guest': [9999], 'host': [10000]}, 'local': {'role': 'guest', 
'party_id': 9999}, 'CodePath': 'fate_flow/components/reader.py/Reader', 'module': 'Reader', 'output_data_name': ['table']}
[INFO] [2021-10-12 09:55:34,665] [30438:140697006597952] - task_executor.py[line:138]: Task input dsl {}
[INFO] [2021-10-12 09:55:36,400] [30438:140697006597952] - reader.py[line:221]: start copying table
[INFO] [2021-10-12 09:55:36,401] [30438:140697006597952] - reader.py[line:223]: source table name: breast_hetero_guest namespace: experiment engine: HDFS
[INFO] [2021-10-12 09:55:36,401] [30438:140697006597952] - reader.py[line:225]: destination table name: 8b56c11a2b4211ec80860242c0a70064 namespace: output_data_202110120955284937770_reader_0_0 engine: HDFS
[INFO] [2021-10-12 09:55:36,410] [30438:140697006597952] - _table.py[line:77]: put in hdfs file: hdfs://namenode:9000//fate/output_data/output_data_202110120955284937770_reader_0_0/8b56c11a2b4211ec80860242c0a70064
[INFO] [2021-10-12 09:55:36,545] [30438:140697006597952] - reader.py[line:249]: copy successfully
[INFO] [2021-10-12 09:55:36,584] [30438:140697006597952] - tracker_client.py[line:213]: Request save job 202110120955284937770 task 202110120955284937770_reader_0 0 on guest 9999 data table info
[INFO] [2021-10-12 09:55:36,607] [30438:140697006597952] - tracker_client.py[line:106]: Request save job 202110120955284937770 task 202110120955284937770_reader_0 0 on guest 9999 metric reader_namespace reader_name meta
[INFO] [2021-10-12 09:55:36,733] [30438:140697006597952] - task_executor.py[line:347]: report task 202110120955284937770_reader_0 0 guest 9999 to driver
[INFO] [2021-10-12 09:55:36,733] [30438:140697006597952] - control_client.py[line:42]: request update job 202110120955284937770 task 202110120955284937770_reader_0 0 on guest 9999
[INFO] [2021-10-12 09:55:36,817] [30438:140697006597952] - task_executor.py[line:190]: task 202110120955284937770_reader_0 guest 9999 start time: 2021-10-12 09:55:33
[INFO] [2021-10-12 09:55:36,817] [30438:140697006597952] - task_executor.py[line:192]: task 202110120955284937770_reader_0 guest 9999 end time: 2021-10-12 09:55:36
[INFO] [2021-10-12 09:55:36,818] [30438:140697006597952] - task_executor.py[line:194]: task 202110120955284937770_reader_0 guest 9999 takes 3.052s
[INFO] [2021-10-12 09:55:36,818] [30438:140697006597952] - task_executor.py[line:197]: Finish 202110120955284937770 reader_0 202110120955284937770_reader_0 0 guest 9999 task success
[INFO] [2021-10-12 09:55:36,819] [30438:140697006597952] - task_executor.py[line:347]: report task 202110120955284937770_reader_0 0 guest 9999 to driver
[INFO] [2021-10-12 09:55:36,820] [30438:140697006597952] - control_client.py[line:42]: request update job 202110120955284937770 task 202110120955284937770_reader_0 0 on guest 9999
[INFO] [2021-10-12 09:56:00,044] [30905:140409515906880] - task_executor.py[line:347]: report task 202110120955284937770_dataio_0 0 guest 9999 to driver
[INFO] [2021-10-12 09:56:00,044] [30905:140409515906880] - control_client.py[line:42]: request update job 202110120955284937770 task 202110120955284937770_dataio_0 0 on guest 9999
[INFO] [2021-10-12 09:56:00,851] [30905:140409515906880] - task_executor.py[line:136]: Run 202110120955284937770 dataio_0 202110120955284937770_dataio_0 guest 9999 task
[INFO] [2021-10-12 09:56:00,852] [30905:140409515906880] - task_executor.py[line:137]: Component parameters on party {'DataIOParam': {'input_format': 'dense', 'delimitor': ',', 'data_type': 'float64', 'exclusive_data_type': None, 'tag_with_value': False, 'tag_value_delimitor': ':', 'missing_fill': False, 'default_value': 0, 'missing_fill_method': None, 'missing_impute': None, 'outlier_replace': False, 'outlier_replace_method': None, 'outlier_impute': None, 'outlier_replace_value': 0, 'with_label': True, 'label_name': 'y', 'label_type': 'int', 'output_format': 'dense', 'need_run': True}, 'component_parameters': {'common': {'hetero_lr_0': {'alpha': 0.01, 'batch_size': 320, 'init_param': {'init_method': 'random_uniform'}, 'learning_rate': 0.15, 'max_iter': 3, 'optimizer': 'rmsprop', 'penalty': 'L2'}, 'intersection_0': {'intersect_method': 'raw', 'only_output_key': False, 'sync_intersect_ids': True}}, 'role': {'guest': {'0': {'dataio_0': {'label_name': 'y', 'label_type': 'int', 'output_format': 'dense', 'with_label': True}, 'reader_0': {'table': {'name': 'breast_hetero_guest', 'namespace': 'experiment'}}}}, 'host': {'0': {'dataio_0': {'output_format': 'dense', 'with_label': False}, 'reader_0': {'table': {'name': 'breast_hetero_host', 'namespace': 'experiment'}}}}}}, 'dsl_version': '2', 'initiator': {'party_id': 9999, 'role': 'guest'}, 'job_parameters': {'job_type': 'train', 'work_mode': 1, 'backend': 1, 'computing_engine': 'SPARK', 'federation_engine': 'RABBITMQ', 'storage_engine': 'HDFS', 'engines_address': {'computing': {'cores_per_node': 20, 'nodes': 2}, 'federation': {'host': 'rabbitmq', 'mng_port': 15672, 'password': 'fate', 'port': 5672, 'user': 'fate'}, 'storage': {'name_node': 'hdfs://namenode:9000'}}, 'federated_mode': 'MULTIPLE', 'federation_info': {'policy_id': 'phebebzwgw', 'union_name': 'tipr'}, 'task_cores': 4, 'task_parallelism': 2, 'computing_partitions': 8, 'federated_status_collect_type': 'PULL', 'model_id': 
'arbiter-10000#guest-9999#host-10000#model', 'model_version': '202110120955284937770', 'eggroll_run': {}, 'spark_run': {'executor-cores': 2, 'num-executors': 2}, 'rabbitmq_run': {}, 'pulsar_run': {}, 'adaptation_parameters': {'if_initiator_baseline': False, 'request_task_cores': 4, 'task_cores_per_node': 2, 'task_memory_per_node': 0, 'task_nodes': 2}}, 'role': {'guest': [9999], 'host': [10000]}, 'local': {'role': 'guest', 'party_id': 9999}, 'CodePath': 'federatedml/util/data_io.py/DataIO', 'module': 'DataIO', 'output_data_name': ['train']}
[INFO] [2021-10-12 09:56:00,852] [30905:140409515906880] - task_executor.py[line:138]: Task input dsl {'data': {'data': ['reader_0.table']}}
[INFO] [2021-10-12 09:56:00,852] [30905:140409515906880] - tracker_client.py[line:236]: Request read job 202110120955284937770 task None None on guest 9999 data table info
[INFO] [2021-10-12 09:56:00,877] [30905:140409515906880] - task_executor.py[line:301]: load computing table use 8
[WARNING] [2021-10-12 09:56:09,953] [30905:140409515906880] - data_io.py[line:855]: DataIO is deprecated, and will be removed in 1.7, use DataTransform module instead
[INFO] [2021-10-12 09:56:10,069] [30905:140409515906880] - data_io.py[line:118]: start to read dense data and change data to instance
[INFO] [2021-10-12 09:56:14,248] [30905:140409515906880] - tracker_client.py[line:259]: Request save job 202110120955284937770 task 202110120955284937770_dataio_0 0 on guest 9999 component summary
[INFO] [2021-10-12 09:56:17,703] [30905:140409515906880] - pipelined_model.py[line:107]: Save dataio_0 dataio DataIOMeta buffer
[INFO] [2021-10-12 09:56:17,704] [30905:140409515906880] - pipelined_model.py[line:107]: Save dataio_0 dataio DataIOParam buffer
[INFO] [2021-10-12 09:56:17,723] [30905:140409515906880] - task_executor.py[line:347]: report task 202110120955284937770_dataio_0 0 guest 9999 to driver
[INFO] [2021-10-12 09:56:17,723] [30905:140409515906880] - control_client.py[line:42]: request update job 202110120955284937770 task 202110120955284937770_dataio_0 0 on guest 9999
[INFO] [2021-10-12 09:56:17,776] [30905:140409515906880] - task_executor.py[line:190]: task 202110120955284937770_dataio_0 guest 9999 start time: 2021-10-12 09:55:59
[INFO] [2021-10-12 09:56:17,777] [30905:140409515906880] - task_executor.py[line:192]: task 202110120955284937770_dataio_0 guest 9999 end time: 2021-10-12 09:56:17
[INFO] [2021-10-12 09:56:17,777] [30905:140409515906880] - task_executor.py[line:194]: task 202110120955284937770_dataio_0 guest 9999 takes 17.771s
[INFO] [2021-10-12 09:56:17,777] [30905:140409515906880] - task_executor.py[line:197]: Finish 202110120955284937770 dataio_0 202110120955284937770_dataio_0 0 guest 9999 task success
[INFO] [2021-10-12 09:56:17,802] [30905:140409515906880] - task_executor.py[line:347]: report task 202110120955284937770_dataio_0 0 guest 9999 to driver
[INFO] [2021-10-12 09:56:17,802] [30905:140409515906880] - control_client.py[line:42]: request update job 202110120955284937770 task 202110120955284937770_dataio_0 0 on guest 9999
[INFO] [2021-10-12 09:56:41,168] [31603:140025353537344] - task_executor.py[line:347]: report task 202110120955284937770_intersection_0 0 guest 9999 to driver
[INFO] [2021-10-12 09:56:41,168] [31603:140025353537344] - control_client.py[line:42]: request update job 202110120955284937770 task 202110120955284937770_intersection_0 0 on guest 9999
[INFO] [2021-10-12 09:56:41,969] [31603:140025353537344] - task_executor.py[line:136]: Run 202110120955284937770 intersection_0 202110120955284937770_intersection_0 guest 9999 task
[INFO] [2021-10-12 09:56:41,970] [31603:140025353537344] - task_executor.py[line:137]: Component parameters on party {'IntersectParam': {'intersect_method': 'raw', 'random_bit': 128, 'sync_intersect_ids': True, 'join_role': 'guest', 'with_encode': False, 'encode_params': {'salt': '', 'encode_method': 'none', 'base64': False}, 'rsa_params': {'salt': '', 'hash_method': 'sha256', 'final_hash_method': 'sha256', 'split_calculation': False, 'random_base_fraction': None, 'key_length': 1024}, 'only_output_key': False, 'intersect_cache_param': {'use_cache': False, 'id_type': 'phone', 'encrypt_type': 'sha256'}, 'repeated_id_process': False, 'repeated_id_owner': 'guest', 'allow_info_share': False, 'info_owner': 'guest', 'with_sample_id': False}, 'component_parameters': {'common': {'hetero_lr_0': {'alpha': 0.01, 'batch_size': 320, 'init_param': {'init_method': 'random_uniform'}, 'learning_rate': 0.15, 'max_iter': 3, 'optimizer': 'rmsprop', 'penalty': 'L2'}, 'intersection_0': {'intersect_method': 'raw', 'only_output_key': False, 'sync_intersect_ids': True}}, 'role': {'guest': {'0': {'dataio_0': {'label_name': 'y', 'label_type': 'int', 'output_format': 'dense', 'with_label': True}, 'reader_0': {'table': {'name': 'breast_hetero_guest', 'namespace': 'experiment'}}}}, 'host': {'0': {'dataio_0': {'output_format': 'dense', 'with_label': False}, 'reader_0': {'table': {'name': 'breast_hetero_host', 'namespace': 'experiment'}}}}}}, 'dsl_version': '2', 'initiator': {'party_id': 9999, 'role': 'guest'}, 'job_parameters': {'job_type': 'train', 'work_mode': 1, 'backend': 1, 'computing_engine': 'SPARK', 'federation_engine': 'RABBITMQ', 'storage_engine': 'HDFS', 'engines_address': {'computing': {'cores_per_node': 20, 'nodes': 2}, 'federation': {'host': 'rabbitmq', 'mng_port': 15672, 'password': 'fate', 'port': 5672, 'user': 'fate'}, 'storage': {'name_node': 'hdfs://namenode:9000'}}, 'federated_mode': 'MULTIPLE', 'federation_info': {'policy_id': 'phebebzwgw', 'union_name': 'tipr'}, 
'task_cores': 4, 'task_parallelism': 2, 'computing_partitions': 8, 'federated_status_collect_type': 'PULL', 'model_id': 'arbiter-10000#guest-9999#host-10000#model', 'model_version': '202110120955284937770', 'eggroll_run': {}, 'spark_run': {'executor-cores': 2, 'num-executors': 2}, 'rabbitmq_run': {}, 'pulsar_run': {}, 'adaptation_parameters': {'if_initiator_baseline': False, 'request_task_cores': 4, 'task_cores_per_node': 2, 'task_memory_per_node': 0, 'task_nodes': 2}}, 'role': {'guest': [9999], 'host': [10000]}, 'local': {'role': 'guest', 'party_id': 9999}, 'CodePath': 'federatedml/statistic/intersect/intersect_model.py/IntersectGuest', 'module': 'Intersection', 'output_data_name': ['train']}
[INFO] [2021-10-12 09:56:41,970] [31603:140025353537344] - task_executor.py[line:138]: Task input dsl {'data': {'data': ['dataio_0.train']}}
[INFO] [2021-10-12 09:56:41,970] [31603:140025353537344] - tracker_client.py[line:236]: Request read job 202110120955284937770 task None None on guest 9999 data train info
[INFO] [2021-10-12 09:56:42,004] [31603:140025353537344] - task_executor.py[line:301]: load computing table use 8
[WARNING] [2021-10-12 09:56:51,646] [31603:140025353537344] - intersect_param.py[line:63]: 'EncodeParam' will be renamed to 'RawParam' in future release.Please do not rely on current param naming in application.
[INFO] [2021-10-12 09:56:51,782] [31603:140025353537344] - intersect_model.py[line:50]: Using raw intersection, role is guest
[INFO] [2021-10-12 09:56:51,783] [31603:140025353537344] - intersect_guest.py[line:239]: Start raw intersection
[INFO] [2021-10-12 09:56:51,783] [31603:140025353537344] - intersect.py[line:349]: Join id role is guest
[INFO] [2021-10-12 09:57:04,681] [31603:140025353537344] - intersect.py[line:373]: Get ids_list from role-send, ids_list size is 1
[INFO] [2021-10-12 09:57:07,125] [31603:140025353537344] - intersect.py[line:384]: Finish intersect_ids computing
[INFO] [2021-10-12 09:57:16,788] [31603:140025353537344] - intersect.py[line:399]: Remote intersect ids to role-send
[INFO] [2021-10-12 09:57:19,234] [31603:140025353537344] - intersect.py[line:84]: obtain intersect data_instances!
[INFO] [2021-10-12 09:57:22,359] [31603:140025353537344] - intersect.py[line:421]: save guest_0's id in name:202110120955284937770_intersection_0_0_0, namespace:9999#None#mountain
[INFO] [2021-10-12 09:57:22,426] [31603:140025353537344] - intersect_model.py[line:136]: Finish intersection
[INFO] [2021-10-12 09:57:22,702] [31603:140025353537344] - tracker_client.py[line:73]: Request save job 202110120955284937770 task 202110120955284937770_intersection_0 0 on guest 9999 metric train intersection data
[INFO] [2021-10-12 09:57:22,723] [31603:140025353537344] - tracker_client.py[line:106]: Request save job 202110120955284937770 task 202110120955284937770_intersection_0 0 on guest 9999 metric train intersection meta
[INFO] [2021-10-12 09:57:22,742] [31603:140025353537344] - tracker_client.py[line:259]: Request save job 202110120955284937770 task 202110120955284937770_intersection_0 0 on guest 9999 component summary
[INFO] [2021-10-12 09:57:22,915] [31603:140025353537344] - intersect_model.py[line:154]: intersect_ids count:33
[INFO] [2021-10-12 09:57:22,915] [31603:140025353537344] - intersect_model.py[line:155]: intersect_ids header schema:{'header': ['x0', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9'], 'sid_name': 'id', 'label_name': 'y'}
[INFO] [2021-10-12 09:57:24,843] [31603:140025353537344] - task_executor.py[line:347]: report task 202110120955284937770_intersection_0 0 guest 9999 to driver
[INFO] [2021-10-12 09:57:24,843] [31603:140025353537344] - control_client.py[line:42]: request update job 202110120955284937770 task 202110120955284937770_intersection_0 0 on guest 9999
[INFO] [2021-10-12 09:57:24,894] [31603:140025353537344] - task_executor.py[line:190]: task 202110120955284937770_intersection_0 guest 9999 start time: 2021-10-12 09:56:41
[INFO] [2021-10-12 09:57:24,894] [31603:140025353537344] - task_executor.py[line:192]: task 202110120955284937770_intersection_0 guest 9999 end time: 2021-10-12 09:57:24
[INFO] [2021-10-12 09:57:24,894] [31603:140025353537344] - task_executor.py[line:194]: task 202110120955284937770_intersection_0 guest 9999 takes 43.768s
[INFO] [2021-10-12 09:57:24,894] [31603:140025353537344] - task_executor.py[line:197]: Finish 202110120955284937770 intersection_0 202110120955284937770_intersection_0 0 guest 9999 task success
[INFO] [2021-10-12 09:57:24,913] [31603:140025353537344] - task_executor.py[line:347]: report task 202110120955284937770_intersection_0 0 guest 9999 to driver
[INFO] [2021-10-12 09:57:24,913] [31603:140025353537344] - control_client.py[line:42]: request update job 202110120955284937770 task 202110120955284937770_intersection_0 0 on guest 9999
[INFO] [2021-10-12 09:57:49,130] [32537:140436694271808] - task_executor.py[line:347]: report task 202110120955284937770_hetero_feature_binning_0 0 guest 9999 to driver
[INFO] [2021-10-12 09:57:49,131] [32537:140436694271808] - control_client.py[line:42]: request update job 202110120955284937770 task 202110120955284937770_hetero_feature_binning_0 0 on guest 9999
[INFO] [2021-10-12 09:57:49,970] [32537:140436694271808] - task_executor.py[line:136]: Run 202110120955284937770 hetero_feature_binning_0 202110120955284937770_hetero_feature_binning_0 guest 9999 task
[INFO] [2021-10-12 09:57:49,970] [32537:140436694271808] - task_executor.py[line:137]: Component parameters on party {'HeteroFeatureBinningParam': {'method': 'quantile', 'compress_thres': 10000, 'head_size': 10000, 'error': 0.0001, 'adjustment_factor': 0.5, 'bin_num': 10, 'bin_indexes': -1, 'bin_names': None, 'category_indexes': None, 'category_names': None, 'transform_param': {'transform_cols': -1, 'transform_names': None, 'transform_type': 'bin_num'}, 'need_run': True, 'skip_static': False, 'local_only': False, 'optimal_binning_param': {'init_bucket_method': 'quantile', 'metric_method': 'iv', 'max_bin': None, 'mixture': True, 'max_bin_pct': 1.0, 'min_bin_pct': 0.05, 'init_bin_nums': 1000, 'adjustment_factor': None}, 'encrypt_param': {'method': 'Paillier', 'key_length': 1024}}, 'component_parameters': {'common': {'hetero_lr_0': {'alpha': 0.01, 'batch_size': 320, 'init_param': {'init_method': 'random_uniform'}, 'learning_rate': 0.15, 'max_iter': 3, 'optimizer': 'rmsprop', 'penalty': 'L2'}, 'intersection_0': {'intersect_method': 'raw', 'only_output_key': False, 'sync_intersect_ids': True}}, 'role': {'guest': {'0': {'dataio_0': {'label_name': 'y', 'label_type': 'int', 'output_format': 'dense', 'with_label': True}, 'reader_0': {'table': {'name': 'breast_hetero_guest', 'namespace': 'experiment'}}}}, 'host': {'0': {'dataio_0': {'output_format': 'dense', 'with_label': False}, 'reader_0': {'table': {'name': 'breast_hetero_host', 'namespace': 'experiment'}}}}}}, 'dsl_version': '2', 'initiator': {'party_id': 9999, 'role': 'guest'}, 'job_parameters': {'job_type': 'train', 'work_mode': 1, 'backend': 1, 'computing_engine': 'SPARK', 'federation_engine': 'RABBITMQ', 'storage_engine': 'HDFS', 'engines_address': {'computing': {'cores_per_node': 20, 'nodes': 2}, 'federation': {'host': 'rabbitmq', 'mng_port': 15672, 'password': 'fate', 'port': 5672, 'user': 'fate'}, 'storage': {'name_node': 'hdfs://namenode:9000'}}, 'federated_mode': 'MULTIPLE', 'federation_info': {'policy_id': 
'phebebzwgw', 'union_name': 'tipr'}, 'task_cores': 4, 'task_parallelism': 2, 'computing_partitions': 8, 'federated_status_collect_type': 'PULL', 'model_id': 'arbiter-10000#guest-9999#host-10000#model', 'model_version': '202110120955284937770', 'eggroll_run': {}, 'spark_run': {'executor-cores': 2, 'num-executors': 2}, 'rabbitmq_run': {}, 'pulsar_run': {}, 'adaptation_parameters': {'if_initiator_baseline': False, 'request_task_cores': 4, 'task_cores_per_node': 2, 'task_memory_per_node': 0, 'task_nodes': 2}}, 'role': {'guest': [9999], 'host': [10000]}, 'local': {'role': 'guest', 'party_id': 9999}, 'CodePath': 'federatedml/feature/hetero_feature_binning/hetero_binning_guest.py/HeteroFeatureBinningGuest', 'module': 'HeteroFeatureBinning', 'output_data_name': ['train']}
[INFO] [2021-10-12 09:57:49,971] [32537:140436694271808] - task_executor.py[line:138]: Task input dsl {'data': {'data': ['intersection_0.train']}}
[INFO] [2021-10-12 09:57:49,971] [32537:140436694271808] - tracker_client.py[line:236]: Request read job 202110120955284937770 task None None on guest 9999 data train info
[INFO] [2021-10-12 09:57:50,005] [32537:140436694271808] - task_executor.py[line:301]: load computing table use 8
[INFO] [2021-10-12 09:57:59,990] [32537:140436694271808] - hetero_binning_guest.py[line:38]: Start feature binning fit and transform
[WARNING] [2021-10-12 09:58:04,405] [32537:140436694271808] - _table.py[line:94]: please use `applyPartitions` instead of `mapPartitions` if the previous behavior was expected. The previous behavior will not work in future
[INFO] [2021-10-12 09:58:16,925] [32537:140436694271808] - hetero_binning_guest.py[line:78]: Sent encrypted_label_table to host
[INFO] [2021-10-12 09:58:40,568] [32537:140436694271808] - hetero_binning_guest.py[line:87]: Get encrypted_bin_sum from host
[INFO] [2021-10-12 09:58:42,828] [32537:140436694271808] - hetero_binning_guest.py[line:123]: Finish feature binning fit and transform
[INFO] [2021-10-12 09:58:42,860] [32537:140436694271808] - tracker_client.py[line:259]: Request save job 202110120955284937770 task 202110120955284937770_hetero_feature_binning_0 0 on guest 9999 component summary
[INFO] [2021-10-12 09:58:45,110] [32537:140436694271808] - pipelined_model.py[line:107]: Save hetero_feature_binning_0 hetero_feature_binning FeatureBinningMeta buffer
[INFO] [2021-10-12 09:58:45,112] [32537:140436694271808] - pipelined_model.py[line:107]: Save hetero_feature_binning_0 hetero_feature_binning FeatureBinningParam buffer
[INFO] [2021-10-12 09:58:45,137] [32537:140436694271808] - task_executor.py[line:347]: report task 202110120955284937770_hetero_feature_binning_0 0 guest 9999 to driver
[INFO] [2021-10-12 09:58:45,137] [32537:140436694271808] - control_client.py[line:42]: request update job 202110120955284937770 task 202110120955284937770_hetero_feature_binning_0 0 on guest 9999
[INFO] [2021-10-12 09:58:45,176] [32537:140436694271808] - task_executor.py[line:190]: task 202110120955284937770_hetero_feature_binning_0 guest 9999 start time: 2021-10-12 09:57:49
[INFO] [2021-10-12 09:58:45,176] [32537:140436694271808] - task_executor.py[line:192]: task 202110120955284937770_hetero_feature_binning_0 guest 9999 end time: 2021-10-12 09:58:45
[INFO] [2021-10-12 09:58:45,176] [32537:140436694271808] - task_executor.py[line:194]: task 202110120955284937770_hetero_feature_binning_0 guest 9999 takes 56.132s
[INFO] [2021-10-12 09:58:45,177] [32537:140436694271808] - task_executor.py[line:197]: Finish 202110120955284937770 hetero_feature_binning_0 202110120955284937770_hetero_feature_binning_0 0 guest 9999 task success
[INFO] [2021-10-12 09:58:45,205] [32537:140436694271808] - task_executor.py[line:347]: report task 202110120955284937770_hetero_feature_binning_0 0 guest 9999 to driver
[INFO] [2021-10-12 09:58:45,206] [32537:140436694271808] - control_client.py[line:42]: request update job 202110120955284937770 task 202110120955284937770_hetero_feature_binning_0 0 on guest 9999
[INFO] [2021-10-12 09:59:09,725] [33546:139895824156480] - task_executor.py[line:347]: report task 202110120955284937770_hetero_feature_selection_0 0 guest 9999 to driver
[INFO] [2021-10-12 09:59:09,725] [33546:139895824156480] - control_client.py[line:42]: request update job 202110120955284937770 task 202110120955284937770_hetero_feature_selection_0 0 on guest 9999
[INFO] [2021-10-12 09:59:10,516] [33546:139895824156480] - task_executor.py[line:136]: Run 202110120955284937770 hetero_feature_selection_0 202110120955284937770_hetero_feature_selection_0 guest 9999 task
[INFO] [2021-10-12 09:59:10,517] [33546:139895824156480] - task_executor.py[line:137]: Component parameters on party {'FeatureSelectionParam': {'correlation_param': {'sort_metric': 'iv', 'threshold': 0.1, 'select_federated': True}, 'vif_param': {'metrics': 'vif', 'filter_type': 'threshold', 'take_high': False, 'threshold': 5.0, 'host_thresholds': None, 'select_federated': True}, 'select_col_indexes': -1, 'select_names': [], 'filter_methods': ['manually'], 'unique_param': {'eps': 1e-05}, 'iv_value_param': {'value_threshold': 0.0, 'host_thresholds': None, 'local_only': False}, 'iv_percentile_param': {'percentile_threshold': 1.0, 'local_only': False}, 'iv_top_k_param': {'k': 10, 'local_only': False}, 'variance_coe_param': {'value_threshold': 1.0}, 'outlier_param': {'percentile': 1.0, 'upper_threshold': 1.0}, 'percentage_value_param': {'upper_pct': 1.0}, 'manually_param': {'filter_out_indexes': None, 'filter_out_names': None, 'left_col_indexes': None, 'left_col_names': None}, 'iv_param': {'metrics': 'iv', 'filter_type': 'threshold', 'take_high': True, 'threshold': 1, 'host_thresholds': None, 'select_federated': True}, 'statistic_param': {'metrics': 'mean', 'filter_type': 'threshold', 'take_high': True, 'threshold': 1, 'host_thresholds': None, 'select_federated': True}, 'psi_param': {'metrics': 'psi', 'filter_type': 'threshold', 'take_high': False, 'threshold': 1, 'host_thresholds': None, 'select_federated': True}, 'sbt_param': {'metrics': 'feature_importance', 'filter_type': 'threshold', 'take_high': True, 'threshold': 1, 'host_thresholds': None, 'select_federated': True}, 'need_run': True}, 'component_parameters': {'common': {'hetero_lr_0': {'alpha': 0.01, 'batch_size': 320, 'init_param': {'init_method': 'random_uniform'}, 'learning_rate': 0.15, 'max_iter': 3, 'optimizer': 'rmsprop', 'penalty': 'L2'}, 'intersection_0': {'intersect_method': 'raw', 'only_output_key': False, 'sync_intersect_ids': True}}, 'role': {'guest': {'0': {'dataio_0': {'label_name': 'y', 
'label_type': 'int', 'output_format': 'dense', 'with_label': True}, 'reader_0': {'table': {'name': 'breast_hetero_guest', 'namespace': 'experiment'}}}}, 'host': {'0': {'dataio_0': {'output_format': 'dense', 'with_label': False}, 'reader_0': {'table': {'name': 'breast_hetero_host', 'namespace': 'experiment'}}}}}}, 'dsl_version': '2', 'initiator': {'party_id': 9999, 'role': 'guest'}, 'job_parameters': {'job_type': 'train', 'work_mode': 1, 'backend': 1, 'computing_engine': 'SPARK', 'federation_engine': 'RABBITMQ', 'storage_engine': 'HDFS', 'engines_address': {'computing': {'cores_per_node': 20, 'nodes': 2}, 'federation': {'host': 'rabbitmq', 'mng_port': 15672, 'password': 'fate', 'port': 5672, 'user': 'fate'}, 'storage': {'name_node': 'hdfs://namenode:9000'}}, 'federated_mode': 'MULTIPLE', 'federation_info': {'policy_id': 'phebebzwgw', 'union_name': 'tipr'}, 'task_cores': 4, 'task_parallelism': 2, 'computing_partitions': 8, 'federated_status_collect_type': 'PULL', 'model_id': 'arbiter-10000#guest-9999#host-10000#model', 'model_version': '202110120955284937770', 'eggroll_run': {}, 'spark_run': {'executor-cores': 2, 'num-executors': 2}, 'rabbitmq_run': {}, 'pulsar_run': {}, 'adaptation_parameters': {'if_initiator_baseline': False, 'request_task_cores': 4, 'task_cores_per_node': 2, 'task_memory_per_node': 0, 'task_nodes': 2}}, 'role': {'guest': [9999], 'host': [10000]}, 'local': {'role': 'guest', 'party_id': 9999}, 'CodePath': 'federatedml/feature/hetero_feature_selection/feature_selection_guest.py/HeteroFeatureSelectionGuest', 'module': 'HeteroFeatureSelection', 'output_data_name': ['train']}
[INFO] [2021-10-12 09:59:10,517] [33546:139895824156480] - task_executor.py[line:138]: Task input dsl {'data': {'data': ['hetero_feature_binning_0.train']}, 'isometric_model': ['hetero_feature_binning_0.hetero_feature_binning']}
[INFO] [2021-10-12 09:59:10,517] [33546:139895824156480] - tracker_client.py[line:236]: Request read job 202110120955284937770 task None None on guest 9999 data train info
[INFO] [2021-10-12 09:59:10,540] [33546:139895824156480] - task_executor.py[line:301]: load computing table use 8
[INFO] [2021-10-12 09:59:20,078] [33546:139895824156480] - pipelined_model.py[line:268]: parse FeatureBinningMeta proto object normal
[INFO] [2021-10-12 09:59:20,080] [33546:139895824156480] - pipelined_model.py[line:268]: parse FeatureBinningParam proto object normal
[INFO] [2021-10-12 09:59:20,814] [33546:139895824156480] - base_feature_selection.py[line:302]: Start Hetero Selection Fit and transform.
[INFO] [2021-10-12 09:59:21,973] [33546:139895824156480] - base_feature_selection.py[line:343]: Finish Hetero Selection Fit and transform.
[INFO] [2021-10-12 09:59:21,974] [33546:139895824156480] - tracker_client.py[line:259]: Request save job 202110120955284937770 task 202110120955284937770_hetero_feature_selection_0 0 on guest 9999 component summary
[INFO] [2021-10-12 09:59:25,180] [33546:139895824156480] - pipelined_model.py[line:107]: Save hetero_feature_selection_0 selected FeatureSelectionMeta buffer
[INFO] [2021-10-12 09:59:25,180] [33546:139895824156480] - pipelined_model.py[line:107]: Save hetero_feature_selection_0 selected FeatureSelectionParam buffer
[INFO] [2021-10-12 09:59:25,209] [33546:139895824156480] - task_executor.py[line:347]: report task 202110120955284937770_hetero_feature_selection_0 0 guest 9999 to driver
[INFO] [2021-10-12 09:59:25,210] [33546:139895824156480] - control_client.py[line:42]: request update job 202110120955284937770 task 202110120955284937770_hetero_feature_selection_0 0 on guest 9999
[INFO] [2021-10-12 09:59:25,317] [33546:139895824156480] - task_executor.py[line:190]: task 202110120955284937770_hetero_feature_selection_0 guest 9999 start time: 2021-10-12 09:59:09
[INFO] [2021-10-12 09:59:25,317] [33546:139895824156480] - task_executor.py[line:192]: task 202110120955284937770_hetero_feature_selection_0 guest 9999 end time: 2021-10-12 09:59:25
[INFO] [2021-10-12 09:59:25,317] [33546:139895824156480] - task_executor.py[line:194]: task 202110120955284937770_hetero_feature_selection_0 guest 9999 takes 15.579s
[INFO] [2021-10-12 09:59:25,318] [33546:139895824156480] - task_executor.py[line:197]: Finish 202110120955284937770 hetero_feature_selection_0 202110120955284937770_hetero_feature_selection_0 0 guest 9999 task success
[INFO] [2021-10-12 09:59:25,385] [33546:139895824156480] - task_executor.py[line:347]: report task 202110120955284937770_hetero_feature_selection_0 0 guest 9999 to driver
[INFO] [2021-10-12 09:59:25,386] [33546:139895824156480] - control_client.py[line:42]: request update job 202110120955284937770 task 202110120955284937770_hetero_feature_selection_0 0 on guest 9999
[INFO] [2021-10-12 09:59:48,813] [34303:139796378810176] - task_executor.py[line:347]: report task 202110120955284937770_hetero_lr_0 0 guest 9999 to driver
[INFO] [2021-10-12 09:59:48,813] [34303:139796378810176] - control_client.py[line:42]: request update job 202110120955284937770 task 202110120955284937770_hetero_lr_0 0 on guest 9999
[INFO] [2021-10-12 09:59:49,594] [34303:139796378810176] - task_executor.py[line:136]: Run 202110120955284937770 hetero_lr_0 202110120955284937770_hetero_lr_0 guest 9999 task
[INFO] [2021-10-12 09:59:49,595] [34303:139796378810176] - task_executor.py[line:137]: Component parameters on party {'HeteroLogisticParam': {'penalty': 'L2', 'tol': 0.0001, 'alpha': 0.01, 'optimizer': 'rmsprop', 'batch_size': 320, 'learning_rate': 0.15, 'init_param': {'init_method': 'random_uniform', 'init_const': 1, 'fit_intercept': True, 'random_seed': None}, 'max_iter': 3, 'early_stop': 'diff', 'encrypt_param': {'method': 'Paillier', 'key_length': 1024}, 'predict_param': {'threshold': 0.5}, 'cv_param': {'n_splits': 5, 'mode': 'hetero', 'role': 'guest', 'shuffle': True, 'random_seed': 1, 'need_cv': False, 'output_fold_history': True, 'history_value_type': 'score'}, 'decay': 1, 'decay_sqrt': True, 'multi_class': 'ovr', 'validation_freqs': None, 'stepwise_param': {'score_name': 'AIC', 'mode': 'hetero', 'role': 'guest', 'direction': 'both', 'max_step': 10, 'nvmin': 2, 'nvmax': None, 'need_stepwise': False}, 'early_stopping_rounds': None, 'metrics': ['auc', 'ks'], 'use_first_metric_only': False, 'floating_point_precision': 23, 'encrypted_mode_calculator_param': {'mode': 'strict', 're_encrypted_rate': 1}, 'sqn_param': {'update_interval_L': 3, 'memory_M': 5, 'sample_size': 5000, 'random_seed': None}}, 'component_parameters': {'common': {'hetero_lr_0': {'alpha': 0.01, 'batch_size': 320, 'init_param': {'init_method': 'random_uniform'}, 'learning_rate': 0.15, 'max_iter': 3, 'optimizer': 'rmsprop', 'penalty': 'L2'}, 'intersection_0': {'intersect_method': 'raw', 'only_output_key': False, 'sync_intersect_ids': True}}, 'role': {'guest': {'0': {'dataio_0': {'label_name': 'y', 'label_type': 'int', 'output_format': 'dense', 'with_label': True}, 'reader_0': {'table': {'name': 'breast_hetero_guest', 'namespace': 'experiment'}}}}, 'host': {'0': {'dataio_0': {'output_format': 'dense', 'with_label': False}, 'reader_0': {'table': {'name': 'breast_hetero_host', 'namespace': 'experiment'}}}}}}, 'dsl_version': '2', 'initiator': {'party_id': 9999, 'role': 'guest'}, 'job_parameters': 
{'job_type': 'train', 'work_mode': 1, 'backend': 1, 'computing_engine': 'SPARK', 'federation_engine': 'RABBITMQ', 'storage_engine': 'HDFS', 'engines_address': {'computing': {'cores_per_node': 20, 'nodes': 2}, 'federation': {'host': 'rabbitmq', 'mng_port': 15672, 'password': 'fate', 'port': 5672, 'user': 'fate'}, 'storage': {'name_node': 'hdfs://namenode:9000'}}, 'federated_mode': 'MULTIPLE', 'federation_info': {'policy_id': 'phebebzwgw', 'union_name': 'tipr'}, 'task_cores': 4, 'task_parallelism': 2, 'computing_partitions': 8, 'federated_status_collect_type': 'PULL', 'model_id': 'arbiter-10000#guest-9999#host-10000#model', 'model_version': '202110120955284937770', 'eggroll_run': {}, 'spark_run': {'executor-cores': 2, 'num-executors': 2}, 'rabbitmq_run': {}, 'pulsar_run': {}, 'adaptation_parameters': {'if_initiator_baseline': False, 'request_task_cores': 4, 'task_cores_per_node': 2, 'task_memory_per_node': 0, 'task_nodes': 2}}, 'role': {'arbiter': [10000], 'guest': [9999], 'host': [10000]}, 'local': {'role': 'guest', 'party_id': 9999}, 'CodePath': 'federatedml/linear_model/logistic_regression/hetero_logistic_regression/hetero_lr_guest.py/HeteroLRGuest', 'module': 'HeteroLR', 'output_data_name': ['train']}
[INFO] [2021-10-12 09:59:49,595] [34303:139796378810176] - task_executor.py[line:138]: Task input dsl {'data': {'train_data': ['hetero_feature_selection_0.train']}}
[INFO] [2021-10-12 09:59:49,596] [34303:139796378810176] - tracker_client.py[line:236]: Request read job 202110120955284937770 task None None on guest 9999 data train info
[INFO] [2021-10-12 09:59:49,635] [34303:139796378810176] - task_executor.py[line:301]: load computing table use 8
[INFO] [2021-10-12 09:59:59,976] [34303:139796378810176] - one_vs_rest.py[line:302]: Create one_vs_rest object, role: guest, mode: hetero
[INFO] [2021-10-12 10:00:00,171] [34303:139796378810176] - hetero_lr_guest.py[line:65]: Enter hetero_lr_guest fit
[INFO] [2021-10-12 10:00:00,893] [34303:139796378810176] - linear_model_base.py[line:226]: Check for abnormal value passed
[INFO] [2021-10-12 10:00:05,755] [34303:139796378810176] - hetero_lr_guest.py[line:82]: Enter hetero_lr_guest fit
[INFO] [2021-10-12 10:00:11,729] [34303:139796378810176] - hetero_lr_guest.py[line:90]: Generate mini-batch from input data
[INFO] [2021-10-12 10:00:28,874] [34303:139796378810176] - hetero_lr_guest.py[line:99]: Start initialize model.
[INFO] [2021-10-12 10:00:28,875] [34303:139796378810176] - hetero_lr_guest.py[line:100]: fit_intercept:True
[INFO] [2021-10-12 10:00:28,991] [34303:139796378810176] - hetero_lr_guest.py[line:106]: iter:0
[ERROR] [2021-10-12 10:00:43,726] [34303:139796378810176] - task_executor.py[line:179]: unsupported operand type(s) for +: 'float' and 'NoneType'
Traceback (most recent call last):
  File "./fate/python/fate_flow/operation/task_executor.py", line 154, in run_task
    run_object.run(component_parameters_on_party, task_run_args)
  File "./fate/python/federatedml/model_base.py", line 101, in run
    this_data_output = func(*params)
  File "./fate/python/federatedml/linear_model/logistic_regression/hetero_logistic_regression/hetero_lr_guest.py", line 79, in fit
    self.fit_binary(data_instances, validate_data)
  File "./fate/python/federatedml/linear_model/logistic_regression/hetero_logistic_regression/hetero_lr_guest.py", line 124, in fit_binary
    batch_index)
  File "./fate/python/federatedml/optim/gradient/hetero_linear_model_gradient.py", line 264, in compute_gradient_procedure
    current_suffix=current_suffix)
  File "./fate/python/federatedml/optim/gradient/hetero_linear_model_gradient.py", line 221, in _asynchronous_compute_gradient
    half_g = self.compute_gradient(data_instances, self.half_d, False)
  File "./fate/python/federatedml/optim/gradient/hetero_linear_model_gradient.py", line 172, in compute_gradient
    gradient_sum = gradient_sum.reduce(lambda x, y: x + y)
  File "./fate/python/fate_arch/common/profile.py", line 282, in _fn
    rtn = func(*args, **kwargs)
  File "./fate/python/fate_arch/computing/spark/_table.py", line 145, in reduce
    return self._rdd.values().reduce(func)
  File "/data/projects/spark-2.4.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 846, in reduce
    return reduce(f, vals)
  File "/data/projects/spark-2.4.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/util.py", line 99, in wrapper
    return f(*args, **kwargs)
  File "./fate/python/federatedml/optim/gradient/hetero_linear_model_gradient.py", line 172, in <lambda>
    gradient_sum = gradient_sum.reduce(lambda x, y: x + y)
TypeError: unsupported operand type(s) for +: 'float' and 'NoneType'
[INFO] [2021-10-12 10:00:43,756] [34303:139796378810176] - task_executor.py[line:347]: report task 202110120955284937770_hetero_lr_0 0 guest 9999 to driver
[INFO] [2021-10-12 10:00:43,756] [34303:139796378810176] - control_client.py[line:42]: request update job 202110120955284937770 task 202110120955284937770_hetero_lr_0 0 on guest 9999
[INFO] [2021-10-12 10:00:43,947] [34303:139796378810176] - task_executor.py[line:190]: task 202110120955284937770_hetero_lr_0 guest 9999 start time: 2021-10-12 09:59:48
[INFO] [2021-10-12 10:00:43,948] [34303:139796378810176] - task_executor.py[line:192]: task 202110120955284937770_hetero_lr_0 guest 9999 end time: 2021-10-12 10:00:43
[INFO] [2021-10-12 10:00:43,948] [34303:139796378810176] - task_executor.py[line:194]: task 202110120955284937770_hetero_lr_0 guest 9999 takes 55.056s
[INFO] [2021-10-12 10:00:43,948] [34303:139796378810176] - task_executor.py[line:197]: Finish 202110120955284937770 hetero_lr_0 202110120955284937770_hetero_lr_0 0 guest 9999 task failed
[INFO] [2021-10-12 10:00:43,948] [34303:139796378810176] - task_executor.py[line:347]: report task 202110120955284937770_hetero_lr_0 0 guest 9999 to driver
[INFO] [2021-10-12 10:00:43,948] [34303:139796378810176] - control_client.py[line:42]: request update job 202110120955284937770 task 202110120955284937770_hetero_lr_0 0 on guest 9999

Deployment architecture

Training conf configuration

{
    "dsl_version": "2",
    "initiator": {
        "role": "guest",
        "party_id": 9999
    },
    "role": {
        "guest": [9999],
        "host": [10000],
        "arbiter": [10000]
    },
    "job_parameters": {
        "common": {
            "work_mode": 1,
            "backend": 1,
            "task_parallelism": 2,
            "computing_partitions": 8,
            "task_cores": 4
        }
    },
    "component_parameters": {
        "common": {
            "intersection_0": {
                "intersect_method": "raw",
                "sync_intersect_ids": true,
                "only_output_key": false
            },
            "hetero_lr_0": {
                "penalty": "L2",
                "optimizer": "rmsprop",
                "alpha": 0.01,
                "max_iter": 3,
                "batch_size": 320,
                "learning_rate": 0.15,
                "init_param": {
                    "init_method": "random_uniform"
                }
            }
        },
        "role": {
            "guest": {
                "0": {
                    "reader_0": {
                        "table": {"name": "breast_hetero_guest", "namespace": "experiment"}
                    },
                    "dataio_0":{
                        "with_label": true,
                        "label_name": "y",
                        "label_type": "int",
                        "output_format": "dense"
                    }
                }
            },
            "host": {
                "0": {
                    "reader_0": {
                        "table": {"name": "breast_hetero_host", "namespace": "experiment"}
                    },
                    "dataio_0":{
                        "with_label": false,
                        "output_format": "dense"
                    }
                }
            }
        }
    }
}

Training dsl configuration

{
    "components" : {
        "reader_0": {
            "module": "Reader",
            "output": {
                "data": ["table"]
            }
         },
        "dataio_0": {
            "module": "DataIO",
            "input": {
                "data": {
                    "data": [
                         "reader_0.table"
                    ]
                }
            },
            "output": {
                "data": ["train"],
                "model": ["dataio"]
            },
            "need_deploy": true
         },
        "intersection_0": {
            "module": "Intersection",
            "input": {
                "data": {
                    "data": [
                        "dataio_0.train"
                    ]
                }
            },
            "output": {
                "data": ["train"]
            }
        },
        "hetero_feature_binning_0": {
            "module": "HeteroFeatureBinning",
            "input": {
                "data": {
                    "data": [
                        "intersection_0.train"
                    ]
                }
            },
            "output": {
                "data": ["train"],
                "model": ["hetero_feature_binning"]
            }
        },
        "hetero_feature_selection_0": {
            "module": "HeteroFeatureSelection",
            "input": {
                "data": {
                    "data": [
                        "hetero_feature_binning_0.train"
                    ]
                },
                "isometric_model": [
                    "hetero_feature_binning_0.hetero_feature_binning"
                ]
            },
            "output": {
                "data": ["train"],
                "model": ["selected"]
            }
        },
        "hetero_lr_0": {
            "module": "HeteroLR",
            "input": {
                "data": {
                    "train_data": ["hetero_feature_selection_0.train"]
                }
            },
            "output": {
                "data": ["train"],
                "model": ["hetero_lr"]
            }
        },
        "evaluation_0": {
            "module": "Evaluation",
            "input": {
                "data": {
                    "data": ["hetero_lr_0.train"]
                }
            },
            "output": {
                "data": ["evaluate"]
            }
        }
    }
}

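Notes: We ran into this problem after replacing eggroll with spark. Our heteroLR workflow calls the fate-flow REST API through a fixed script, and the training data as well as the training conf and dsl are all taken from the python container. The only thing that changed is the backend field, from 0 to 1 (the documentation says 0 means eggroll and 1 means spark + rabbitmq). With eggroll the whole flow works, including training and prediction. With spark it fails during the training stage.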

dylan-fan commented 2 years ago

It looks like some of the partitions are empty.
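To illustrate how an empty partition could produce this exact error, here is a minimal sketch in plain Python. It is not FATE's code: `partition_gradient` and `safe_add` are hypothetical names, and the assumption that a per-partition step returns `None` for an empty partition is only a guess at the mechanism, not something confirmed in this thread.

```python
from functools import reduce


def partition_gradient(rows):
    """Hypothetical per-partition step: returns None for an empty partition."""
    if not rows:
        return None
    return float(sum(rows))


def safe_add(x, y):
    """Add two partial results, treating None as 'no contribution'."""
    if x is None:
        return y
    if y is None:
        return x
    return x + y


# Three partitions; the middle one is empty (assumed scenario).
partitions = [[0.5, 1.5], [], [2.0]]
partials = [partition_gradient(p) for p in partitions]

# A plain `x + y` reducer fails as soon as it meets the None partial.
error_message = None
try:
    reduce(lambda x, y: x + y, partials)
except TypeError as exc:
    error_message = str(exc)

# A None-tolerant reducer skips the empty partition.
total = reduce(safe_add, partials)

print(error_message)
print(total)
```

If this is indeed the cause, possible workarounds would be lowering `computing_partitions` so that no partition ends up empty for a small sample dataset, or filtering `None` values before the reduce. Both are untested suggestions.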

github-actions[bot] commented 2 months ago

This issue has been marked as stale because it has been open for 365 days with no activity. If this issue is still relevant or if there is new information, please feel free to update or reopen it.

github-actions[bot] commented 2 months ago

This issue was closed because it has been inactive for 1 day since being marked as stale. If this issue is still relevant or if there is new information, please feel free to update or reopen it.