FederatedAI / FATE

An Industrial Grade Federated Learning Framework
Apache License 2.0

run HeteroSecureBoost example failed #936

Closed better629 closed 4 months ago

better629 commented 4 years ago

I used docker_standalone-fate-1.2.0.tar.gz to run FATE. When I run the hetero_secureboost Binary-Class problem, an error occurs in the secureboost_0 step.

"2020-01-10 11:55:57,178 - api_utils.py[line:83] - INFO: local api response: /v1/schedule/2020011011554180211926/secureboost_0/2020011011554180211926_secureboost_0/guest/10000/status {'retcode': 0, 'retmsg': 'success'}"
"2020-01-10 11:55:57,179 - task_executor.py[line:120] - INFO: run 2020011011554180211926 secureboost_0 2020011011554180211926_secureboost_0 guest 10000 task"
"2020-01-10 11:55:57,179 - task_executor.py[line:121] - INFO: {'BoostingTreeParam': {'tree_param': {'criterion_method': 'xgboost', 'criterion_params': [0.1], 'max_depth': 5, 'min_sample_split': 2, 'min_impurity_split': 0.001, 'min_leaf_node': 1, 'max_split_nodes': 1024, 'feature_importance_type': 'split', 'n_iter_no_change': True, 'tol': 0.001, 'use_missing': False, 'zero_as_missing': False}, 'task_type': 'classification', 'objective_param': {'objective': 'cross_entropy', 'params': [1.5]}, 'learning_rate': 0.1, 'num_trees': 5, 'subsample_feature_rate': 1, 'n_iter_no_change': True, 'tol': 0.0001, 'encrypt_param': {'method': 'paillier', 'key_length': 1024}, 'bin_num': 10, 'use_missing': False, 'zero_as_missing': False, 'encrypted_mode_calculator_param': {'mode': 'strict', 're_encrypted_rate': 1}, 'predict_param': {'threshold': 0.5}, 'cv_param': {'n_splits': 5, 'mode': 'hetero', 'role': 'guest', 'shuffle': False, 'random_seed': 103, 'need_cv': False}, 'validation_freqs': 1}, 'initiator': {'role': 'guest', 'party_id': 10000}, 'job_parameters': {'work_mode': 0, 'model_id': 'arbiter-10000#guest-10000#host-10000#model', 'model_version': '2020011011554180211926'}, 'role': {'guest': [10000], 'host': [10000], 'arbiter': [10000]}, 'config': '/fate/examples/federatedml-1.x-examples/user_config/train_conf.config_1578657341_4667', 'dsl': '/fate/examples/federatedml-1.x-examples/user_config/train_dsl.config_1578657341_1180', 'function': 'submit_job', 'local': {'role': 'guest', 'party_id': 10000}, 'CodePath': 'federatedml/tree/hetero_secureboosting_tree_guest.py/HeteroSecureBoostingTreeGuest', 'module': 'HeteroSecureBoost'}"
"2020-01-10 11:55:57,179 - task_executor.py[line:122] - INFO: {'data': {'train_data': ['intersection_0.train'], 'eval_data': ['intersection_1.eval']}}"
"2020-01-10 11:55:57,180 - hetero_secureboosting_tree_guest.py[line:293] - INFO: begin to train secureboosting guest model"
"2020-01-10 11:55:57,420 - hetero_secureboosting_tree_guest.py[line:130] - INFO: convert feature to bins"
"2020-01-10 11:55:57,428 - task_executor.py[line:132] - ERROR: list index out of range"
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.6/concurrent/futures/process.py", line 175, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/fate/eggroll/api/standalone/eggroll.py", line 264, in do_map_partitions
v = _mapper(_generator_from_cursor(cursor))
File "/fate/federatedml/feature/binning/quantile_binning.py", line 153, in approxi_quantile
QuantileBinning.insert_datas(data_instances, summary_dict, cols_dict, header, is_sparse)
File "/fate/federatedml/feature/binning/quantile_binning.py", line 173, in insert_datas
col_name = header[col_idx]
IndexError: list index out of range
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/fate/fate_flow/driver/task_executor.py", line 123, in run_task
run_object.run(parameters, task_run_args)
File "/fate/federatedml/model_base.py", line 91, in run
this_data_output = func(*params)
File "/fate/federatedml/tree/hetero_secureboosting_tree_guest.py", line 296, in fit
self.convert_feature_to_bin(data_inst)
File "/fate/federatedml/tree/hetero_secureboosting_tree_guest.py", line 138, in convert_feature_to_bin
binning_obj.fit_split_points(data_instance)
File "/fate/federatedml/feature/binning/quantile_binning.py", line 85, in fit_split_points
summary_dict = data_instances.mapPartitions(f)
File "/fate/arch/api/utils/profile_util.py", line 31, in _fn
rtn = func(*args, **kwargs)
File "/fate/arch/api/table/eggroll/table_impl.py", line 117, in mapPartitions
return DTable(self._dtable.mapPartitions(func), session_id=self._session_id)
File "/fate/eggroll/api/standalone/eggroll.py", line 771, in mapPartitions
result = r.result()
File "/usr/local/lib/python3.6/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/usr/local/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
IndexError: list index out of range
"2020-01-10 11:55:57,429 - api_utils.py[line:78] - INFO: local api request: http://172.19.0.3:9380/v1/schedule/2020011011554180211926/secureboost_0/2020011011554180211926_secureboost_0/guest/10000/status"
"2020-01-10 11:55:57,437 - api_utils.py[line:81] - INFO: {"retcode":0,"retmsg":"success"}

Has anyone else run into this problem?
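For readers following the traceback: the statement that raises is `col_name = header[col_idx]` in `insert_datas`. The sketch below shows one way such a lookup can fail, assuming (this is not confirmed anywhere in the thread) that a row carries more feature columns than the header lists. `column_names`, `header`, and the sample rows are hypothetical stand-ins, not FATE code.

```python
def column_names(header, features):
    """Look up a column name for every feature position, as a header-indexed loop would."""
    return [header[col_idx] for col_idx in range(len(features))]


header = ["x0", "x1"]          # hypothetical schema with two columns
row_ok = [0.1, 0.2]            # matches the header length
row_bad = [0.1, 0.2, 0.3]      # one more feature than the header knows about

print(column_names(header, row_ok))

try:
    column_names(header, row_bad)
except IndexError as exc:
    # Same exception type and message as in the log above
    print("IndexError:", exc)
```

If this is what is happening, comparing the header length with the feature count of a few rows in the intersection output would narrow it down.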

mgqa34 commented 4 years ago

@better629 Can you give more information about the failed task, such as the submit conf and DSL? It would also help if you posted the training data so that we can reproduce the issue.

better629 commented 4 years ago

@mgqa34 The training data is the example's data.

The generated configs are:

train_conf

{"initiator": {"role": "guest", "party_id": 10000}, "job_parameters": {"work_mode": 0}, "role": {"guest": [10000], "host": [10000], "arbiter": [10000]}, "role_parameters": {"guest": {"args": {"data": {"train_data": [{"name": "breast_b", "namespace": "breast_b_guest"}], "eval_data": [{"name": "breast_b", "namespace": "breast_b_guest"}]}}, "dataio_0": {"with_label": [true], "label_name": ["y"], "label_type": ["int"], "output_format": ["dense"]}}, "host": {"args": {"data": {"train_data": [{"name": "breast_a", "namespace": "breast_a_host"}], "eval_data": [{"name": "breast_a", "namespace": "breast_a_host"}]}}, "dataio_0": {"with_label": [false], "output_format": ["dense"]}}}, "algorithm_parameters": {"secureboost_0": {"task_type": "classification", "learning_rate": 0.1, "num_trees": 5, "subsample_feature_rate": 1, "n_iter_no_change": true, "tol": 0.0001, "bin_num": 10, "objective_param": {"objective": "cross_entropy"}, "encrypt_param": {"method": "paillier"}, "predict_param": {"with_proba": true, "threshold": 0.5}, "cv_param": {"n_splits": 5, "shuffle": false, "random_seed": 103, "need_cv": false, "evaluate_param": {"eval_type": "binary"}}, "validation_freqs": 1}, "evaluation_0": {"eval_type": "binary"}}}

train_dsl

{"components": {"dataio_0": {"module": "DataIO", "input": {"data": {"data": ["args.train_data"]}}, "output": {"data": ["train"], "model": ["dataio"]}}, "dataio_1": {"module": "DataIO", "input": {"data": {"data": ["args.eval_data"]}, "model": ["dataio_0.dataio"]}, "output": {"data": ["eval"], "model": ["dataio"]}, "need_deploy": false}, "intersection_0": {"module": "Intersection", "input": {"data": {"data": ["dataio_0.train"]}}, "output": {"data": ["train"]}}, "intersection_1": {"module": "Intersection", "input": {"data": {"data": ["dataio_1.eval"]}}, "output": {"data": ["eval"]}, "need_deploy": false}, "secureboost_0": {"module": "HeteroSecureBoost", "input": {"data": {"train_data": ["intersection_0.train"], "eval_data": ["intersection_1.eval"]}}, "output": {"data": ["train"], "model": ["train"]}}, "evaluation_0": {"module": "Evaluation", "input": {"data": {"data": ["secureboost_0.train"]}}}}}
mgqa34 commented 4 years ago

I installed "docker_standalone-fate-1.2.0.tar.gz" and uploaded "breast_b.csv" and "breast_a.csv" with "partition=10", but I can't reproduce the described issue. Does this error always occur?
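A minimal sketch of upload configs matching that description. The table names and namespaces come from the train_conf posted above. The key names (`file`, `head`, `partition`, `work_mode`, `table_name`, `namespace`) and the CSV paths are assumptions based on the FATE 1.x upload examples, so check them against the upload example shipped with your install.

```python
import json


def make_upload_conf(csv_path, table_name, namespace, partition=10, work_mode=0):
    """Build an upload config dict; key names are assumed from the FATE 1.x examples."""
    return {
        "file": csv_path,        # assumed path, adjust to your install
        "head": 1,               # first CSV row is the header
        "partition": partition,
        "work_mode": work_mode,  # 0 = standalone, as in the job_parameters above
        "table_name": table_name,
        "namespace": namespace,
    }


guest_conf = make_upload_conf("examples/data/breast_b.csv", "breast_b", "breast_b_guest")
host_conf = make_upload_conf("examples/data/breast_a.csv", "breast_a", "breast_a_host")

print(json.dumps(guest_conf, indent=2))
print(json.dumps(host_conf, indent=2))
```

Each dict would be saved to a JSON file and passed to the upload command. The `table_name` and `namespace` values must match what the training conf references.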

better629 commented 4 years ago

@mgqa34 Yes, I have tried multiple parameters, but I always get the above error with quick_run.py in examples/federatedml-1.x-examples.

Specifically, the above error occurs in the hetero_secureboost example. Did you update the eval_data name and namespace in quick_run.py, like this?

    conf_json['role_parameters']['guest']['args']['data']['train_data'] = [
        {
            'name': guest_table_name,
            'namespace': guest_namespace
        }
    ]
    if "eval_data" in conf_json['role_parameters']['guest']['args']['data']:
        conf_json['role_parameters']['guest']['args']['data']["eval_data"] = [{
            "name": guest_table_name,
            "namespace": guest_namespace
        }]
    conf_json['role_parameters']['host']['args']['data']['train_data'] = [
        {
            'name': host_table_name,
            'namespace': host_namespace
        }
    ]
    if "eval_data" in conf_json['role_parameters']['host']['args']['data']:
        conf_json['role_parameters']['host']['args']['data']["eval_data"] = [{
            "name": host_table_name,
            "namespace": host_namespace
        }]

If not, the following error occurs in the dataio_1 node:


"2020-01-08 05:55:16,326 - task_executor.py[line:122] - INFO: {'data': {'data': ['args.eval_data']}, 'model': ['dataio_0.dataio']}"
"2020-01-08 05:55:16,326 - data_io.py[line:123] - INFO: start to read dense data and change data to instance"
"2020-01-08 05:55:16,327 - task_executor.py[line:132] - ERROR: Count of data_instance is 0"
Traceback (most recent call last):
File "/fate/fate_flow/driver/task_executor.py", line 123, in run_task
run_object.run(parameters, task_run_args)
File "/fate/federatedml/model_base.py", line 88, in run
this_data_output = func(*real_param)
File "/fate/federatedml/util/data_io.py", line 766, in transform
return self.reader.read_data(data_inst, "transform")
File "/fate/federatedml/util/data_io.py", line 125, in read_data
abnormal_detection.empty_table_detection(input_data)
File "/fate/federatedml/util/abnormal_detection.py", line 25, in empty_table_detection
raise ValueError("Count of data_instance is 0")
ValueError: Count of data_instance is 0

mgqa34 commented 4 years ago

"ValueError: Count of data_instance is 0" means that the data upload failed. You can upload the data directly and run the secureboost example by following the guide at https://github.com/FederatedAI/FATE/tree/master/examples/federatedml-1.x-examples/hetero_secureboost
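One way to check for a mismatch is to list every table the training conf points at and compare the list with what was actually uploaded. This is a rough sketch: `referenced_tables` is a hypothetical helper, and `conf` is a trimmed copy of the `role_parameters` section of the train_conf posted earlier in the thread.

```python
def referenced_tables(conf):
    """Return (role, data_key, name, namespace) for every table named in the conf."""
    refs = []
    for role, params in conf.get("role_parameters", {}).items():
        data = params.get("args", {}).get("data", {})
        for data_key, tables in data.items():
            for table in tables:
                refs.append((role, data_key, table["name"], table["namespace"]))
    return refs


# Trimmed from the train_conf in this thread
conf = {
    "role_parameters": {
        "guest": {"args": {"data": {
            "train_data": [{"name": "breast_b", "namespace": "breast_b_guest"}],
            "eval_data": [{"name": "breast_b", "namespace": "breast_b_guest"}],
        }}},
        "host": {"args": {"data": {
            "train_data": [{"name": "breast_a", "namespace": "breast_a_host"}],
            "eval_data": [{"name": "breast_a", "namespace": "breast_a_host"}],
        }}},
    }
}

for ref in referenced_tables(conf):
    print(ref)
```

Any (name, namespace) pair printed here that was never uploaded would be read as an empty table.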

better629 commented 4 years ago

@mgqa34 Yes, I followed the Binary-Class instructions in federatedml-1.x-examples/hetero_secureboost, but I still get the above IndexError: list index out of range with quick_run.py.

better629 commented 4 years ago

@mgqa34 Could you please run the hetero_secureboost Binary-Class problem with quick_run.py and the breast_b/a.csv data, and check whether you see the above problem?

BeatBoxerLrd commented 4 years ago

You can try to submit the task after modifying the table name and namespace.
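A minimal sketch of that modification, done on the conf dict before submitting. `with_tables` is a hypothetical helper, and the table names and namespaces in the usage lines are placeholders, not values from this thread. It sets `train_data` unconditionally and `eval_data` only when the conf already has one, mirroring the quick_run.py snippet quoted earlier.

```python
import copy


def with_tables(conf, role, name, namespace):
    """Return a copy of conf whose data tables for `role` point at (name, namespace)."""
    new_conf = copy.deepcopy(conf)  # leave the caller's conf untouched
    data = new_conf["role_parameters"][role]["args"]["data"]
    data["train_data"] = [{"name": name, "namespace": namespace}]
    if "eval_data" in data:
        data["eval_data"] = [{"name": name, "namespace": namespace}]
    return new_conf


# Placeholder conf and table names, for illustration only
base_conf = {
    "role_parameters": {
        "guest": {"args": {"data": {"train_data": [], "eval_data": []}}},
        "host": {"args": {"data": {"train_data": []}}},
    }
}

updated = with_tables(base_conf, "guest", "my_guest_table", "my_guest_namespace")
updated = with_tables(updated, "host", "my_host_table", "my_host_namespace")
print(updated["role_parameters"]["guest"]["args"]["data"])
print(updated["role_parameters"]["host"]["args"]["data"])
```

The updated dict can then be written back to the conf JSON file and submitted as usual.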