FederatedAI / FATE-Flow

Solution for deploying and managing end-to-end federated learning workflows
Apache License 2.0

Add specific directory to store data to upload #558

Status: Open (opened by asdfsx 5 months ago)

asdfsx commented 5 months ago

System information

Describe the feature and the current behavior/state.

Add a dedicated directory for storing data to be uploaded. The directory should be configurable, and fate-flow should look for upload data in this directory.
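A minimal sketch of the requested lookup follows. It assumes a hypothetical environment variable FATE_FLOW_UPLOAD_DIR and a hypothetical default path. Neither is an existing fate-flow setting. A relative "file" value from the upload config is resolved against that directory, and paths that would escape it are rejected.

```python
import os
from pathlib import Path
from typing import Optional

# Hypothetical setting and default: not existing fate-flow options.
UPLOAD_DIR_ENV = "FATE_FLOW_UPLOAD_DIR"
DEFAULT_UPLOAD_DIR = "/data/projects/fate/upload_data"


def upload_dir() -> Path:
    """Return the configured upload directory, falling back to the default."""
    return Path(os.environ.get(UPLOAD_DIR_ENV, DEFAULT_UPLOAD_DIR)).resolve()


def resolve_upload_path(file_field: str, base: Optional[Path] = None) -> Path:
    """Resolve the "file" value of an upload config inside the upload directory."""
    root = (base if base is not None else upload_dir()).resolve()
    candidate = (root / file_field).resolve()
    # Reject values such as "../secret.csv" that would leave the upload directory.
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"{file_field!r} is outside the upload directory {root}")
    if not candidate.is_file():
        raise FileNotFoundError(f"upload file not found: {candidate}")
    return candidate
```

With this approach, "examples/data/breast_hetero_host.csv" would be looked up under the configured directory instead of under the installed package. The error message would also name the directory that was actually searched.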

Any Other info.

I installed fate-flow with python setup.py install, which placed it in /data/projects/fate/venv/lib/python3.8/site-packages/. Then I uploaded data using the command flow data upload -c examples/upload/upload_host.json. The content of upload_host.json is:

{
  "file": "examples/data/breast_hetero_host.csv",
  "head": true,
  "partitions": 16,
  "extend_sid": true,
  "meta": {
    "delimiter": ",",
    "match_id_name": "id"
  },
  "namespace": "experiment",
  "name": "breast_hetero_host"
}

But the upload job failed. In fateboard, the job's log shows that fate-flow looked for the data file inside its installed package directory, /data/projects/fate/venv/lib/python3.8/site-packages/fate_flow-2.0.0-py3.8.egg:

[ERROR][2024-01-16 03:51:53,303][1518][_wraps.run][line:87]: Traceback (most recent call last):
  File "/data/projects/fate/venv/lib/python3.8/site-packages/fate_flow-2.0.0-py3.8.egg/fate_flow/components/entrypoint/cli.py", line 90, in execute
    io_meta = execute_component(task_config)
  File "/data/projects/fate/venv/lib/python3.8/site-packages/fate_flow-2.0.0-py3.8.egg/fate_flow/components/entrypoint/cli.py", line 121, in execute_component
    component.execute(config, outputs)
  File "/data/projects/fate/venv/lib/python3.8/site-packages/fate_flow-2.0.0-py3.8.egg/fate_flow/components/cpn.py", line 40, in execute
    return self.callback(config, outputs)
  File "/data/projects/fate/venv/lib/python3.8/site-packages/fate_flow-2.0.0-py3.8.egg/fate_flow/components/components/upload.py", line 36, in upload
    upload_data(config, outputs)
  File "/data/projects/fate/venv/lib/python3.8/site-packages/fate_flow-2.0.0-py3.8.egg/fate_flow/components/components/upload.py", line 46, in upload_data
    data = upload_object.run(
  File "/data/projects/fate/venv/lib/python3.8/site-packages/fate_flow-2.0.0-py3.8.egg/fate_flow/components/components/upload.py", line 203, in run
    data_table_count = self.save_data_table(job_id)
  File "/data/projects/fate/venv/lib/python3.8/site-packages/fate_flow-2.0.0-py3.8.egg/fate_flow/components/components/upload.py", line 215, in save_data_table
    input_feature_count = self.get_count(input_file)
  File "/data/projects/fate/venv/lib/python3.8/site-packages/fate_flow-2.0.0-py3.8.egg/fate_flow/components/components/upload.py", line 307, in get_count
    with open(input_file, "r", encoding="utf-8") as fp:
FileNotFoundError: [Errno 2] No such file or directory: '/data/projects/fate/venv/lib/python3.8/site-packages/fate_flow-2.0.0-py3.8.egg/fate_flow/examples/data/breast_hetero_host.csv'
asdfsx commented 5 months ago

By the way, I think the config files could also be separated from the project. Right now fate_flow_server looks for its config files in the directory {FATE_FLOW_PROJECT}/conf. Why not add an environment variable such as FATEFLOW_CONFPATH to specify where the config is stored? Alternatively, fate_flow_server could accept a parameter such as --config to specify the config at startup.
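The proposal can be sketched as a simple precedence rule: an explicit --config flag wins, then the FATEFLOW_CONFPATH environment variable, then the current {FATE_FLOW_PROJECT}/conf default. Both --config and FATEFLOW_CONFPATH are the names suggested above, not existing fate-flow options. The function name resolve_conf_dir is hypothetical.

```python
import argparse
import os
from pathlib import Path
from typing import Optional, Sequence

# Proposed names from this issue; fate-flow does not currently read them.
CONF_ENV_VAR = "FATEFLOW_CONFPATH"


def resolve_conf_dir(project_dir: str, argv: Optional[Sequence[str]] = None) -> Path:
    """Pick the config directory: --config flag, then env var, then {project}/conf."""
    parser = argparse.ArgumentParser(prog="fate_flow_server")
    parser.add_argument("--config", help="directory containing the server config")
    # parse_known_args ignores any other server flags that may be present.
    args, _unknown = parser.parse_known_args(argv)
    if args.config:
        return Path(args.config).expanduser().resolve()
    env_value = os.environ.get(CONF_ENV_VAR)
    if env_value:
        return Path(env_value).expanduser().resolve()
    # Current behaviour described in the issue: config lives under the project directory.
    return (Path(project_dir) / "conf").resolve()
```

Letting the command-line flag override the environment variable follows a common convention. A one-off launch can then point at a different config without changing the environment shared by other processes.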

zhihuiwan commented 5 months ago

Thank you for your feedback. Currently, examples need to be manually pulled to the local directory. We will also optimize the configuration directory you mentioned in the future.
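Until the configuration is changed, one possible workaround is to rewrite the "file" entry of the upload config to an absolute path before running flow data upload -c. This assumes fate-flow uses an absolute "file" value as-is. The traceback only shows that a relative path was joined onto the package directory, so that assumption is unverified. The helper names below are hypothetical. data_root is the local directory the examples were pulled into, and the file must be readable on the host where fate-flow runs.

```python
import json
from pathlib import Path


def absolutize_upload_config(config_path: str, data_root: str) -> dict:
    """Return the upload config with its "file" entry made absolute.

    Assumption: fate-flow reads an absolute "file" value directly instead of
    joining it onto its own package directory.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    file_path = Path(config["file"])
    if not file_path.is_absolute():
        # Interpret relative paths against the local examples checkout.
        file_path = Path(data_root) / file_path
    config["file"] = str(file_path.resolve())
    return config


def write_config(config: dict, out_path: str) -> None:
    """Write the rewritten config so it can be passed to flow data upload -c."""
    Path(out_path).write_text(json.dumps(config, indent=2), encoding="utf-8")
```

For example, write the result to upload_host_abs.json and run flow data upload -c upload_host_abs.json.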