FederatedAI / FATE

An Industrial Grade Federated Learning Framework
Apache License 2.0

FEDERATED_ERROR when deploying FATE on Spark #4146

Closed ykcirh closed 2 months ago

ykcirh commented 2 years ago

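I am using Spark as the computing and storage engine. The service_conf.yaml in the fate-python container is configured as follows (everything else uses the default configuration):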

fateflow:
  host: 192.167.0.100
  http_port: 9380
  grpc_port: 9360
  http_app_key:
  http_secret_key:
  proxy: nginx
  protocol: http
fate_on_spark:
  spark:
    # default use SPARK_HOME environment variable
    home: /data/project/common/spark-3.3.0-bin-hadoop3
    cores_per_node: 20
    nodes: 2
  linkis_spark:
    cores_per_node: 20
    nodes: 2
    host: 10.0.50.61
    port: 8088
    token_code: MLSS
    python_path: /opt/app-root/bin/python
  hive:
    host: 127.0.0.1
    port: 10000
    auth_mechanism:
    username:
    password:
  linkis_hive:
    host: 127.0.0.1
    port: 9001
  hdfs:
    name_node: hdfs://10.0.50.61:9870/
    # default /
    path_prefix:
  rabbitmq:
    host: 10.0.50.61
    mng_port: 15672
    port: 5672
    user: fate
    password: fate
    # default conf/rabbitmq_route_table.yaml
    route_table:
  pulsar:
    host: 192.168.0.5
    port: 6650
    mng_port: 8080
    cluster: standalone
    # all parties should use a same tenant
    tenant: fl-tenant
    # message ttl in minutes
    topic_ttl: 5
    # default conf/pulsar_route_table.yaml
    route_table:
  nginx:
    host: 10.0.50.61
    http_port: 80
    grpc_port: 9310

In the nginx container, /data/projects/fate/proxy/nginx/conf/route_table.yaml is configured as follows:

default:
  proxy:
    - host: 10.0.50.61
      http_port: 80
9999:
  proxy:
    - host: 10.0.50.61
      http_port: 80
  fateflow:
    - host: 10.0.50.61
      http_port: 9380
10000:
  proxy:
    - host: 10.0.50.63
      http_port: 80
  fateflow:
    - host: 127.0.0.1
      http_port: 9380

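I can upload data, but submitting a job fails with the following error: {'retcode': <RetCode.FEDERATED_ERROR: 104>, 'retmeg': 'Federated schedule error, Expecting value: line 1 column 1 (char 0)'} What is causing this? Is there any other configuration that still needs to be changed? Thanks~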
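
For context on the message itself: "Expecting value: line 1 column 1 (char 0)" is the text Python's json module produces when it is asked to parse a body that is empty or does not start with JSON, such as an HTML error page. The sketch below only reproduces that message with the standard library. It does not call FATE, `decode_error_message` is a hypothetical helper name, and the assumption that the federated request received a non-JSON reply from the proxy is a guess consistent with the error text, not something confirmed in this thread.

```python
import json


def decode_error_message(body: str) -> str:
    """Return the JSONDecodeError text for a body, or '' if the body parses."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return str(exc)
    return ""


# An empty body and an HTML page both fail at the very first character,
# so they produce the same message as the one in the FEDERATED_ERROR above.
print(decode_error_message(""))
print(decode_error_message("<html>404 Not Found</html>"))

# A valid JSON body parses, so no error text is returned.
print(repr(decode_error_message('{"retcode": 0}')))
```

If this is what happened here, the reply that reached the scheduler came from something other than the FATE Flow HTTP API, which would point at the proxy routing rather than at Spark or HDFS.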

BiancaZYCao commented 2 years ago

This is an Nginx network proxy problem. If you have not changed the nginx & openresty configuration, http_port should be 9300. See #3817 for reference.

ykcirh commented 2 years ago

I changed the nginx port to 80, so I also changed 9300 to 80. Is that not allowed?

BiancaZYCao commented 2 years ago

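I have not tried changing it to port 80, and I am not familiar with the details of Nginx. The setup here also uses Lua scripts (in the nginx/lua directory) to start a virtual host on port 9300. I suggest checking the routing configuration involved, for example nginx/conf/nginx.conf (default 9128) and nginx/conf/vhost/coordination_http_proxy.conf (default 9300).

After making the changes, run /data/projects/fate/proxy/nginx/sbin/nginx -t and restart the service.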
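
After the restart, one quick way to see which of the candidate ports is actually accepting connections is a plain TCP probe. This is a minimal sketch using only Python's standard library. `port_is_open` is a hypothetical helper, not part of FATE, and the host and ports in the usage comment are taken from the configuration posted above, so adjust them to your deployment.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Example usage (values assumed from the configs in this issue):
# for port in (80, 9300):
#     print(port, port_is_open("10.0.50.61", port))
```

An open port only shows that something is listening. It does not show that the listener is the coordination HTTP proxy, so the vhost configuration still needs to match the http_port values in service_conf.yaml and route_table.yaml.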

github-actions[bot] commented 2 months ago

This issue has been marked as stale because it has been open for 365 days with no activity. If this issue is still relevant or if there is new information, please feel free to update or reopen it.