FederatedAI / KubeFATE

Manage federated learning workload using cloud native technologies.
Apache License 2.0

v1.9.0 (FATE on Spark): After stopping the Spark-master and Spark-worker containers, the federated tasks can still run normally #782

Closed. FranisiL closed this issue 2 years ago

FranisiL commented 2 years ago

For the deployment process, environment, and other details, refer to: https://github.com/FederatedAI/KubeFATE/issues/778#issue-1415860892

To Reproduce

After stopping the Spark-master and Spark-worker containers, the federated tasks can still run normally. As I understand it, this should not happen.

JingChen23 commented 2 years ago

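Which of the following two do you mean?

  1. The task really succeeded
  2. The task did not succeed, but FATEBoard shows it as successful

You can check the actual output of the evaluation_0 module to determine whether the task truly succeeded or only appears to have succeeded.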

FranisiL commented 2 years ago

The first one: the task really succeeded.

JingChen23 commented 2 years ago

This is obviously impossible. Please double-check your environment. I just tried it, and the first step, reader_0, fails.
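
One way to double-check the environment is to confirm that the Spark master is really unreachable from where the FATE services run. The following is only a minimal sketch, not part of KubeFATE: the helper name `is_port_open` and the host name `spark-master` are assumptions for illustration, and 7077 is Spark's default standalone master port, which a given deployment may override.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call (hypothetical host name; adjust to your deployment):
# print(is_port_open("spark-master", 7077))
```

If this returns True after the containers were supposedly stopped, another Spark master is still reachable under that name, which could explain why the job succeeds.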