FederatedAI / FedLCM

A web application that manages lifecycles of federated learning federations.
Apache License 2.0

installation of openfl director keeps failing (k8s deployment) #36

Open kta-intel opened 1 year ago

kta-intel commented 1 year ago

I am trying to deploy OpenFL on FedLCM. I set LIFECYCLEMANAGER_EXPERIMENT_ENABLED to true in the k8s_deploy.yaml for the backend.

I followed the instructions listed here: https://github.com/FederatedAI/FedLCM/blob/main/doc/OpenFL_Guide.md but the installation of the director keeps failing. I am unsure how to troubleshoot. Do you have any insights, or do you have advice for setting the director parameters?

This is the error description:

failed to install openfl director, error: job is Failed, job info: &{93231661-4d62-49a6-88d0-50fd70788bc8 2023-03-29 21:32:39.121 +0000 UTC 0001-01-01 00:00:00 +0000 UTC ClusterInstall ef50e111-9122-4a41-b22d-eac5525862b9 admin map[director:{director Running Undefined 2023-03-29 21:32:39.121 +0000 UTC 0001-01-01 00:00:00 +0000 UTC} notebook:{notebook Running Undefined 2023-03-29 21:32:39.121 +0000 UTC 0001-01-01 00:00:00 +0000 UTC}] Failed 1h0m0s 0xc0005602a0 [update job status to Running create Cluster in DB Success overwrite current installation helm install Success checkout Cluster status [3362] checkout Cluster status timeOut!] {3 2023-03-29 21:32:39.122 +0000 UTC 2023-03-29 22:32:40.015 +0000 UTC {0001-01-01 00:00:00 +0000 UTC false}}}

Thank you.

wfangchi commented 1 year ago

Thanks for using FedLCM's OpenFL support! This is most likely because we currently keep FedLCM's OpenFL container image in a private registry, so your cluster presumably cannot pull it and the cluster status check eventually times out. We are exploring options for making it public, which can be discussed during OpenFL's community meeting next week. I will post an update here once we reach a decision.
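One way to check whether an image pull is really the cause is to look at the container states of the director and notebook pods. The following is a minimal Python sketch, not part of FedLCM: it assumes `kubectl` is configured for the target cluster, and the helper names `find_image_pull_failures` and `check_namespace` are made up for illustration. It relies only on the standard pod status fields (`containerStatuses[].state.waiting.reason`) and the `ErrImagePull` / `ImagePullBackOff` waiting reasons that Kubernetes reports.

```python
import json
import subprocess

# Waiting reasons Kubernetes reports when a container image cannot be pulled.
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def find_image_pull_failures(pod_list):
    """Return (pod, container, image, reason) tuples for containers stuck on image pulls.

    `pod_list` is the parsed JSON document printed by `kubectl get pods -o json`.
    """
    failures = []
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name", "<unknown>")
        status = pod.get("status", {})
        # Check init containers as well as regular containers.
        statuses = (status.get("initContainerStatuses") or []) + (
            status.get("containerStatuses") or []
        )
        for cs in statuses:
            waiting = cs.get("state", {}).get("waiting")
            if waiting and waiting.get("reason") in IMAGE_PULL_REASONS:
                failures.append(
                    (pod_name, cs.get("name"), cs.get("image"), waiting["reason"])
                )
    return failures


def check_namespace(namespace):
    """Hypothetical helper: query one namespace via kubectl and list pull failures."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return find_image_pull_failures(json.loads(result.stdout))
```

Calling `check_namespace` with the namespace chosen for the director in FedLCM returns an empty list when no container is blocked on an image pull. A non-empty list names the image the cluster could not fetch, which would be consistent with the private-registry explanation.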

kta-intel commented 1 year ago

Great, thanks for the information, and I appreciate your quick response! I will await your update. In the meantime, I actually support OpenFL efforts from the Intel side, so let me know if there's anything I can do to help in this process.

wfangchi commented 1 year ago

The OpenFL community kindly offered to create a new org account on Docker Hub where we can host our images. We will publish them there once the org account is set up.

In the meantime, we provide a way to build the image locally and use any custom registry. You can use the current develop-v0.3.0 branch to test FedLCM and follow this section to build the image: https://github.com/FederatedAI/FedLCM/blob/develop-0.3.0/doc/OpenFL_Guide.md#preparing-the-fedlcm-openfl-image-locally--using-you-own-registry . This image is based on the OpenFL v1.5 release, which FedLCM v0.3 requires (v0.3 is still under development, but its OpenFL support is complete).

If you are interested, you can give this approach a try. Otherwise, once v0.3.0 is released, we will upload the image and use the future OpenFL Docker Hub org account as the default registry address.

kta-intel commented 1 year ago

Sorry, I somehow missed your response. Thank you so much for your help. I will give this a try.