AI-secure / FLBenchmark-toolkit

Federated Learning Framework Benchmark (UniFed)
https://unifedbenchmark.github.io/
Apache License 2.0

About the evaluation on FedML library #2

Open chaoyanghe opened 1 year ago

chaoyanghe commented 1 year ago

Dear authors,

I am Chaoyang He (https://chaoyanghe.com/), co-founder of FedML Inc. Thanks for proposing such a benchmark to compare different frameworks. I like your summarization, but it seems many of your comments on FedML are not based on the proper features or newer versions. As a reader of your paper, I am confused about which version of each library you compared against, so it would be better to mention the commit/version ID and date for each library. Readers would then know how outdated the evaluation is.

Please upgrade our library to the latest version in your evaluation and redo the comparison fairly.

FedML supports:

FedML Parrot - Simulating federated learning in the real world: (1) simulate FL using a single process; (2) MPI-based FL simulator; (3) sequential distributed training (running an arbitrary number of clients on an arbitrary number of GPUs). A minimal sketch of what single-process simulation looks like follows this product list.

FedML Octopus - Cross-silo Federated Learning for cross-organization/account training, including Python-based edge SDK.

FedML Beehive - Cross-device Federated Learning for smartphones and IoT devices, including edge SDKs for Android/iOS and embedded Linux.

FedML MLOps: FedML’s machine learning operation pipeline for AI running anywhere and at any scale.
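
To make the single-process simulation mode concrete, here is a minimal FedAvg round in plain PyTorch, written for this thread. It is a sketch only and does not use FedML's actual API; every function name below is hypothetical, and the averaging assumes a model whose state dict contains only float tensors.

```python
# Minimal single-process FedAvg sketch (illustrative only; not FedML's API).
import copy
import torch

def local_update(model, loader, epochs=1, lr=0.01):
    """Train a copy of the global model on one client's data loader."""
    local = copy.deepcopy(model)
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    local.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(local(x), y).backward()
            opt.step()
    return local.state_dict()

def fedavg_round(global_model, client_loaders):
    """One synchronous FedAvg round: local training, then uniform averaging."""
    updates = [local_update(global_model, ld) for ld in client_loaders]
    avg = {k: torch.stack([u[k].float() for u in updates]).mean(dim=0)
           for k in updates[0]}
    global_model.load_state_dict(avg)
    return global_model
```

In a single-process simulator, `client_loaders` would simply be the partitioned dataset, and each "client" runs sequentially in the same Python process.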

You can find an overview of the new FedML at: https://medium.com/@FedML/fedml-ai-platform-releases-the-worlds-federated-learning-open-platform-on-public-cloud-with-an-8024e68a70b6

FedML Homepage: https://fedml.ai/

FedML Open Source: https://github.com/FedML-AI

FedML Platform: https://open.fedml.ai/ (you mentioned our library does not support deployment... this is not true)

FedML Use Cases: https://open.fedml.ai/platform/appStore

FedML Documentation: https://doc.fedml.ai/ (you mentioned our library has no documentation... this is not true)

FedML Blog: https://medium.com/@FedML

FedML Research: https://fedml.ai/research-papers/

In addition, I have some questions about the selection web tool.

  1. As some libraries iterate quickly (e.g., we release new versions weekly), how can you keep the versions up to date and guarantee the selection stays timely?

  2. It would be better to allow multiple answers for each selection, since many libraries support the same feature.

  3. How can the tool help users find libraries that support multiple features? Users normally need more than one feature, especially in the common case of first doing a POC (simulation) and then migrating to deployment without code changes (we support this).

This is important for business development; otherwise it may raise issues for a startup like us (e.g., narrowing our business scope because the evaluation was not upgraded in time, or misguiding users toward libraries that may not be the most helpful to them, since they may need multiple features).


We are happy to answer any questions you may have, and good luck with your paper submission.

camelop commented 1 year ago

Hi Chaoyang, thank you for contacting us, and for your great work in bringing the FedML framework to the FL community!

Our evaluation is based on FedML commit 2ee0517a7fa9ec7d6a5521fbae3e17011683eecd (as part of the revision, we are providing commit version information in the latest version of the paper). While the paper aims to provide a fair snapshot comparison across different FL frameworks, we look forward to collaborating with the maintainers of FedML and other frameworks to update the benchmark results continuously. We believe this approach is similar to other existing ML benchmarks that report evaluation results against specific versions.

We will update the FedML version as suggested, and will continue to periodically check for the latest representative FL frameworks and include them in UniFed. As relevant evidence, we have already included results from two new frameworks in our revision. We believe our paper will help standardize the evaluation of FL frameworks, drawing attention from more and more real-world FL practitioners. With foreseeable direct collaborations with the maintainers of FedTree, FedML, and other frameworks on building the standard together, we believe it is an important step towards the long-term healthy development of the FL community.
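
As one way to picture the versioned-benchmark idea above, a result entry could pin together the framework name, the exact commit, and the evaluation date. The record below is only a sketch of that idea; `BenchmarkEntry` and its fields are our illustration, not UniFed's actual schema, and the date and value shown are placeholders.

```python
# Hypothetical benchmark-entry record pinning each result to an exact commit
# (illustrative only; not UniFed's actual schema).
from dataclasses import dataclass

@dataclass(frozen=True)
class BenchmarkEntry:
    framework: str   # e.g., "FedML"
    commit: str      # full commit hash the evaluation ran against
    eval_date: str   # ISO date of the run, so readers can judge staleness
    metric: str      # e.g., "accuracy" or "wall-clock time (s)"
    value: float

entry = BenchmarkEntry(
    framework="FedML",
    commit="2ee0517a7fa9ec7d6a5521fbae3e17011683eecd",  # commit cited above
    eval_date="2022-01-01",  # placeholder; the paper records the real date
    metric="accuracy",
    value=0.0,               # placeholder value
)
```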

For your questions specifically:

  1. The suggestion is based on a specific version of each FL framework. While we plan to update it periodically on our end, if a specific entry (a feature or performance number) is severely misaligned with the latest version, we are more than willing to start a discussion with your team to test the latest version and update our benchmark results accordingly.
  2. The purpose of the tool is to find the single result that best matches the user's needs, and every suggestion is backed by the results in the paper. We do note that (1) the selection is based on the off-the-shelf framework at a specific version, and (2) the result is suggestive and not suitable for scenarios where the user is considering customizations. Additionally, we refer users who want a list of candidates supporting certain features to https://unifedbenchmark.github.io/leaderboard/index.html#scenario_1_heading, and to https://unifedbenchmark.github.io/leaderboard/index.html#scenario_9_heading for more details on the tool's decision procedure (a toy sketch of such a multi-feature lookup follows this list).
  3. Seamless migration from development to deployment is certainly a very important usability feature, and giving this attribute a clear definition is an interesting problem. Thank you for raising the idea; would you like to start a conversation on your understanding of this matter?
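
To illustrate the multi-feature lookup mentioned in point 2, a selector can simply intersect per-framework feature sets. The sketch below is a toy example; the framework names and the feature table are made up for the example and do not reflect the benchmark's actual data.

```python
# Toy multi-feature framework lookup (illustrative; the feature table is
# made up, not the benchmark's actual data).
FEATURES = {
    "FrameworkA": {"simulation", "cross-silo", "deployment"},
    "FrameworkB": {"simulation", "cross-device"},
    "FrameworkC": {"simulation", "cross-silo"},
}

def frameworks_supporting(*required):
    """Return frameworks whose feature sets cover all required features."""
    need = set(required)
    return sorted(name for name, feats in FEATURES.items() if need <= feats)

print(frameworks_supporting("simulation", "deployment"))  # -> ['FrameworkA']
```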

Again, thank you for contacting us and we are looking forward to working together to reflect the latest development of FedML in our benchmark.

chaoyanghe commented 1 year ago

@camelop thanks for your reply. Your vision is big and good, but I feel the current writing does not match that vision accurately. For FedML, we do suggest you upgrade your evaluation ASAP; the commit ID you provided is severely outdated. I am open to setting up more meetings with you. You mentioned that one of your goals is to guide FL practitioners; if so, you need to include more industry-level evaluation metrics, such as user experience, API simplicity and flexibility, richness of the app ecosystem, system performance, user statistics (user count is always the best vote for a product's strengths), and many more. Here is a blog post to give you a taste of what I mean: https://medium.com/@FedML/fedml-ai-platform-releases-the-worlds-federated-learning-open-platform-on-public-cloud-with-an-8024e68a70b6.

I hope you continue to polish this work to match your vision and goals. The current version is still too research-oriented, with simple metrics that may not be relevant to bringing FL into real-world practice.