jafermarq commented 1 year ago

FedAUX

Title: FEDAUX: Leveraging Unlabeled Auxiliary Data in Federated Learning
Venue: IEEE TNNLS 2021
Link to paper: https://ieeexplore.ieee.org/document/9632275

Do you want to work on this baseline?

🌻 Check everything about the Summer of Reproducibility on flower.dev/summer

All available baselines are listed in the Summer of Reproducibility Dashboard and also in the GitHub Issues with the summer-of-reproducibility label. The content is the same.

📝 It is advised to complete these steps before your start working on your code. But if you can't wait to implement your baseline with Flower (we totally understand it 😄), please ensure you follow the steps on how to contribute a new baseline.

What follows are the steps 1 & 2 in the Summer of Reproducibility instructions.

1. Join the Summer of Reproducibility program

[x] Join the Flower Slack and say "hi! 👋" in the channel #summer-of-reproducibility.
[x] Pick a baseline from our curated list <---------------------------------------- [you are doing this now]
2. Define the scope of your contribution
[x] What are you going to reproduce? Add a comment to your issue and tell us about your plan regarding this baseline: what experiments from the paper are you reproducing?, for which datasets ? the more details you provide us with the better !
[x] Check if you are eligible for a reward.

As we have to comply with US/EU regulations, we have checked that individual contributors based on these countries or territories are eligible: Australia, Austria, Belgium, Bulgaria, Canada, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Gibraltar, Greece, Hong Kong SAR China, Hungary, India, Ireland, Italy, Japan, Latvia, Liechtenstein, Lithuania, Luxembourg, Malta, Mexico, Netherlands, New Zealand, Norway, Poland, Portugal, Romania, Singapore, Slovakia, Slovenia, South Korea, Spain, Sweden, Switzerland, Thailand, United Arab emirates, United Kingdom United States.

If where you are based is not on the list, please send us an email (summer@flower.dev) letting us know a bit about yourself (where are you currently based?, are you a university student? do you work at a public institution?). Please tell us the baselines you are interested in implementing (i.e. tell us your GitHub issue if you have crated one). We will reach back to you.
[x] We will discuss with you about your contribution plan, if it sounds like a substantial enough contribution according to the Summer of Reproducibility rules (check our website flower.dev/summer), you'll get the OK from us to start working on your baseline!

What happens next?

[x] This item will be moved to the In Progress stage by a member of the Flower Team.
[ ] Follow the instructions for creating a new baseline which will guide you through the process step-by-step.

Is something wrong or not clear ?

Ask a question directly in your issue.
Reach out to us via the Flower Slack and ask your question in the #summer-of-reproducibility channel
Check all the details (including FAQ) in the Summer of Reproducibility website: flower.dev/summer

Rishi0812 commented 1 year ago

Hello @jafermarq,

Rishi this side and I'm excited to take up this issue and implement the FedAUX using flower. My approach to implementing FedAUX, which leverages Unlabeled Auxiliary Data in Federated Learning, involves the following steps:

• Prepare and preprocess the datasets 
    - Image: (CIFAR-10, CIFAR-100, and STL-10). CIFAR-100 and SVHN as well as different subsets of ImageNet (Mammals, Birds, Dogs, Devices, Invertebrates, Structures) as auxiliary data.
    - *Text: Tiny-Bert on the AG-NEWS and Multilingual Amazon Reviews Corpus and BookCorpus as auxiliary data
    - Split the data among multiple n clients based using Dirichlet distribution

• Define the neural network and model architectures (MobileNetV2, ShuffleNet, and ResNet8 [for images] Tiny-Bert [for text]) using PyTorch. These models will be trained and aggregated in the Federated Learning process

• Use the Flower library to set up the Federated Learning environment
    - Server Setup: Initialize the Flower server and define the aggregation algorithm, evaluation mechanisms, and communication protocols.
    - Client Setup: Define the Flower client and implement the training process for individual clients. Clients will participate in the training process. (with upto 100 clients as described in the paper)

• Flower Client and Server Communication: Utilize Flower's communication API to handle model updates and synchronize models between clients and the server. communicate with the server receiving and sending the weights, evals and params back and fourth.

• Training Loop: Execute the training loop on clients using the Flower client API and using Privacy Setting parameters as described in paper. Clients will communicate with the server to send and receive model updates.

• Tracking and Logging: Implement evaluation mechanisms, logging and tracking features to monitor the progress of experiments and save experimental results.
    - Hyperparameter Tuning - explore various hyperparameter configurations and analyze the performance of different setups throughput training
    - Final model results

I aim to re-produce fig 3, 4 and 5 from the paper using flower.

urwithajit9 commented 1 year ago

Hi, As Rishi has initiated this, I will wait for confirmation and go through the paper and decide my course of action. I wanted to reproduce all the experiments because I am woking on the same topic.

Thanks.

Rishi0812 commented 1 year ago

Hey,

Yup, Even I plan to reproduce all of the experiments in the paper, have included all the datasets for the same in my approach, Which corresponds to reproducing almost all fig 3 (image), 4 (text) and 5 (eval).

jafermarq commented 1 year ago

Hi @Rishi0812, thanks for the detailed comment. Reproducing Figures 3,4,5 is good but, aren't Figure 6 and 7 the ones that actually showcase the performance of FedAUX? For the Summer of Reproducibility the aim is to reproduce the main results of the paper (which in my view corresponds to Figure 6 & 7). Note that reproducing those experiments for all the networks and settings is not strictly required. But we do expect contributors to include at least two datasets and a "point of reference" to the baseline they are implementing (for example, having the FedAvg results alongside the FedAUX curves).

Let me know what your thoughts are!

Rishi0812 commented 1 year ago

Hey,

Yup, You are right, I had it in my training procedure of changing and experimenting with the privacy parameters but forgot to include the figures. I'll surely work with different privacy settings and re-produce the fig 6 and 7 too.

And I am planning to work mainly on the image datasets and architectures first as I feel a lot of emphasis has been put on these image training procedures and evaluations in the paper. And at least try to cover one of the NLP experiments. I'm planning to keep my main focus on MobileNetv2 and Shufflenet architectures with the image datasets as it's almost used in all the evaluation parameters and comparisons and this is the primary thing I plan on shipping. And if time permits will definitely try to work and experiment with the rest.

I'm open to improvisation and any refinement of the plan if required. Please do let me know your thoughts on this @jafermarq

Thanks

jafermarq commented 1 year ago

ok. Then could we summarise the contribution plan as:

Reproduce results in Figure 6 for MobileNetv2 and Shufflenet including FedAvg or FedDF or FedDF+P (up to you which baseline to include alongside FedAUX but you need one)
Reproduce the TinyBert results for AG_NEWS dataset (much smaller than the AMAZON one.) -- we expect contributors in the Summer of Reproducibility to include two datasets in their evaluation (see the FAQ in flower.dev/summer). If you think this would be too much work, then reproducing a subset of Table 3 would be fine (essentially using other auxiliary data instead of STL-10)
About the privacy experiments: do you mean reproducing Figure 8?

Let me know what you think. And also, whether you are based in an eligible country from the list shown in the description of the Github issue (no need to tell me where you are, just a "yes" is enough. If your location is not on the list but you think you should be eligible, please read the description and reach out to us by email)

Rishi0812 commented 1 year ago

Hey @jafermarq,

The plan looks amazing.

Yup will be using any of the reference baselines for the experiment and AG_News Dataset sounds good or the Subset of Table 3. I meant more about Fig 6 as it's almost similar to the fig 8 but more concise version.

And yes I'm from one of the eligible countries. Excited to contribute and look forward to it!

jafermarq commented 1 year ago

@Rishi0812 , ok I understand now. Figure 6 is fine, you can leave Figure 8 as an extension. I wasn't sure why you were referring to Figure 6 as part of the "privacy setting". All clear now :)

Then everything is set. I have ✅ all points in Step 1 &2 above, added you as the assignee to this issue and moved it to In Progress state. You can see the steps on how to start with the code following the link in the What happens Next? section above. If you have some doubts or suggestions while you work on FedAUX, you are very welcome to interact with me an other contributors in the Flower Slack channel. Just as a small reminder, the Summer of Reproducibility ends at the end of September, so all baselines should be completed by then. There is time but it's better to start early!

Looking forward to seeing your FedAUX baseline in action! 🙌

Rishi0812 commented 1 year ago

Great to hear, Thanks a lot for the opportunity. Looking forward to giving awesome contributions to Flower!

jafermarq commented 1 year ago

Hi @Rishi0812 ! This is just a gentle reminder that the Flower Summer of Reproducibility is ending at the end of the month. With just a little more than 3 weeks to go, we are excited to see quite a few baselines well ahead in the process with their respective PRs close to ready. If your PR is already on the list, great !! Please make sure the PR is linked to this issue (you just need to copy the URL of this issue somewhere in the main message of your PR). Ping me when you'd like me to take a look.

Also, make sure you keep an eye:eyes: on the #summer-of-reproducibility channel in the Flower Slack. I’ll announce very soon a new (the third!) round of 1:1 ask-me-anything sessions to help Summer of Reproducibility contributors like yourself to meet the deadline. Please consider booking a time slot if you want to chat with me about your baseline, potential issues you have making your code run, how to open a PR, doubts about what to include in your readme, how to use Hydra configs more effective, etc … all questions are welcome!!

adap / flower

FedAUX #2043

FedAUX

Do you want to work on this baseline?

1. Join the Summer of Reproducibility program

2. Define the scope of your contribution

What happens next?

Is something wrong or not clear ?