Closed: wittenator closed this pull request 4 days ago
First of all, thanks for opening this PR @wittenator !
I will review this PR ASAP.
Related issue: #88
@KarhouTam I reworked the evaluation a bit more and addressed the other problems. Could you have another look? What still puzzles me is that plain FedAvg does not work with models bigger than lenet5. I think it is really important that we pin down exactly where the problem lies, but I am slowly running out of ideas.
@KarhouTam I am back from the dead, and I think I have addressed most points now. The point about the difference between evaluate/test was a fair one, and I made the distinction more explicit by letting users decide when and what to evaluate/test. I also added a .gitignore for good measure. What do you think? (Sorry for the growing PR btw^^)
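To illustrate the evaluate/test split described above, here is a minimal pure-Python sketch; the class, method, and option names (`Server`, `eval_interval`, `test_after_training`) are illustrative placeholders, not the PR's actual API:

```python
class Server:
    """Toy FL server that separates periodic evaluation during training
    from a single final test on the held-out test split."""

    def __init__(self, rounds=10, eval_interval=5, test_after_training=True):
        self.rounds = rounds
        self.eval_interval = eval_interval          # evaluate every N rounds
        self.test_after_training = test_after_training
        self.log = []                               # records what ran when

    def evaluate(self, round_idx):
        # Would aggregate client validation metrics here.
        self.log.append(("evaluate", round_idx))

    def test(self):
        # Would run the aggregated model on the collected test splits here.
        self.log.append(("test", self.rounds))

    def run(self):
        for r in range(1, self.rounds + 1):
            # ... one round of local training + aggregation (omitted) ...
            if r % self.eval_interval == 0:
                self.evaluate(r)
        if self.test_after_training:
            self.test()

server = Server(rounds=10, eval_interval=5)
server.run()
```

The point of the split is that `evaluate` can run cheaply and often to track convergence, while `test` touches the held-out data exactly once, after training.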
@KarhouTam I condensed the .gitignore according to your example file and made train-mode testing optional.
In order to evaluate classical FL algorithms centrally and get a sense of how well the aggregated model performs, I added a central evaluation that collects the test splits from the clients. Since FedAvg did not perform on par with existing results centrally, I dug further and found that resetting the optimizer between rounds is essential for (at least) classical FedAvg; I added this as a new config option. Furthermore, I fixed a typo in a function name across the code base as a QoL improvement.
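The optimizer-resetting point can be sketched in a few lines of pure Python (the class and function names here are illustrative, not the PR's actual code): with SGD momentum, buffers accumulated against last round's local weights are stale once the server overwrites them with the new global average, so a fresh optimizer per round discards that stale state.

```python
class SGDWithMomentum:
    """Toy scalar SGD with a classical momentum buffer."""

    def __init__(self, lr=0.1, momentum=0.9):
        self.lr, self.momentum = lr, momentum
        self.buffer = 0.0  # momentum buffer; empty on a fresh optimizer

    def step(self, w, grad):
        self.buffer = self.momentum * self.buffer + grad
        return w - self.lr * self.buffer

def local_round(w, grads, optimizer=None, reset_optimizer=True):
    # reset_optimizer=True creates a fresh optimizer each round, dropping
    # momentum accumulated against the previous round's local model.
    if reset_optimizer or optimizer is None:
        optimizer = SGDWithMomentum()
    for g in grads:
        w = optimizer.step(w, g)
    return w, optimizer

# Two local rounds on a toy scalar weight; server aggregation is elided.
w, opt = local_round(1.0, [0.5, 0.5])
w, opt = local_round(w, [0.5, 0.5], optimizer=opt, reset_optimizer=True)
```

Running the same two rounds with `reset_optimizer=False` gives a different trajectory, because the second round's updates are inflated by the first round's leftover momentum buffer.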