sdwagner opened 1 year ago
Thanks for your submission, we'll assign an editor soon.
@koustuvsinha @gdetor Could either of you edit this submission in machine learning?
@rougier I can handle this.
@gdetor Thanks!
@gdetor thank you for agreeing to handle this submission! Is there anything we can do to move this submission forward?
@tuelwer Sorry for the delay.
Hi @ogrisel and @benureau Could you please review this submission?
Any update?
Dear reviewers @ReScience/reviewers, could anybody review this submission?
I'd be interested in reviewing this submission, but I have to mention, I doubt I can rerun all the experiments due to computational constraints.
@mo-arvan Thanks and I think not re-doing everything is fine. @gdetor What do you think?
@rougier @mo-arvan I'm OK with it.
Okay, I will review this work by the end of July.
I apologize, but I have not been able to review this submission yet; I should be able to write the review within the next few weeks.
Thanks. Any progress?
@mo-arvan gentle reminder
In this paper, Wagner et al. provide a reproduction report of Müller et al.'s work on label smoothing. They begin with a concise introduction to the original study and the motivations behind it. The authors then present essential details regarding the models and datasets used, noting specific variations driven by limited computational resources.
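For readers less familiar with the technique: label smoothing with parameter alpha mixes the one-hot target with the uniform distribution over the K classes. A minimal PyTorch sketch (the helper name and the alpha value are illustrative, not taken from the authors' repository):

```python
import torch

def smooth_labels(targets: torch.Tensor, num_classes: int, alpha: float = 0.1) -> torch.Tensor:
    """Soft targets: y_ls = (1 - alpha) * one_hot(y) + alpha / num_classes."""
    one_hot = torch.nn.functional.one_hot(targets, num_classes).float()
    return one_hot * (1.0 - alpha) + alpha / num_classes

# Illustrative usage: three labels over five classes with alpha = 0.1
print(smooth_labels(torch.tensor([0, 2, 4]), num_classes=5))
```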
The authors have done an excellent job of providing documentation and instructions for using their released code. Their repository includes multiple Jupyter notebooks detailing the conducted experiments, along with specified dependency requirements to facilitate the setup process. To further simplify future installations, I created a Docker container as part of the review process. The files and instructions are available in my forked repository.
In their initial results, the authors examine the effect of label smoothing on model accuracy. While Müller et al. claimed that label smoothing positively impacts the test accuracy of trained models, Wagner et al. suggest that it enhances accuracy by reducing overfitting—a claim not made by the original authors. However, their results indicate mixed effects; out of eight experiments, three showed higher accuracy without label smoothing. Upon reviewing their code (https://github.com/sdwagner/re-labelsmoothing/blob/fb6c3634d2049ef7f175e7a992f109c43680fae3/datasets/datasets.py), it appears that they do not load the test set, raising the possibility that the reported results are based on the validation set. Unlike the original study, this reproduction does not include confidence intervals, and the small differences in accuracy could be attributed to randomness in the training process. Adding uncertainty analysis would significantly strengthen this work.
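To illustrate the kind of uncertainty analysis I have in mind: repeating each experiment over a handful of random seeds and reporting, say, a normal-approximation 95% confidence interval would make the small accuracy gaps interpretable. A sketch (the helper name and the example accuracies are placeholders, not results from this work):

```python
import numpy as np

def mean_ci(accuracies, z=1.96):
    """95% normal-approximation confidence interval over repeated runs."""
    acc = np.asarray(accuracies, dtype=float)
    mean = acc.mean()
    half_width = z * acc.std(ddof=1) / np.sqrt(len(acc))
    return mean, mean - half_width, mean + half_width

# Placeholder values standing in for per-seed test accuracies
mean, lo, hi = mean_ci([0.912, 0.907, 0.915, 0.910, 0.909])
print(f"{mean:.3f} (95% CI: [{lo:.3f}, {hi:.3f}])")
```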
In the next section, the authors reproduce the results of a visualization experiment from the original study that demonstrates the effect of label smoothing on the activations of the penultimate layer and the network output. Figure 2 in their work aligns with the findings of the original study, although there is a minor discrepancy in the order of the columns in the visualization.
The authors then investigate the impact of label smoothing on Expected Calibration Error (ECE). With the exception of the results from the MNIST dataset using a fully connected network, their findings generally align with those of the original study. The reported results for training a transformer model for translation are mixed, with not all findings matching the original study. Similar to the accuracy results, the authors report findings based on the validation set, which may account for some discrepancies.
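For context, the binned ECE estimator partitions predictions into equal-width confidence bins and averages the absolute gap between each bin's accuracy and its mean confidence, weighted by bin size. A minimal NumPy sketch of this standard estimator (not the authors' implementation):

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """Binned ECE: sum over bins of (|B|/n) * |accuracy(B) - confidence(B)|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # in_bin.mean() is the bin weight |B|/n
    return ece
```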
Finally, the results of the distillation experiments on fully connected networks for MNIST are consistent with the original study, though there is a slight increase in error. Ultimately, the authors confirm the observation made by Müller et al. regarding accuracy degradation in students when the teacher is trained with label smoothing. Figures 7 and 8 lack the confidence intervals present in the original study, which would have been beneficial for comparison.
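For reference, the distillation objective in this line of work follows Hinton et al.: a temperature-softened KL term against the teacher combined with the usual cross-entropy on the hard labels. A sketch under those assumptions (the function name, temperature, and weighting are illustrative, not the authors' exact setup):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Weighted sum of the soft (teacher KL, scaled by T^2) and hard (cross-entropy) losses."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```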
Minor editing suggestions:
- "The authors state, that the logit dependents on the Euclidean distance" -> "The authors state that the logit depends on the Euclidean distance"
- "The evaluation was performed using the ECE" -> ECE should be spelled out on first use.
@mo-arvan Thank you for your report. @tuelwer @sdwagner Could you please respond to the reviewer's comments?
@mo-arvan Thank you for reviewing our submission and for your thoughtful and detailed comments! @gdetor We will update our submission in the next few days to incorporate the reviewer's comments.
@mo-arvan Thanks for creating a dockerfile! Feel free to open a PR to integrate it into our repository 😊
Glad you find it useful. Sure, I'll submit a pull request. I'd be happy to engage in a discussion as well.
One last minor comment: your use of vector graphics in your figures is a step up from the original publication. I'd suggest changing the color palette and the patterns to further improve the presentation of the figures, e.g., Figure 3 (b).
@mo-arvan Thanks again for your detailed comments! In the following we want to address each of the points that you raised:
- Confusion of validation and test data: We carefully double-checked our datasets and can confirm that all experiments were performed on the test split of each dataset: the datasets are loaded with `train=False`, which corresponds to the test split (please refer to, e.g., here; see also the attached train_test_split.txt and the sketch after this list). The CUB-200-2011 dataset does not have a validation set.
- Uncertainty quantification: We added confidence intervals for Figures 6 and 7.
- Color palette: We have chosen the colors that were used in the original work to allow easy comparison of the experimental results.
- Edits: We have incorporated the proposed changes into our report.
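For completeness, here is a minimal sketch of how the torchvision datasets distinguish the two splits (illustrative, not a verbatim excerpt from our datasets.py):

```python
from torchvision import datasets, transforms

transform = transforms.ToTensor()

# train=True loads the 60k MNIST training images; train=False loads the 10k test images.
train_set = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
test_set = datasets.MNIST(root="./data", train=False, download=True, transform=transform)
```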
@mo-arvan Please let me know if you agree with the responses so I can publish the paper. Thank you.
Yes, the response addresses my main concerns. I was wrong about the validation/test splits.
Original article: Rafael Müller, Simon Kornblith, and Geoffrey E. Hinton. "When does label smoothing help?" Advances in Neural Information Processing Systems 32 (2019). (https://arxiv.org/pdf/1906.02629.pdf)
PDF URL: https://github.com/sdwagner/re-labelsmoothing/blob/main/report/article.pdf
Metadata URL: https://github.com/sdwagner/re-labelsmoothing/blob/main/report/metadata.yaml
Code URL: https://github.com/sdwagner/re-labelsmoothing
Scientific domain: Machine Learning
Programming language: Python
Suggested editor: Georgios Detorakis or Koustuv Sinha