DRSY / EMO

[ICLR 2024] EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling (https://arxiv.org/abs/2310.04691)

Is it possible for you to cite "Learning with a Wasserstein Loss" in the camera-ready version? #9

Closed · YouJiacheng closed this 7 months ago

YouJiacheng commented 7 months ago

Dear authors: Thank you for your great work advancing the frontier of language model training.

Learning with a Wasserstein Loss (NeurIPS 2015; preprint on arXiv)

This paper represents the first use of the Wasserstein distance (i.e., the earth mover's distance) as a loss for supervised learning. It considers the problem of learning to predict a non-negative measure over a finite set, and language models are essentially learning to predict exactly such a measure: the next-token distribution over the vocabulary.
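For context, here is a minimal sketch of the Sinkhorn-style computation that Frogner et al. use for the entropic-regularized Wasserstein loss between two measures over a finite set. The function name, parameters, and toy cost matrix below are my own illustrative assumptions, not code from either paper:

```python
import torch

def sinkhorn_wasserstein(p, q, cost, eps=0.1, n_iters=50):
    # Gibbs kernel derived from the ground-cost matrix
    K = torch.exp(-cost / eps)
    u = torch.ones_like(p)
    v = torch.ones_like(q)
    for _ in range(n_iters):
        u = p / (K @ v)    # scale rows to match marginal p
        v = q / (K.T @ u)  # scale columns to match marginal q
    T = u.unsqueeze(1) * K * v.unsqueeze(0)  # transport plan
    return (T * cost).sum()                  # <T, C>: transport cost

# Toy usage: two measures over a 4-element set with |i - j| ground costs
n = 4
idx = torch.arange(n, dtype=torch.float32)
cost = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs()
p = torch.softmax(torch.randn(n), dim=0)
q = torch.full((n,), 1.0 / n)
print(sinkhorn_wasserstein(p, q, cost).item())
```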

In summary, Learning with a Wasserstein Loss used a similar method to solve a similar problem: the core idea and core technique are the same, and the problem is the same in principle.

Nonetheless, your great work proposes a tractable and effective upper bound for EMD and verifies EMD's effectiveness in language model fine-tuning, which is nontrivial and impressive.
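For intuition only (this is not EMO's actual upper bound, whose construction is given in the paper): when the finite set carries a natural 1-D ordering, EMD already admits a tractable, differentiable closed form as the L1 distance between the two CDFs, which illustrates why tractable surrogates are attainable. A PyTorch sketch, with all names hypothetical:

```python
import torch

def emd_1d(pred_logits, target_probs):
    # Closed-form EMD for measures on an ordered 1-D grid with unit
    # spacing: the L1 distance between the two cumulative distributions.
    pred = torch.softmax(pred_logits, dim=-1)
    cdf_diff = torch.cumsum(pred - target_probs, dim=-1)
    return cdf_diff.abs().sum(dim=-1).mean()

# Usage: batch of 2 predictions over 5 ordered outcomes
logits = torch.randn(2, 5, requires_grad=True)
target = torch.full((2, 5), 0.2)  # uniform target measure
loss = emd_1d(logits, target)
loss.backward()                   # fully differentiable
```

A vocabulary has no such natural ordering, of course, which is why both papers work with a semantic ground cost instead.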

Could you please, by any chance, cite Learning with a Wasserstein Loss in the camera-ready version of the paper? I believe this will help readers find related work.

I originally wanted to post a Public Comment on OpenReview, but that is no longer possible 😂

DRSY commented 7 months ago

Thank you for the reference; we will add the corresponding citation in the camera-ready version.