Dear authors: Thank you for your great work advancing the frontier of language model training.
Learning with a Wasserstein Loss (Frogner et al., NeurIPS 2015; arXiv preprint)
That paper presented the first use of the Wasserstein distance (i.e., the earth mover's distance, EMD) as a loss for supervised learning, and it considered the problem of learning to predict a non-negative measure over a finite set. Language models are essentially solving the same problem: predicting a non-negative measure (a distribution over a finite vocabulary) over a finite set.
In summary, Learning with a Wasserstein Loss used a similar method to solve a similar problem: the core idea and core technique are the same, and the problem is the same in principle.
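To make the shared setup concrete, here is a minimal sketch of a Wasserstein/EMD loss over a finite set, using the standard entropic-regularized (Sinkhorn) approximation in PyTorch. This is illustrative only and is not the algorithm of either paper; the function name and the hyperparameters `eps` and `n_iters` are my own choices.

```python
import torch

def sinkhorn_wasserstein_loss(pred, target, cost, eps=0.1, n_iters=50):
    """Entropic-regularized Wasserstein loss between two distributions
    over a finite set (illustrative sketch, not either paper's algorithm).

    pred, target: (n,) non-negative weights, each summing to 1
    cost:         (n, n) ground-metric cost between the n outcomes
    """
    K = torch.exp(-cost / eps)                    # Gibbs kernel
    u = torch.ones_like(pred)
    v = torch.ones_like(target)
    for _ in range(n_iters):                      # Sinkhorn fixed-point updates
        v = target / (K.t() @ u + 1e-30)
        u = pred / (K @ v + 1e-30)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)    # approximate transport plan
    return torch.sum(plan * cost)                 # transport cost = the loss

# Usage on a toy 5-symbol "vocabulary" with |i - j| as the ground metric:
n = 5
idx = torch.arange(n, dtype=torch.float)
cost = torch.abs(idx[:, None] - idx[None, :])
logits = torch.randn(n, requires_grad=True)
pred = torch.softmax(logits, dim=0)               # model's predicted measure
target = torch.tensor([0., 0., 1., 0., 0.])       # one-hot target measure
loss = sinkhorn_wasserstein_loss(pred, target, cost)
loss.backward()                                   # differentiable end to end
```

Unlike cross-entropy, this loss is sensitive to the ground metric: placing probability mass on outcomes close to the target (under `cost`) is penalized less than placing it on distant ones.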
Nonetheless, your work proposes a tractable and effective upper bound for EMD and verifies EMD's effectiveness in language-model fine-tuning, which is nontrivial and impressive.
Could you please cite Learning with a Wasserstein Loss in the camera-ready version of the paper? I believe that will help readers find related work.
(I originally wanted to post this as a Public Comment on OpenReview, but comments are now closed 😂)