New submissions for Mon, 13 Mar 23

Keyword: text generation

An Overview on Language Models: Recent Developments and Outlook
Authors: Chengwei Wei, Yun-Cheng Wang, Bin Wang, C.-C. Jay Kuo
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2303.05759
Pdf link: https://arxiv.org/pdf/2303.05759
Abstract: Language modeling studies the probability distributions over strings of texts. It is one of the most fundamental tasks in natural language processing (NLP). It has been widely used in text generation, speech recognition, machine translation, etc. Conventional language models (CLMs) aim to predict the probability of linguistic sequences in a causal manner. In contrast, pre-trained language models (PLMs) cover broader concepts and can be used in both causal sequential modeling and fine-tuning for downstream applications. PLMs have their own training paradigms (usually self-supervised) and serve as foundation models in modern NLP systems. This overview paper provides an introduction to both CLMs and PLMs from five aspects, i.e., linguistic units, structures, training methods, evaluation methods, and applications. Furthermore, we discuss the relationship between CLMs and PLMs and shed light on the future directions of language modeling in the pre-trained era.

Keyword: machine translation

An Overview on Language Models: Recent Developments and Outlook
Authors: Chengwei Wei, Yun-Cheng Wang, Bin Wang, C.-C. Jay Kuo
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2303.05759
Pdf link: https://arxiv.org/pdf/2303.05759
Keyword: non-autoregressive

No results found.

Keyword: abstractive summarization

No results found.

Keyword: factual

No results found.

Keyword: knowledge distillation

Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss
Authors: Mohammad Zeineldeen, Kartik Audhkhasi, Murali Karthick Baskar, Bhuvana Ramabhadran
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
Arxiv link: https://arxiv.org/abs/2303.05958
Pdf link: https://arxiv.org/pdf/2303.05958
Abstract: This work studies knowledge distillation (KD) and addresses its constraints for recurrent neural network transducer (RNN-T) models. In hard distillation, a teacher model transcribes large amounts of unlabelled speech to train a student model. Soft distillation is another popular KD method that distills the output logits of the teacher model. Due to the nature of RNN-T alignments, applying soft distillation between RNN-T architectures having different posterior distributions is challenging. In addition, bad teachers having high word-error-rate (WER) reduce the efficacy of KD. We investigate how to effectively distill knowledge from variable quality ASR teachers, which has not been studied before to the best of our knowledge. We show that a sequence-level KD, full-sum distillation, outperforms other distillation methods for RNN-T models, especially for bad teachers. We also propose a variant of full-sum distillation that distills the sequence discriminative knowledge of the teacher leading to further improvement in WER. We conduct experiments on public datasets namely SpeechStew and LibriSpeech, and on in-house production data.

Keyword: Hallucination

No results found.

Keyword: evaluation

An Overview on Language Models: Recent Developments and Outlook
Authors: Chengwei Wei, Yun-Cheng Wang, Bin Wang, C.-C. Jay Kuo
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2303.05759
Pdf link: https://arxiv.org/pdf/2303.05759
Rewarding Chatbots for Real-World Engagement with Millions of Users
Authors: Robert Irvine, Douglas Boubert, Vyas Raina, Adian Liusie, Vineet Mudupalli, Aliaksei Korshuk, Zongyi Liu, Fritz Cremer, Valentin Assassi, Christie-Carol Beauchamp, Xiaoding Lu, Thomas Rialan, William Beauchamp
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2303.06135
Pdf link: https://arxiv.org/pdf/2303.06135
Abstract: The emergence of pretrained large language models has led to the deployment of a range of social chatbots for chitchat. Although these chatbots demonstrate language ability and fluency, they are not guaranteed to be engaging and can struggle to retain users. This work investigates the development of social chatbots that prioritize user engagement to enhance retention, specifically examining the use of human feedback to efficiently develop highly engaging chatbots. The proposed approach uses automatic pseudo-labels collected from user interactions to train a reward model that can be used to reject low-scoring sample responses generated by the chatbot model at inference time. Intuitive evaluation metrics, such as mean conversation length (MCL), are introduced as proxies to measure the level of engagement of deployed chatbots. A/B testing on groups of 10,000 new daily chatbot users on the Chai Research platform shows that this approach increases the MCL by up to 70%, which translates to a more than 30% increase in user retention for a GPT-J 6B model. Future work aims to use the reward model to realise a data fly-wheel, where the latest user conversations can be used to alternately fine-tune the language model and the reward model.
