New submissions for Mon, 12 Jun 23

Keyword: text generation

Towards End-to-end Speech-to-text Summarization

Authors: Raul Monteiro, Diogo Pernes
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2306.05432
Pdf link: https://arxiv.org/pdf/2306.05432
Abstract Speech-to-text (S2T) summarization is a time-saving technique for filtering and keeping up with the broadcast news uploaded online on a daily basis. The rise of large language models from deep learning with impressive text generation capabilities has placed the research focus on summarization systems that produce paraphrased compact versions of the document content, also known as abstractive summaries. End-to-end (E2E) modelling of S2T abstractive summarization is a promising approach that offers the possibility of generating rich latent representations that leverage non-verbal and acoustic information, as opposed to the use of only linguistic information from automatically generated transcripts in cascade systems. However, the few literature on E2E modelling of this task fails on exploring different domains, namely broadcast news, which is challenging domain where large and diversified volumes of data are presented to the user every day. We model S2T summarization both with a cascade and an E2E system for a corpus of broadcast news in French. Our novel E2E model leverages external data by resorting to transfer learning from a pre-trained T2T summarizer. Experiments show that both our cascade and E2E abstractive summarizers are stronger than an extractive baseline. However, the performance of the E2E model still lies behind the cascade one, which is object of an extensive analysis that includes future directions to close that gap.
AaKOS: Aspect-adaptive Knowledge-based Opinion Summarization
Authors: Guan Wang, Weihua Li, Edmund M-K. Lai, Quan Bai
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2306.05537
Pdf link: https://arxiv.org/pdf/2306.05537
Abstract The rapid growth of information on the Internet has led to an overwhelming amount of opinions and comments on various activities, products, and services. This makes it difficult and time-consuming for users to process all the available information when making decisions. Text summarization, a Natural Language Processing (NLP) task, has been widely explored to help users quickly retrieve relevant information by generating short and salient content from long or multiple documents. Recent advances in pre-trained language models, such as ChatGPT, have demonstrated the potential of Large Language Models (LLMs) in text generation. However, LLMs require massive amounts of data and resources and are challenging to implement as offline applications. Furthermore, existing text summarization approaches often lack the ``adaptive" nature required to capture diverse aspects in opinion summarization, which is particularly detrimental to users with specific requirements or preferences. In this paper, we propose an Aspect-adaptive Knowledge-based Opinion Summarization model for product reviews, which effectively captures the adaptive nature required for opinion summarization. The model generates aspect-oriented summaries given a set of reviews for a particular product, efficiently providing users with useful information on specific aspects they are interested in, ensuring the generated summaries are more personalized and informative. Extensive experiments have been conducted using real-world datasets to evaluate the proposed model. The results demonstrate that our model outperforms state-of-the-art approaches and is adaptive and efficient in generating summaries that focus on particular aspects, enabling users to make well-informed decisions and catering to their diverse interests and preferences.
Emotion and Sentiment Guided Paraphrasing
Authors: Justin J. Xie, Ameeta Agrawal
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.05556
Pdf link: https://arxiv.org/pdf/2306.05556
Abstract Paraphrase generation, a.k.a. paraphrasing, is a common and important task in natural language processing. Emotional paraphrasing, which changes the emotion embodied in a piece of text while preserving its meaning, has many potential applications, including moderating online dialogues and preventing cyberbullying. We introduce a new task of fine-grained emotional paraphrasing along emotion gradients, that is, altering the emotional intensities of the paraphrases in fine-grained settings following smooth variations in affective dimensions while preserving the meaning of the original text. We reconstruct several widely used paraphrasing datasets by augmenting the input and target texts with their fine-grained emotion labels. Then, we propose a framework for emotion and sentiment guided paraphrasing by leveraging pre-trained language models for conditioned text generation. Extensive evaluation of the fine-tuned models suggests that including fine-grained emotion labels in the paraphrase task significantly improves the likelihood of obtaining high-quality paraphrases that reflect the desired emotions while achieving consistently better scores in paraphrase metrics such as BLEU, ROUGE, and METEOR.
Keyword: machine translation

Good, but not always Fair: An Evaluation of Gender Bias for three commercial Machine Translation Systems
Authors: Silvia Alma Piazzolla, Beatrice Savoldi, Luisa Bentivogli
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2306.05882
Pdf link: https://arxiv.org/pdf/2306.05882
Abstract Machine Translation (MT) continues to make significant strides in quality and is increasingly adopted on a larger scale. Consequently, analyses have been redirected to more nuanced aspects, intricate phenomena, as well as potential risks that may arise from the widespread use of MT tools. Along this line, this paper offers a meticulous assessment of three commercial MT systems - Google Translate, DeepL, and Modern MT - with a specific focus on gender translation and bias. For three language pairs (English/Spanish, English/Italian, and English/French), we scrutinize the behavior of such systems at several levels of granularity and on a variety of naturally occurring gender phenomena in translation. Our study takes stock of the current state of online MT tools, by revealing significant discrepancies in the gender translation of the three systems, with each system displaying varying degrees of bias despite their overall translation quality.
Assisting Language Learners: Automated Trans-Lingual Definition Generation via Contrastive Prompt Learning
Authors: Hengyuan Zhang, Dawei Li, Yanran Li, Chenming Shang, Chufan Shi, Yong Jiang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2306.06058
Pdf link: https://arxiv.org/pdf/2306.06058
Abstract The standard definition generation task requires to automatically produce mono-lingual definitions (e.g., English definitions for English words), but ignores that the generated definitions may also consist of unfamiliar words for language learners. In this work, we propose a novel task of Trans-Lingual Definition Generation (TLDG), which aims to generate definitions in another language, i.e., the native speaker's language. Initially, we explore the unsupervised manner of this task and build up a simple implementation of fine-tuning the multi-lingual machine translation model. Then, we develop two novel methods, Prompt Combination and Contrastive Prompt Learning, for further enhancing the quality of the generation. Our methods are evaluated against the baseline Pipeline method in both rich- and low-resource settings, and we empirically establish its superiority in generating higher-quality trans-lingual definitions.
Keyword: non-autoregressive

There is no result

Keyword: abstractive summarization

Towards End-to-end Speech-to-text Summarization
Authors: Raul Monteiro, Diogo Pernes
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2306.05432
Pdf link: https://arxiv.org/pdf/2306.05432
Abstract Speech-to-text (S2T) summarization is a time-saving technique for filtering and keeping up with the broadcast news uploaded online on a daily basis. The rise of large language models from deep learning with impressive text generation capabilities has placed the research focus on summarization systems that produce paraphrased compact versions of the document content, also known as abstractive summaries. End-to-end (E2E) modelling of S2T abstractive summarization is a promising approach that offers the possibility of generating rich latent representations that leverage non-verbal and acoustic information, as opposed to the use of only linguistic information from automatically generated transcripts in cascade systems. However, the few literature on E2E modelling of this task fails on exploring different domains, namely broadcast news, which is challenging domain where large and diversified volumes of data are presented to the user every day. We model S2T summarization both with a cascade and an E2E system for a corpus of broadcast news in French. Our novel E2E model leverages external data by resorting to transfer learning from a pre-trained T2T summarizer. Experiments show that both our cascade and E2E abstractive summarizers are stronger than an extractive baseline. However, the performance of the E2E model still lies behind the cascade one, which is object of an extensive analysis that includes future directions to close that gap.
Keyword: factual

There is no result

Keyword: knowledge distillation

There is no result

Keyword: Hallucination

Trapping LLM Hallucinations Using Tagged Context Prompts
Authors: Philip Feldman, James R. Foulds, Shimei Pan
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.06085
Pdf link: https://arxiv.org/pdf/2306.06085
Abstract Recent advances in large language models (LLMs), such as ChatGPT, have led to highly sophisticated conversation agents. However, these models suffer from "hallucinations," where the model generates false or fabricated information. Addressing this challenge is crucial, particularly with AI-driven platforms being adopted across various sectors. In this paper, we propose a novel method to recognize and flag instances when LLMs perform outside their domain knowledge, and ensuring users receive accurate information. We find that the use of context combined with embedded tags can successfully combat hallucinations within generative language models. To do this, we baseline hallucination frequency in no-context prompt-response pairs using generated URLs as easily-tested indicators of fabricated data. We observed a significant reduction in overall hallucination when context was supplied along with question prompts for tested generative engines. Lastly, we evaluated how placing tags within contexts impacted model responses and were able to eliminate hallucinations in responses with 98.88% effectiveness.
Keyword: evaluation

PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance
Authors: Qianqian Xie, Weiguang Han, Xiao Zhang, Yanzhao Lai, Min Peng, Alejandro Lopez-Lira, Jimin Huang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2306.05443
Pdf link: https://arxiv.org/pdf/2306.05443
Abstract Although large language models (LLMs) has shown great performance on natural language processing (NLP) in the financial domain, there are no publicly available financial tailtored LLMs, instruction tuning datasets, and evaluation benchmarks, which is critical for continually pushing forward the open-source development of financial artificial intelligence (AI). This paper introduces PIXIU, a comprehensive framework including the first financial LLM based on fine-tuning LLaMA with instruction data, the first instruction data with 136K data samples to support the fine-tuning, and an evaluation benchmark with 5 tasks and 9 datasets. We first construct the large-scale multi-task instruction data considering a variety of financial tasks, financial document types, and financial data modalities. We then propose a financial LLM called FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks. To support the evaluation of financial LLMs, we propose a standardized benchmark that covers a set of critical financial tasks, including five financial NLP tasks and one financial prediction task. With this benchmark, we conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks. The model, datasets, benchmark, and experimental results are open-sourced to facilitate future research in financial AI.
Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data
Authors: Petar Ivanov, Ivan Koychev, Momchil Hardalov, Preslav Nakov
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2306.05535
Pdf link: https://arxiv.org/pdf/2306.05535
Abstract A large portion of society united around the same vision and ideas carries enormous energy. That is precisely what political figures would like to accumulate for their cause. With this goal in mind, they can sometimes resort to distorting or hiding the truth, unintentionally or on purpose, which opens the door for misinformation and disinformation. Tools for automatic detection of check-worthy claims would be of great help to moderators of debates, journalists, and fact-checking organizations. While previous work on detecting check-worthy claims has focused on text, here we explore the utility of the audio signal as an additional information source. We create a new multimodal dataset (text and audio in English) containing 48 hours of speech. Our evaluation results show that the audio modality together with text yields improvements over text alone in the case of multiple speakers. Moreover, an audio-only model could outperform a text-only one for a single speaker.
Emotion and Sentiment Guided Paraphrasing
Authors: Justin J. Xie, Ameeta Agrawal
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2306.05556
Pdf link: https://arxiv.org/pdf/2306.05556
Abstract Paraphrase generation, a.k.a. paraphrasing, is a common and important task in natural language processing. Emotional paraphrasing, which changes the emotion embodied in a piece of text while preserving its meaning, has many potential applications, including moderating online dialogues and preventing cyberbullying. We introduce a new task of fine-grained emotional paraphrasing along emotion gradients, that is, altering the emotional intensities of the paraphrases in fine-grained settings following smooth variations in affective dimensions while preserving the meaning of the original text. We reconstruct several widely used paraphrasing datasets by augmenting the input and target texts with their fine-grained emotion labels. Then, we propose a framework for emotion and sentiment guided paraphrasing by leveraging pre-trained language models for conditioned text generation. Extensive evaluation of the fine-tuned models suggests that including fine-grained emotion labels in the paraphrase task significantly improves the likelihood of obtaining high-quality paraphrases that reflect the desired emotions while achieving consistently better scores in paraphrase metrics such as BLEU, ROUGE, and METEOR.
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
Authors: Zhouhong Gu, Xiaoxuan Zhu, Haoning Ye, Lin Zhang, Jianchen Wang, Sihang Jiang, Zhuozhi Xiong, Zihan Li, Qianyu He, Rui Xu, Wenhao Huang, Weiguo Zheng, Hongwei Feng, Yanghua Xiao
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2306.05783
Pdf link: https://arxiv.org/pdf/2306.05783
Abstract New Natural Langauge Process~(NLP) benchmarks are urgently needed to align with the rapid development of large language models (LLMs). We present Xiezhi, the most comprehensive evaluation suite designed to assess holistic domain knowledge. Xiezhi comprises multiple-choice questions across 516 diverse disciplines ranging from 13 different subjects with 220,000 questions and accompanied by Xiezhi-Specialty and Xiezhi-Interdiscipline, both with 15k questions. We conduct evaluation of the 47 cutting-edge LLMs on Xiezhi. Results indicate that LLMs exceed average performance of humans in science, engineering, agronomy, medicine, and art, but fall short in economics, jurisprudence, pedagogy, literature, history, and management. We anticipate Xiezhi will help analyze important strengths and shortcomings of LLMs, and the benchmark is released in https://github.com/MikeGu721/XiezhiBenchmark .

LuckyyySTA / arxiv-daily

New submissions for Mon, 12 Jun 23 #106

Keyword: text generation

Towards End-to-end Speech-to-text Summarization

AaKOS: Aspect-adaptive Knowledge-based Opinion Summarization

Emotion and Sentiment Guided Paraphrasing

Keyword: machine translation

Good, but not always Fair: An Evaluation of Gender Bias for three commercial Machine Translation Systems

Assisting Language Learners: Automated Trans-Lingual Definition Generation via Contrastive Prompt Learning

Keyword: non-autoregressive

Keyword: abstractive summarization

Towards End-to-end Speech-to-text Summarization

Keyword: factual

Keyword: knowledge distillation

Keyword: Hallucination

Trapping LLM Hallucinations Using Tagged Context Prompts

Keyword: evaluation

PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance

Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data

Emotion and Sentiment Guided Paraphrasing

Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation