New submissions for Wed, 29 Mar 23

Keyword: text generation

Synthetically generated text for supervised text analysis
Authors: Andrew Halterman
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2303.16028
Pdf link: https://arxiv.org/pdf/2303.16028
Abstract: Supervised text models are a valuable tool for political scientists but present several obstacles to their use, including the expense of hand-labeling documents, the difficulty of retrieving rare relevant documents for annotation, and copyright and privacy concerns involved in sharing annotated documents. This article proposes a partial solution to these three issues, in the form of controlled generation of synthetic text with large language models. I provide a conceptual overview of text generation, guidance on when researchers should prefer different techniques for generating synthetic text, a discussion of ethics, and a simple technique for improving the quality of synthetic text. I demonstrate the usefulness of synthetic text with three applications: generating synthetic tweets describing the fighting in Ukraine, synthetic news articles describing specified political events for training an event detection system, and a multilingual corpus of populist manifesto statements for training a sentence-level populism classifier.

Keyword: machine translation

Linguistically Informed ChatGPT Prompts to Enhance Japanese-Chinese Machine Translation: A Case Study on Attributive Clauses
Authors: Wenshi Gu
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2303.15587
Pdf link: https://arxiv.org/pdf/2303.15587
Abstract: In the field of Japanese-Chinese translation linguistics, the issue of correctly translating attributive clauses has persistently proven to be challenging. Present-day machine translation tools often fail to accurately translate attributive clauses from Japanese to Chinese. In light of this, this paper investigates the linguistic problem underlying such difficulties, namely how the semantic role of the modified noun affects the selection of translation patterns for attributive clauses. To address these difficulties, a pre-edit scheme is proposed, which aims to enhance the accuracy of translation. Furthermore, we propose a novel two-step prompt strategy, which combines this pre-edit scheme with ChatGPT, currently the most widely used large language model. This prompt strategy is capable of optimizing translation input in zero-shot scenarios and has been demonstrated to improve the average translation accuracy score by over 35%.

Hallucinations in Large Multilingual Translation Models
Authors: Nuno M. Guerreiro, Duarte Alves, Jonas Waldendorf, Barry Haddow, Alexandra Birch, Pierre Colombo, André F. T. Martins
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2303.16104
Pdf link: https://arxiv.org/pdf/2303.16104
Abstract: Large-scale multilingual machine translation systems have demonstrated remarkable ability to translate directly between numerous languages, making them increasingly appealing for real-world applications. However, when deployed in the wild, these models may generate hallucinated translations which have the potential to severely undermine user trust and raise safety concerns. Existing research on hallucinations has primarily focused on small bilingual models trained on high-resource languages, leaving a gap in our understanding of hallucinations in massively multilingual models across diverse translation scenarios. In this work, we fill this gap by conducting a comprehensive analysis on both the M2M family of conventional neural machine translation models and ChatGPT, a general-purpose large language model (LLM) that can be prompted for translation. Our investigation covers a broad spectrum of conditions, spanning over 100 translation directions across various resource levels and going beyond English-centric language pairs. We provide key insights regarding the prevalence, properties, and mitigation of hallucinations, paving the way towards more responsible and reliable machine translation systems.

Keyword: non-autoregressive

There is no result

Keyword: abstractive summarization

ChatGPT as a Factual Inconsistency Evaluator for Abstractive Text Summarization
Authors: Zheheng Luo, Qianqian Xie, Sophia Ananiadou
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2303.15621
Pdf link: https://arxiv.org/pdf/2303.15621
Abstract: The performance of abstractive text summarization has been greatly boosted by pre-trained language models recently. The main concern with existing abstractive summarization methods is the factual inconsistency of their generated summaries. To alleviate the problem, many efforts have focused on developing effective factuality evaluation metrics based on natural language inference, question answering, etc. However, these metrics are limited by high computational complexity and reliance on annotated data. Most recently, large language models such as ChatGPT have shown strong ability in not only natural language understanding but also natural language inference. In this paper, we study the factual inconsistency evaluation ability of ChatGPT under the zero-shot setting by evaluating it on coarse-grained and fine-grained factuality evaluation tasks including binary natural language inference (NLI), summary ranking, and consistency rating. Experimental results show that ChatGPT outperforms previous SOTA evaluation metrics on 6/9 datasets across three tasks, demonstrating its great potential for assessing factual inconsistency in the zero-shot setting. The results also highlight the importance of prompt design and the need for future efforts to address ChatGPT's limitations on evaluation bias, wrong reasoning, and hallucination.

Keyword: factual

ChatGPT as a Factual Inconsistency Evaluator for Abstractive Text Summarization
Authors: Zheheng Luo, Qianqian Xie, Sophia Ananiadou
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2303.15621
Pdf link: https://arxiv.org/pdf/2303.15621
Abstract: The performance of abstractive text summarization has been greatly boosted by pre-trained language models recently. The main concern with existing abstractive summarization methods is the factual inconsistency of their generated summaries. To alleviate the problem, many efforts have focused on developing effective factuality evaluation metrics based on natural language inference, question answering, etc. However, these metrics are limited by high computational complexity and reliance on annotated data. Most recently, large language models such as ChatGPT have shown strong ability in not only natural language understanding but also natural language inference. In this paper, we study the factual inconsistency evaluation ability of ChatGPT under the zero-shot setting by evaluating it on coarse-grained and fine-grained factuality evaluation tasks including binary natural language inference (NLI), summary ranking, and consistency rating. Experimental results show that ChatGPT outperforms previous SOTA evaluation metrics on 6/9 datasets across three tasks, demonstrating its great potential for assessing factual inconsistency in the zero-shot setting. The results also highlight the importance of prompt design and the need for future efforts to address ChatGPT's limitations on evaluation bias, wrong reasoning, and hallucination.

Towards Countering Essentialism through Social Bias Reasoning
Authors: Emily Allaway, Nina Taneja, Sarah-Jane Leslie, Maarten Sap
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2303.16173
Pdf link: https://arxiv.org/pdf/2303.16173
Abstract: Essentialist beliefs (i.e., believing that members of the same group are fundamentally alike) play a central role in social stereotypes and can lead to harm when left unchallenged. In our work, we conduct exploratory studies into the task of countering essentialist beliefs (e.g., "liberals are stupid"). Drawing on prior work from psychology and NLP, we construct five types of counterstatements and conduct human studies on the effectiveness of these different strategies. Our studies also investigate the role that the level of explicitness with which an essentialist belief is conveyed plays in choosing a counterstatement. We find that statements that broaden the scope of a stereotype (e.g., to other groups, as in "conservatives can also be stupid") are the most popular countering strategy. We conclude with a discussion of challenges and open questions for future work in this area (e.g., improving factuality, studying community-specific variation) and we emphasize the importance of work at the intersection of NLP and psychology.

Keyword: knowledge distillation

There is no result

Keyword: Hallucination

ChatGPT as a Factual Inconsistency Evaluator for Abstractive Text Summarization
Authors: Zheheng Luo, Qianqian Xie, Sophia Ananiadou
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2303.15621
Pdf link: https://arxiv.org/pdf/2303.15621
Abstract: The performance of abstractive text summarization has been greatly boosted by pre-trained language models recently. The main concern with existing abstractive summarization methods is the factual inconsistency of their generated summaries. To alleviate the problem, many efforts have focused on developing effective factuality evaluation metrics based on natural language inference, question answering, etc. However, these metrics are limited by high computational complexity and reliance on annotated data. Most recently, large language models such as ChatGPT have shown strong ability in not only natural language understanding but also natural language inference. In this paper, we study the factual inconsistency evaluation ability of ChatGPT under the zero-shot setting by evaluating it on coarse-grained and fine-grained factuality evaluation tasks including binary natural language inference (NLI), summary ranking, and consistency rating. Experimental results show that ChatGPT outperforms previous SOTA evaluation metrics on 6/9 datasets across three tasks, demonstrating its great potential for assessing factual inconsistency in the zero-shot setting. The results also highlight the importance of prompt design and the need for future efforts to address ChatGPT's limitations on evaluation bias, wrong reasoning, and hallucination.

Hallucinations in Large Multilingual Translation Models
Authors: Nuno M. Guerreiro, Duarte Alves, Jonas Waldendorf, Barry Haddow, Alexandra Birch, Pierre Colombo, André F. T. Martins
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2303.16104
Pdf link: https://arxiv.org/pdf/2303.16104
Abstract: Large-scale multilingual machine translation systems have demonstrated remarkable ability to translate directly between numerous languages, making them increasingly appealing for real-world applications. However, when deployed in the wild, these models may generate hallucinated translations which have the potential to severely undermine user trust and raise safety concerns. Existing research on hallucinations has primarily focused on small bilingual models trained on high-resource languages, leaving a gap in our understanding of hallucinations in massively multilingual models across diverse translation scenarios. In this work, we fill this gap by conducting a comprehensive analysis on both the M2M family of conventional neural machine translation models and ChatGPT, a general-purpose large language model (LLM) that can be prompted for translation. Our investigation covers a broad spectrum of conditions, spanning over 100 translation directions across various resource levels and going beyond English-centric language pairs. We provide key insights regarding the prevalence, properties, and mitigation of hallucinations, paving the way towards more responsible and reliable machine translation systems.

Keyword: evaluation

ChatGPT as a Factual Inconsistency Evaluator for Abstractive Text Summarization
Authors: Zheheng Luo, Qianqian Xie, Sophia Ananiadou
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2303.15621
Pdf link: https://arxiv.org/pdf/2303.15621
Abstract: The performance of abstractive text summarization has been greatly boosted by pre-trained language models recently. The main concern with existing abstractive summarization methods is the factual inconsistency of their generated summaries. To alleviate the problem, many efforts have focused on developing effective factuality evaluation metrics based on natural language inference, question answering, etc. However, these metrics are limited by high computational complexity and reliance on annotated data. Most recently, large language models such as ChatGPT have shown strong ability in not only natural language understanding but also natural language inference. In this paper, we study the factual inconsistency evaluation ability of ChatGPT under the zero-shot setting by evaluating it on coarse-grained and fine-grained factuality evaluation tasks including binary natural language inference (NLI), summary ranking, and consistency rating. Experimental results show that ChatGPT outperforms previous SOTA evaluation metrics on 6/9 datasets across three tasks, demonstrating its great potential for assessing factual inconsistency in the zero-shot setting. The results also highlight the importance of prompt design and the need for future efforts to address ChatGPT's limitations on evaluation bias, wrong reasoning, and hallucination.

Pre-training Transformers for Knowledge Graph Completion
Authors: Sanxing Chen, Hao Cheng, Xiaodong Liu, Jian Jiao, Yangfeng Ji, Jianfeng Gao
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2303.15682
Pdf link: https://arxiv.org/pdf/2303.15682
Abstract: Learning transferable representation of knowledge graphs (KGs) is challenging due to the heterogeneous, multi-relational nature of graph structures. Inspired by Transformer-based pretrained language models' success on learning transferable representation for texts, we introduce a novel inductive KG representation model (iHT) for KG completion by large-scale pre-training. iHT consists of an entity encoder (e.g., BERT) and a neighbor-aware relational scoring function, both parameterized by Transformers. We first pre-train iHT on a large KG dataset, Wikidata5M. Our approach achieves new state-of-the-art results on matched evaluations, with a relative improvement of more than 25% in mean reciprocal rank over previous SOTA models. When further fine-tuned on smaller KGs with either entity and relational shifts, pre-trained iHT representations are shown to be transferable, significantly improving the performance on FB15K-237 and WN18RR.

Model and Evaluation: Towards Fairness in Multilingual Text Classification
Authors: Nankai Lin, Junheng He, Zhenghang Tang, Dong Zhou, Aimin Yang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2303.15697
Pdf link: https://arxiv.org/pdf/2303.15697
Abstract: Recently, more and more research has focused on addressing bias in text classification models. However, existing research mainly focuses on the fairness of monolingual text classification models, and research on fairness for multilingual text classification is still very limited. In this paper, we focus on the task of multilingual text classification and propose a debiasing framework for multilingual text classification based on contrastive learning. Our proposed method does not rely on any external language resources and can be extended to any other languages. The model contains four modules: multilingual text representation module, language fusion module, text debiasing module, and text classification module. The multilingual text representation module uses a multilingual pre-trained language model to represent the text, the language fusion module makes the semantic spaces of different languages tend to be consistent through contrastive learning, and the text debiasing module uses contrastive learning to make the model unable to identify sensitive attributes' information. The text classification module completes the basic tasks of multilingual text classification. In addition, the existing research on the fairness of multilingual text classification is relatively simple in the evaluation mode. The evaluation method of fairness is the same as the monolingual equality difference evaluation method, that is, the evaluation is performed on a single language. We propose a multi-dimensional fairness evaluation framework for multilingual text classification, which evaluates the model's monolingual equality difference, multilingual equality difference, multilingual equality performance difference, and destructiveness of the fairness strategy. We hope that our work can provide a more general debiasing method and a more comprehensive evaluation framework for multilingual text fairness tasks.

Translate the Beauty in Songs: Jointly Learning to Align Melody and Translate Lyrics
Authors: Chengxi Li, Kai Fan, Jiajun Bu, Boxing Chen, Zhongqiang Huang, Zhi Yu
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Arxiv link: https://arxiv.org/abs/2303.15705
Pdf link: https://arxiv.org/pdf/2303.15705
Abstract: Song translation requires both translation of lyrics and alignment of music notes so that the resulting verse can be sung to the accompanying melody, which is a challenging problem that has attracted some interest in different aspects of the translation process. In this paper, we propose Lyrics-Melody Translation with Adaptive Grouping (LTAG), a holistic solution to automatic song translation by jointly modeling lyrics translation and lyrics-melody alignment. It is a novel encoder-decoder framework that can simultaneously translate the source lyrics and determine the number of aligned notes at each decoding step through an adaptive note grouping module. To address data scarcity, we commissioned a small amount of training data annotated specifically for this task and used large amounts of augmented data through back-translation. Experiments conducted on an English-Chinese song translation data set show the effectiveness of our model in both automatic and human evaluation.
