New submissions for Thu, 23 Mar 23

Keyword: text generation

XWikiGen: Cross-lingual Summarization for Encyclopedic Text Generation in Low Resource Languages

Authors: Dhaval Taunk, Shivprasad Sagare, Anupam Patil, Shivansh Subramanian, Manish Gupta, Vasudeva Varma
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2303.12308
Pdf link: https://arxiv.org/pdf/2303.12308
Abstract A lack of encyclopedic text contributors, especially on Wikipedia, makes automated text generation for low-resource (LR) languages a critical problem. Existing work on Wikipedia text generation has focused on English only, where English reference articles are summarized to generate English Wikipedia pages. However, for low-resource languages, the scarcity of reference articles makes monolingual summarization ineffective. Hence, in this work, we propose XWikiGen, the task of cross-lingual multi-document summarization of text from multiple reference articles, written in various languages, to generate Wikipedia-style text. Accordingly, we contribute a benchmark dataset spanning ~69K Wikipedia articles that covers five domains and eight languages. We use this dataset to train a two-stage system. Its input is a set of citations and a section title, and its output is a section-specific LR summary. The proposed system is based on a novel idea: neural unsupervised extractive summarization first coarsely identifies salient information, and a neural abstractive model then generates the section-specific text. Extensive experiments show that, on average, multi-domain training outperforms the multilingual setup.
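
The sketch below gives a rough picture of the first (extractive) stage described above. It is a minimal illustration and not the authors' system. It swaps in a lexical bag-of-words scorer for their neural unsupervised extractive model. The function names (`select_salient`, `bow`, `cosine`) and the `title_weight` parameter are hypothetical. XWikiGen's reference articles span several languages, so a real implementation would need multilingual sentence embeddings rather than word overlap. The selected sentences would then go to the second-stage abstractive model.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lower-cased bag-of-words vector (token -> count)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_salient(sentences, section_title, k=3, title_weight=0.5):
    """Hypothetical stage-1 stand-in: score sentences without any labels by
    similarity to the section title and to the centroid of all sentences,
    then return the top-k sentences in their original order."""
    vectors = [bow(s) for s in sentences]
    centroid = Counter()
    for v in vectors:
        centroid.update(v)
    title_vec = bow(section_title)
    scores = [
        title_weight * cosine(v, title_vec) + (1 - title_weight) * cosine(v, centroid)
        for v in vectors
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Illustrative (made-up) reference sentences for an "Early history" section.
refs = [
    "The river flows through the capital city.",
    "Its early history begins with a medieval settlement.",
    "The settlement grew into a trading town in the history of the region.",
    "Local football clubs play on weekends.",
]
result = select_salient(refs, "Early history", k=2)
print(result)
```

Each sentence is scored against both the section title and the centroid. This is one simple way to make the selection section-specific while still preferring sentences that are central to the reference set.
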
Keyword: machine translation

There is no result

Keyword: non-autoregressive

MEGA: Multilingual Evaluation of Generative AI
Authors: Kabir Ahuja, Rishav Hada, Millicent Ochieng, Prachi Jain, Harshita Diddee, Samuel Maina, Tanuja Ganu, Sameer Segal, Maxamed Axmed, Kalika Bali, Sunayana Sitaram
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2303.12528
Pdf link: https://arxiv.org/pdf/2303.12528
Abstract Generative AI models perform impressively on many Natural Language Processing tasks, such as language understanding, reasoning and language generation. One of the most important questions the AI community is asking today concerns the capabilities and limits of these models, and evaluating generative AI is clearly very challenging. Most studies on generative large language models (LLMs) are restricted to English, and it is unclear how well these models understand and generate other languages. We present MEGA, the first comprehensive benchmarking of generative LLMs. It evaluates models on standard NLP benchmarks covering 8 diverse tasks and 33 typologically diverse languages. We also compare generative LLMs with state-of-the-art (SOTA) non-autoregressive models on these tasks to determine how well generative models perform relative to the previous generation of LLMs. We present a thorough analysis of model performance across languages and discuss some of the reasons why generative LLMs are currently not optimal for all languages. We create a framework for evaluating generative LLMs in the multilingual setting and provide directions for future progress in the field.

Keyword: abstractive summarization

There is no result

Keyword: factual

There is no result

Keyword: knowledge distillation

There is no result

Keyword: Hallucination

There is no result

Keyword: evaluation

Understand Legal Documents with Contextualized Large Language Models
Authors: Xin Jin, Yuchen Wang
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2303.12135
Pdf link: https://arxiv.org/pdf/2303.12135
Abstract The growing number of pending legal cases in populous countries such as India has become a major issue. Effective techniques for processing and understanding legal documents would be extremely useful in addressing this problem. In this paper, we present our systems for SemEval-2023 Task 6: understanding legal texts (Modi et al., 2023). First, we develop the Legal-BERT-HSLN model to predict rhetorical roles (subtask A). It considers comprehensive context information at both the intra- and inter-sentence levels. We then train a Legal-LUKE model, which is legal-contextualized and entity-aware, to recognize legal entities (subtask B). Our evaluations show that our models are more accurate than the baselines, e.g., with up to a 15.0% higher F1 score in subtask B. We achieved notable performance on the task leaderboard, e.g., a micro F1 score of 0.834, and ranked 5th out of 27 teams in subtask A.
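
The 0.834 figure above is a micro-averaged F1 score. The sketch below shows roughly what that metric pools together. It computes micro F1 for single-label predictions, such as one rhetorical role per sentence. The label names are hypothetical. The optional `labels` argument (for example, to leave out a catch-all class) is an assumption about how one might score, not a description of the official SemEval-2023 Task 6 scorer.

```python
def micro_f1(gold, pred, labels=None):
    """Micro-averaged F1: pool true positives, false positives and false
    negatives over all counted labels, then compute a single F1."""
    if labels is None:
        labels = set(gold) | set(pred)
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == p:
            if g in labels:
                tp += 1
        else:
            if p in labels:
                fp += 1  # predicted label is wrong for this item
            if g in labels:
                fn += 1  # gold label was missed for this item
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-sentence role labels.
gold = ["FACT", "FACT", "ARGUMENT", "RULING", "RULING"]
pred = ["FACT", "ARGUMENT", "ARGUMENT", "RULING", "FACT"]
print(micro_f1(gold, pred))
print(micro_f1(gold, pred, labels={"FACT", "ARGUMENT"}))
```

When every label is counted, each wrong prediction adds exactly one false positive and one false negative. Micro F1 then equals plain accuracy. Restricting `labels` breaks that equivalence.
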
Can we trust the evaluation on ChatGPT?
Authors: Rachith Aiyappa, Jisun An, Haewoon Kwak, Yong-Yeol Ahn
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2303.12767
Pdf link: https://arxiv.org/pdf/2303.12767
Abstract ChatGPT, the first large language model (LLM) with mass adoption, has demonstrated remarkable performance in numerous natural language tasks. Despite its evident usefulness, evaluating ChatGPT's performance in diverse problem domains remains challenging due to the closed nature of the model and its continuous updates via Reinforcement Learning from Human Feedback (RLHF). We highlight the issue of data contamination in ChatGPT evaluations, with a case study of the task of stance detection. We discuss the challenge of preventing data contamination and ensuring fair model evaluation in the age of closed and continuously trained models.
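
ChatGPT's training data are not public, so contamination cannot be checked directly; this is the problem the authors raise. When a candidate training corpus is available, a common heuristic flags test examples that share a long word n-gram with that corpus. The sketch below assumes such a corpus is at hand. The function names, the example texts and the default `n=8` are illustrative choices, not values from the paper.

```python
import re


def ngrams(text, n):
    """Set of word n-grams from lower-cased text (empty if text is shorter than n)."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def flag_contaminated(test_examples, corpus_docs, n=8):
    """Return indices of test examples that share at least one word n-gram
    with any document in the (assumed available) training corpus."""
    corpus_grams = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    return [i for i, ex in enumerate(test_examples) if ngrams(ex, n) & corpus_grams]


# Hypothetical stance-detection test items and a hypothetical corpus document.
test_examples = [
    "Vaccines should be mandatory for all school children in the state",
    "I love sunny days at the beach",
]
corpus_docs = ["Op-ed: vaccines should be mandatory for all school children, argues the author"]
print(flag_contaminated(test_examples, corpus_docs))
```

Longer n-grams reduce false alarms from common phrases but can miss paraphrased leaks. The choice of n is therefore a trade-off, not a fixed rule.
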
Interpretable Bangla Sarcasm Detection using BERT and Explainable AI
Authors: Ramisa Anan, Tasnim Sakib Apon, Zeba Tahsin Hossain, Elizabeth Antora Modhu, Sudipta Mondal, MD. Golam Rabiul Alam
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Arxiv link: https://arxiv.org/abs/2303.12772
Pdf link: https://arxiv.org/pdf/2303.12772
Abstract Sarcasm, usually defined as a positive phrase or sentence with an underlying negative motive, is widely used on today's social media platforms such as Facebook, Twitter and Reddit. The number of active social media users has grown dramatically in recent years. This raises the need for automated NLP-based systems for tasks such as estimating market demand, sentiment analysis and threat detection. However, sarcasm usually implies the opposite of its literal meaning and is often hard to detect, which makes extracting meaning with an NLP-based model more complicated. Sarcasm detection in English has been studied extensively over the past several years and has improved noticeably. Yet the state of sarcasm detection in Bangla has remained largely unchanged. In this article, we present a BERT-based system that achieves 99.60%, whereas the traditional machine learning algorithms we used reach only 89.93%. Additionally, we employ Local Interpretable Model-Agnostic Explanations (LIME) to make our system explainable. We also use BanglaSarc, a newly collected Bangla sarcasm dataset constructed specifically for the evaluation in this study. It consists of fresh records of sarcastic and non-sarcastic comments, most of which were collected from Facebook and YouTube comment sections.

LuckyyySTA / arxiv-daily

New submissions for Thu, 23 Mar 23 #71
