New submissions for Mon, 10 Apr 23

Keyword: text generation

There is no result

Keyword: machine translation

There is no result

Keyword: non-autoregressive

There is no result

Keyword: abstractive summarization

There is no result

Keyword: factual

Interpretable Unified Language Checking

Authors: Tianhua Zhang, Hongyin Luo, Yung-Sung Chuang, Wei Fang, Luc Gaitskell, Thomas Hartvigsen, Xixin Wu, Danny Fox, Helen Meng, James Glass
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2304.03728
Pdf link: https://arxiv.org/pdf/2304.03728
Abstract Despite recent concerns about undesirable behaviors of large language models (LLMs), including non-factual, biased, and hateful language, we find that LLMs are inherent multi-task language checkers based on their latent representations of natural and social knowledge. We present an interpretable, unified language checking (UniLC) method for both human- and machine-generated language that aims to check whether language input is factual and fair. While fairness and fact-checking tasks have been handled separately with dedicated models, we find that LLMs can achieve high performance on a combination of fact-checking, stereotype detection, and hate speech detection tasks with a simple, few-shot, unified set of prompts. With the "1/2-shot" multi-task language checking method proposed in this work, the GPT3.5-turbo model outperforms fully supervised baselines on several language tasks. The simple approach and results suggest that, based on strong latent knowledge representations, an LLM can be an adaptive and explainable tool for detecting misinformation, stereotypes, and hate speech.

Keyword: knowledge distillation

There is no result

Keyword: Hallucination

There is no result

Keyword: evaluation

ChatGPT-Crawler: Find out if ChatGPT really knows what it's talking about
Authors: Aman Rangapur, Haoran Wang
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.03325
Pdf link: https://arxiv.org/pdf/2304.03325
Abstract Large language models have gained considerable interest for their impressive performance on various tasks. Among these models, ChatGPT, developed by OpenAI, has become extremely popular among early adopters, who even regard it as a disruptive technology in many fields such as customer service, education, healthcare, and finance. Understanding the opinions of these early users is essential, as it can provide valuable insights into the potential strengths, weaknesses, and success or failure of the technology in different areas. This research examines the responses generated by ChatGPT for questions from different conversational QA corpora. The study employed BERT similarity scores to compare these responses with the correct answers and to obtain Natural Language Inference (NLI) labels. Evaluation scores were also computed and compared to determine the overall performance of GPT-3 and GPT-4. Additionally, the study identified instances where ChatGPT provided incorrect answers to questions, providing insights into areas where the model may be prone to error.
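The abstract does not say exactly how its "BERT similarity scores" are computed. Assuming a BERTScore-style greedy matching over token embeddings, a minimal sketch looks like the following. The function names `cosine` and `greedy_match_f1` are hypothetical, and the toy 2-D vectors stand in for real BERT token embeddings (the encoder itself is not shown). This is an illustration of the general technique, not the authors' code, and it does not cover the NLI labelling step.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_f1(candidate: List[Vector], reference: List[Vector]) -> float:
    """BERTScore-style F1: match every token to its most similar counterpart."""
    # Precision: each candidate (model answer) token takes its best reference match.
    precision = sum(max(cosine(c, r) for r in reference) for c in candidate) / len(candidate)
    # Recall: each reference (gold answer) token takes its best candidate match.
    recall = sum(max(cosine(r, c) for c in candidate) for r in reference) / len(reference)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy embeddings: an answer that covers only half of the gold tokens.
gold = [[1.0, 0.0], [0.0, 1.0]]
partial = [[1.0, 0.0]]
print(greedy_match_f1(gold, gold))     # 1.0
print(greedy_match_f1(partial, gold))  # ~0.667 (precision 1.0, recall 0.5)
```

Greedy matching rewards an answer that reuses the gold answer's content in any order, which is why such scores are popular for free-form QA; a threshold on the score would still be needed to turn it into a correct/incorrect decision.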
On the Evaluations of ChatGPT and Emotion-enhanced Prompting for Mental Health Analysis
Authors: Kailai Yang, Shaoxiong Ji, Tianlin Zhang, Qianqian Xie, Sophia Ananiadou
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2304.03347
Pdf link: https://arxiv.org/pdf/2304.03347
Abstract Automated mental health analysis shows great potential for enhancing the efficiency and accessibility of mental health care, and recent dominant methods use pre-trained language models (PLMs) as the backbone and incorporate emotional information. The latest large language models (LLMs), such as ChatGPT, exhibit dramatic capabilities on diverse natural language processing tasks. However, existing studies of ChatGPT's zero-shot performance for mental health analysis are limited by inadequate evaluation, insufficient use of emotional information, and limited explainability of methods. In this work, we comprehensively evaluate the mental health analysis and emotional reasoning abilities of ChatGPT on 11 datasets across 5 tasks, including binary and multi-class mental health condition detection, cause/factor detection of mental health conditions, emotion recognition in conversations, and causal emotion entailment. We empirically analyze the impact of different prompting strategies with emotional cues on ChatGPT's mental health analysis ability and explainability. Experimental results show that ChatGPT outperforms traditional neural network methods but still lags significantly behind advanced task-specific methods. Qualitative analysis shows its potential for explainability compared with advanced black-box methods, but also limitations in robustness and inaccurate reasoning. Prompt engineering with emotional cues is found to be effective in improving its performance on mental health analysis, but the emotional information must be infused in the proper way.
Deep Learning for Opinion Mining and Topic Classification of Course Reviews
Authors: Anna Koufakou
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Arxiv link: https://arxiv.org/abs/2304.03394
Pdf link: https://arxiv.org/pdf/2304.03394
Abstract Student opinions about a course are important to educators and administrators, regardless of the type of course or institution. Reading and manually analyzing open-ended feedback becomes infeasible for massive volumes of comments at the institution level or in online forums. In this paper, we collected and pre-processed a large number of course reviews publicly available online. We applied machine learning techniques with the goal of gaining insight into student sentiments and topics. Specifically, we utilized current Natural Language Processing (NLP) techniques, such as word embeddings and deep neural networks, and the state-of-the-art models BERT (Bidirectional Encoder Representations from Transformers), RoBERTa (Robustly optimized BERT approach), and XLNet (Generalized Auto-regression Pre-training). We performed extensive experimentation to compare these techniques with traditional approaches. This comparative study demonstrates how to apply modern machine learning approaches to sentiment polarity extraction and topic-based classification of course feedback. For sentiment polarity, the top model was RoBERTa, with 95.5% accuracy and 84.7% F1-macro, while for topic classification, an SVM (Support Vector Machine) was the top classifier, with 79.8% accuracy and 80.6% F1-macro. We also provided an in-depth exploration of the effect of certain hyperparameters on model performance and discussed our observations. These findings can be used by institutions and course providers as a guide for analyzing their own course feedback with NLP models for self-evaluation and improvement.
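The sentiment results above report two quite different numbers for the same model (95.5% accuracy vs. 84.7% F1-macro). A minimal sketch of the standard definitions shows why they can diverge: macro-F1 averages per-class F1 with equal weight, so a weak minority class pulls it down even when overall accuracy is high. The helper names below are hypothetical and the labels are toy data, not the paper's; scikit-learn's `f1_score(..., average="macro")` computes the same quantity.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that exactly match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = set(y_true) | set(y_pred)
    f1_scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # c missed by the model
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Toy sentiment labels: one positive review is misclassified as negative.
gold = ["pos", "pos", "neg", "neg"]
pred = ["pos", "neg", "neg", "neg"]
print(accuracy(gold, pred))            # 0.75
print(round(macro_f1(gold, pred), 4))  # 0.7333
```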
Hierarchical Catalogue Generation for Literature Review: A Benchmark
Authors: Kun Zhu, Xiaocheng Feng, Xiachong Feng, Yingsheng Wu, Bing Qin
Subjects: Computation and Language (cs.CL)
Arxiv link: https://arxiv.org/abs/2304.03512
Pdf link: https://arxiv.org/pdf/2304.03512
Abstract Multi-document scientific summarization can extract and organize important information from an abundant collection of papers and has recently attracted widespread attention. However, existing efforts focus on producing lengthy overviews that lack a clear and logical hierarchy. To alleviate this problem, we present an atomic and challenging task named Hierarchical Catalogue Generation for Literature Review (HiCatGLR), which aims to generate a hierarchical catalogue for a review paper given various references. We carefully construct a novel English Hierarchical Catalogues of Literature Reviews Dataset (HiCaD) with 13.8k literature review catalogues and 120k reference papers, on which we benchmark diverse experiments with end-to-end and pipeline methods. To accurately assess model performance, we design evaluation metrics that measure similarity to the ground truth in terms of both semantics and structure. In addition, our extensive analyses verify the high quality of our dataset and the effectiveness of our evaluation metrics. Furthermore, we discuss potential directions for this task to motivate future research.

Keyword: text generation

Keyword: machine translation

Keyword: non-autoregressive

Keyword: abstractive summarization

Keyword: factual

Interpretable Unified Language Checking

Keyword: knowledge distillation

Keyword: Hallucination

Keyword: evaluation

ChatGPT-Crawler: Find out if ChatGPT really knows what it's talking about

On the Evaluations of ChatGPT and Emotion-enhanced Prompting for Mental Health Analysis

Deep Learning for Opinion Mining and Topic Classification of Course Reviews

Hierarchical Catalogue Generation for Literature Review: A Benchmark