URL

https://arxiv.org/abs/2402.13446
Affiliations
- Zhen Tan, N/A
- Alimohammad Beigi, N/A
- Song Wang, N/A
- Ruocheng Guo, N/A
- Amrita Bhattacharjee, N/A
- Bohan Jiang, N/A
- Mansooreh Karami, N/A
- Jundong Li, N/A
- Lu Cheng, N/A
- Huan Liu, N/A
Abstract
- Data annotation is the labeling or tagging of raw data with relevant information, essential for improving the efficacy of machine learning models. The process, however, is labor-intensive and expensive. The emergence of advanced Large Language Models (LLMs), exemplified by GPT-4, presents an unprecedented opportunity to revolutionize and automate the intricate process of data annotation. While existing surveys have extensively covered LLM architecture, training, and general applications, this paper uniquely focuses on their specific utility for data annotation. This survey contributes to three core aspects: LLM-Based Data Annotation, Assessing LLM-generated Annotations, and Learning with LLM-generated annotations. Furthermore, the paper includes an in-depth taxonomy of methodologies employing LLMs for data annotation, a comprehensive review of learning strategies for models incorporating LLM-generated annotations, and a detailed discussion on primary challenges and limitations associated with using LLMs for data annotation. As a key guide, this survey aims to direct researchers and practitioners in exploring the potential of the latest LLMs for data annotation, fostering future advancements in this critical domain. We provide a comprehensive papers list at https://github.com/Zhen-Tan-dmml/LLM4Annotation.git.
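Translation (by gpt-3.5-turbo)
Data annotation is the work of labeling or tagging raw data with relevant information in order to improve the effectiveness of machine learning models. However, this process is labor-intensive and costly. The arrival of advanced large language models (LLMs) such as GPT-4 creates an unprecedented opportunity to transform and automate the complex process of data annotation. While existing surveys cover LLM architecture, training, and general applications extensively, this paper focuses on the specific usefulness of LLMs for data annotation. The survey contributes to three main aspects: LLM-based data annotation, evaluation of LLM-generated annotations, and learning with LLM-generated annotations. In addition, the paper includes a taxonomy that comprehensively classifies methods for data annotation with LLMs, a comprehensive review of learning strategies for models that incorporate LLM-generated annotations, and a detailed discussion of the main challenges and limitations of using LLMs for data annotation. The survey aims to guide researchers and practitioners who apply the latest LLMs to data annotation and to promote future progress in this important area. A comprehensive list of papers is provided at https://github.com/Zhen-Tan-dmml/LLM4Annotation.git.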
Summary (by gpt-3.5-turbo)
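Focusing on research into data annotation with large language models (LLMs) such as GPT-4, the paper describes how LLM-generated annotations are evaluated and how they are applied to learning. It comprehensively discusses the methods and challenges of LLM-based data annotation, with the aim of promoting future research progress.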

Large Language Models for Data Annotation: A Survey, Zhen Tan+, N/A, arXiv'24 #1244
