Data annotation is the labeling or tagging of raw data with relevant information, essential for improving the efficacy of machine learning models. The process, however, is labor-intensive and expensive. The emergence of advanced Large Language Models (LLMs), exemplified by GPT-4, presents an unprecedented opportunity to revolutionize and automate the intricate process of data annotation. While existing surveys have extensively covered LLM architecture, training, and general applications, this paper uniquely focuses on their specific utility for data annotation. This survey contributes to three core aspects: LLM-Based Data Annotation, Assessing LLM-generated Annotations, and Learning with LLM-generated annotations. Furthermore, the paper includes an in-depth taxonomy of methodologies employing LLMs for data annotation, a comprehensive review of learning strategies for models incorporating LLM-generated annotations, and a detailed discussion on primary challenges and limitations associated with using LLMs for data annotation. As a key guide, this survey aims to direct researchers and practitioners in exploring the potential of the latest LLMs for data annotation, fostering future advancements in this critical domain. We provide a comprehensive papers list at https://github.com/Zhen-Tan-dmml/LLM4Annotation.git.