AkihikoWatanabe commented 5 days ago

URL

https://arxiv.org/abs/2404.01869
Authors
- Philipp Mondorf
- Barbara Plank
Abstract

Large language models (LLMs) have recently shown impressive performance on tasks involving reasoning, leading to a lively debate on whether these models possess reasoning capabilities similar to humans. However, despite these successes, the depth of LLMs' reasoning abilities remains uncertain. This uncertainty partly stems from the predominant focus on task performance, measured through shallow accuracy metrics, rather than a thorough investigation of the models' reasoning behavior. This paper seeks to address this gap by providing a comprehensive review of studies that go beyond task accuracy, offering deeper insights into the models' reasoning processes. Furthermore, we survey prevalent methodologies to evaluate the reasoning behavior of LLMs, emphasizing current trends and efforts towards more nuanced reasoning analyses. Our review suggests that LLMs tend to rely on surface-level patterns and correlations in their training data, rather than on sophisticated reasoning abilities. Additionally, we identify the need for further research that delineates the key differences between human and LLM-based reasoning. Through this survey, we aim to shed light on the complex reasoning processes within LLMs.
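Translation (by gpt-4o-mini)

Large language models (LLMs) have recently shown impressive performance on reasoning tasks, prompting a lively debate over whether these models have reasoning abilities comparable to those of humans. Despite these successes, the depth of LLMs' reasoning abilities remains uncertain. Part of this uncertainty comes from the focus on task performance, measured with shallow accuracy metrics, in place of a thorough investigation of the models' reasoning behavior. This paper aims to close that gap with a comprehensive review of studies that look beyond task accuracy and offer deeper insight into the models' reasoning processes. It also surveys common methodologies for evaluating the reasoning behavior of LLMs, highlighting current trends and efforts toward more nuanced reasoning analyses. The review suggests that LLMs tend to rely on surface-level patterns and correlations in their training data rather than on sophisticated reasoning abilities. It further identifies the need for research that clarifies the key differences between human and LLM-based reasoning. Through this survey, the authors aim to shed light on the complex reasoning processes within LLMs.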
Summary (by gpt-4o-mini)
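Reviews research on the reasoning abilities of LLMs and offers insights that go beyond task accuracy. Suggests that models rely on surface-level patterns and lack sophisticated reasoning abilities. Points out the need for further research to clarify how LLM reasoning differs from human reasoning.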

AkihikoWatanabe commented 5 days ago

Paper introduction (sei_shinagawa): https://www.docswell.com/s/sei_shinagawa/KL1QXL-beyond-accuracy-evaluating-the-behaivior-of-llm-survey

Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey, Philipp Mondorf+, arXiv'24 #1484
