Despite the growing adoption of mixed reality and interactive AI agents, it remains challenging for these systems to generate high-quality 2D/3D scenes in unseen environments. The common practice requires deploying an AI agent to collect large amounts of data for model training for every new task; this process is costly, or even impossible, for many domains. In this study, we develop an infinite agent that learns to transfer knowledge memory from general foundation models (e.g., GPT4, DALLE) to novel domains or scenarios for scene understanding and generation in the physical or virtual world. At the heart of our approach is an emergent mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK), which leverages knowledge memory to generate scenes in unseen physical-world and virtual-reality environments. This knowledge interactive emergent ability (Figure 1) manifests in two ways: i) micro-actions of cross-modality, in which multi-modality models collect a large amount of relevant knowledge-memory data for each interaction task (e.g., unseen scene understanding) from the physical world; and ii) macro-behavior of reality-agnosticism, in which interactions in mixed-reality environments are improved and tailored to different characterized roles, target variables, collaborative information, and so on. We validate the effectiveness of ArK on scene generation and editing tasks, showing that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes compared to baselines, and demonstrating the potential benefit of incorporating ArK into generative AI for applications such as the metaverse and gaming simulation.
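As a rough illustration only (the abstract does not specify the implementation), the knowledge-memory loop described above might be sketched as below: query a foundation model for task-relevant knowledge, cache it in a memory, and reuse it to condition scene generation in a new environment. All names here (`KnowledgeMemory`, `query_foundation_model`, `generate_scene`) are hypothetical stand-ins introduced for this sketch; in a real system, the stub functions would call GPT4- and DALLE-style APIs.

```python
# Hypothetical sketch of a knowledge-memory transfer loop; not the ArK API.
from dataclasses import dataclass, field


@dataclass
class KnowledgeMemory:
    """Caches knowledge retrieved from foundation models, keyed by task."""
    entries: dict[str, list[str]] = field(default_factory=dict)

    def store(self, task: str, facts: list[str]) -> None:
        self.entries.setdefault(task, []).extend(facts)

    def retrieve(self, task: str) -> list[str]:
        return self.entries.get(task, [])


def query_foundation_model(prompt: str) -> list[str]:
    """Stand-in for a GPT4-style call that returns knowledge snippets."""
    return [f"knowledge derived from: {prompt}"]


def generate_scene(description: str, knowledge: list[str]) -> str:
    """Stand-in for a DALLE-style generator conditioned on retrieved knowledge."""
    context = "; ".join(knowledge)
    return f"scene({description}) conditioned on [{context}]"


if __name__ == "__main__":
    memory = KnowledgeMemory()
    task = "unseen kitchen layout"

    # Micro-step: collect task-relevant knowledge for the new environment.
    memory.store(task, query_foundation_model(f"describe objects in a {task}"))

    # Macro-step: reuse the cached knowledge when generating the scene.
    print(generate_scene(task, memory.retrieve(task)))
```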