Despite the growing adoption of mixed reality and interactive AI agents, it remains challenging for these systems to generate high-quality 2D/3D scenes in unseen environments. The common practice requires deploying an AI agent to collect large amounts of data for model training for every new task; this process is costly, or even impossible, for many domains. In this study, we develop an infinite agent that learns to transfer knowledge memory from general foundation models (e.g., GPT4, DALLE) to novel domains or scenarios for scene understanding and generation in the physical or virtual world. At the heart of our approach is an emergent mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK), which leverages knowledge memory to generate scenes in unseen physical-world and virtual-reality environments. This knowledge interactive emergent ability (Figure 1) manifests in two ways: i) micro-actions of cross-modality, in which multi-modality models collect a large amount of relevant knowledge-memory data for each interaction task (e.g., unseen scene understanding) from the physical world; and ii) macro-behavior of reality-agnosticism, in which interactions in mixed-reality environments are improved and tailored to different characterized roles, target variables, collaborative information, and so on. We validate the effectiveness of ArK on scene generation and editing tasks, showing that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes compared to baselines, and demonstrating the potential benefit of incorporating ArK into generative AI for applications such as the metaverse and gaming simulation.
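As a rough illustration only (the abstract does not specify the implementation), the knowledge-memory loop described above might be sketched as below: query a foundation model for task-relevant knowledge, cache it in a memory, and reuse it to condition scene generation in a new environment. All names here (`KnowledgeMemory`, `query_foundation_model`, `generate_scene`) are hypothetical stand-ins introduced for this sketch; in a real system, the stub functions would call GPT4- and DALLE-style APIs.

```python
# Hypothetical sketch of a knowledge-memory transfer loop; not the ArK API.
from dataclasses import dataclass, field


@dataclass
class KnowledgeMemory:
    """Caches knowledge retrieved from foundation models, keyed by task."""
    entries: dict[str, list[str]] = field(default_factory=dict)

    def store(self, task: str, facts: list[str]) -> None:
        self.entries.setdefault(task, []).extend(facts)

    def retrieve(self, task: str) -> list[str]:
        return self.entries.get(task, [])


def query_foundation_model(prompt: str) -> list[str]:
    """Stand-in for a GPT4-style call that returns knowledge snippets."""
    return [f"knowledge derived from: {prompt}"]


def generate_scene(description: str, knowledge: list[str]) -> str:
    """Stand-in for a DALLE-style generator conditioned on retrieved knowledge."""
    context = "; ".join(knowledge)
    return f"scene({description}) conditioned on [{context}]"


if __name__ == "__main__":
    memory = KnowledgeMemory()
    task = "unseen kitchen layout"

    # Micro-step: collect task-relevant knowledge for the new environment.
    memory.store(task, query_foundation_model(f"describe objects in a {task}"))

    # Macro-step: reuse the cached knowledge when generating the scene.
    print(generate_scene(task, memory.retrieve(task)))
```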