RobotecAI / o3de-genai-gems

Core features and interfaces enabling the use of modern AI with O3DE for a variety of purposes.

RFC: AI Core Gem #1

Closed adamdbrw closed 10 months ago

adamdbrw commented 10 months ago

Summary:

With the recent rise of generative AI models such as GPT-4, tools are emerging to plug them into existing workflows and to create new workflows that they enable. This proposal brings forth the new AI Core Gem, which is meant to help O3DE developers utilize modern AI in games and simulations.

What is the relevance of this feature?

Given the new possibilities coming from recent advances in AI, game and simulation developers are looking to apply these new capabilities in their creations. Many steps that these developers need to take are common regardless of the type of application, for example:

The AI Core Gem is quite different from the Machine Learning Gem in that it focuses on Generative AI instead of multi-layer perceptrons. Considering the dynamic nomenclature in the space and the versatility of the Gem, though, there are arguments against including "Generative" in the name.

The AI Core Gem is meant for O3DE Gem developers, and it is expected to be a dependency of future Gems such as AI characters, assistants, and scene generators. Unlike these future Gems, the AI Core Gem's value does not strictly depend on the current capabilities of AI models: it is meant as a tool to explore their limits and is to be built in a flexible way, benefiting from improvements in these capabilities over the years.

In the long run, from the perspective of game development, this feature can help to build smart characters that interact uniquely with the player, and assist in writing dialogue as well as creating 3D worlds.

From the perspective of simulation, it can help to create robots, humans (in roles such as pedestrians or warehouse workers), and animals that behave in certain ways with less scripting, build smartly randomized simulation scenes, and assist users in running validation scenarios as well as summarizing their results.

Feature design description:

Connectivity and communication with Generative AI services

Generative AIs can be used through 3rd-party hosted services, such as Amazon Bedrock or OpenAI's GPT platform. These typically involve a per-token pricing model, depending on model type and modality. There are proprietary models and open-source models. It is also possible to host models locally, including through tools such as Ollama or vLLM.

AI services increasingly offer additional modalities (such as image prompting) as well as complex services, such as Assistants.

Since the pace of development and the emergence of new APIs is rapid, it is important for the AI Core Gem to be flexible and extendable in its implementation of connectivity and communication. As such, the approach is to be:

The communication layer will be abstracted, allowing support for locally run GPU models in the future, as well as streaming connections such as WebSockets. With the first release, the feature set will rely on the HttpRequestor Gem for communication.
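As a rough illustration, below is a minimal sketch of what such an abstraction could look like. All names are hypothetical and not part of any existing O3DE or Gem API; the point is only that a single request/response interface can decouple callers from the transport (HttpRequestor-based HTTP first, websockets or locally hosted models later).

```cpp
// Hypothetical sketch only: none of these names exist in O3DE or this Gem yet.
#include <AzCore/std/functional.h>
#include <AzCore/std/string/string.h>

namespace AICore
{
    //! Abstraction over a single prompt/response exchange with a Generative AI service.
    class AIServiceRequesterInterface
    {
    public:
        virtual ~AIServiceRequesterInterface() = default;

        //! Sends a prompt and invokes the callback with the raw model response (or an error message).
        //! The first implementation could wrap the HttpRequestor Gem; later implementations could use
        //! websockets or a locally hosted model without changing this interface.
        virtual void SendPrompt(
            const AZStd::string& prompt,
            AZStd::function<void(bool success, const AZStd::string& response)> callback) = 0;
    };
} // namespace AICore
```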

Global settings

While it is easy to picture a use case where more than one vendor's AI is used within one project, as different models can easily have different strengths, as a first step it is good to start simple and have one global setting for the AI features, much like the Physics Settings.

These global settings will include the URI, other connectivity settings such as authorization, usage limits, default models for each modality, and user preference settings for things like visualization.

The first version will only include URI and default model selection.

Global settings will be accessible through the Editor menu and through Settings Registry keys.
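For illustration, a minimal sketch of reading such settings through the O3DE Settings Registry follows. The key paths and field names are hypothetical, as this RFC does not define them.

```cpp
// Hypothetical sketch: the registry keys below are illustrative, not defined by this RFC.
#include <AzCore/Settings/SettingsRegistry.h>
#include <AzCore/std/string/string.h>

namespace AICore
{
    struct GlobalSettings
    {
        AZStd::string m_serviceUri;   //!< Endpoint of the selected AI service.
        AZStd::string m_defaultModel; //!< Default model used for text prompts.
    };

    inline GlobalSettings LoadGlobalSettings()
    {
        GlobalSettings settings;
        if (auto* registry = AZ::SettingsRegistry::Get())
        {
            registry->Get(settings.m_serviceUri, "/O3DE/AICore/ServiceUri");
            registry->Get(settings.m_defaultModel, "/O3DE/AICore/DefaultModel");
        }
        return settings;
    }
} // namespace AICore
```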

AI calling O3DE interfaces

The core value of this Gem is to allow AI to perform work through O3DE interfaces. Examples include:

These APIs will be exposed through a kind of reflection mechanism, but full documentation also needs to be shared, either as URIs or as an initial prompt.

O3DE sharing data with AI

To interact with O3DE in an informed way, AI will require inputs such as:

Runtime interaction with characters and the environment might be application-specific; for example, simulations are likely to expose ROS interfaces. For the first implementation, sharing a list of assets, generic text prompts, and callable methods is enough.

Future extensions

The Gem can be extended with a voice interface, allowing users to prompt and give tasks to the AI simply by speaking. Note that text-to-speech is already part of some vendor APIs, so the AI can speak back.

RFCs for new AI feature gems will follow when this gem is implemented.

Technical design description:

Challenges of AI RPC interface

To relay O3DE APIs to the AI service, we need to supply information about function signatures and document their purpose, semantics, parameters, and return values. The immediate issue is that this documentation is typically provided as code comments and is itself unavailable at runtime. Such documentation is also not provided through the current behavior reflection system.

Possible solutions include:

There are various drawbacks to each of these solutions, such as:

  • ensuring a good workflow when custom Gems, including proprietary ones, are involved,
  • exposing code base headers to a 3rd party (potential licensing issues),
  • the blast radius of changes in O3DE,
  • avoiding noise for the AI, such as APIs which are not accessible, not whitelisted, or irrelevant,
  • essentially copying information in the custom documentation approach.

There is also the issue of being able to register interfaces and assign callbacks dynamically. Another consideration is that AI might benefit from a custom, iterative approach to return values and to how much feedback it needs to function at optimal performance, which can differ significantly from how current APIs are constructed.

One considered approach is to expand on the behavior reflection system in O3DE, either by providing a way to generate AI-suitable reflection, at least for selected categories, or by creating another layer in the reflection system.

The other considered approach is a custom API registration system which supplies a function's name and signature, its documentation, and a callback.
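To make the second option more concrete, here is a minimal sketch of what a registration entry could contain. All structure and field names are hypothetical; the point is that each entry bundles the function name, human-readable documentation (which is what gets relayed to the AI), per-parameter descriptions, and the callback invoked when the AI requests the call.

```cpp
// Hypothetical sketch of a custom AI RPC registration entry; not an existing O3DE API.
#include <AzCore/std/containers/vector.h>
#include <AzCore/std/functional.h>
#include <AzCore/std/string/string.h>

namespace AICore
{
    //! One parameter of an AI-callable function, with the description shared with the AI.
    struct ParameterDescription
    {
        AZStd::string m_name;
        AZStd::string m_type;        //!< e.g. "string", "float", "EntityId".
        AZStd::string m_description; //!< Free-form text describing the parameter's semantics.
    };

    //! A single entry in the AI RPC registry.
    struct RegisteredApi
    {
        AZStd::string m_name;          //!< e.g. "SpawnPrefab".
        AZStd::string m_documentation; //!< Purpose and return-value semantics.
        AZStd::vector<ParameterDescription> m_parameters;

        //! Invoked when the AI requests this API; arguments arrive as strings parsed from the
        //! structured block in the model response, and the returned string is fed back to the AI.
        AZStd::function<AZStd::string(const AZStd::vector<AZStd::string>& arguments)> m_callback;
    };
} // namespace AICore
```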

Community comments on RPC design are especially welcome.

AI -> O3DE interface

The API registration mechanism needs to be part of the AI Core Gem's developer interface, so that custom Gems and their components can register new ways of interaction.

The AI Core Gem RPC System Component will relay the static RPC description (constructed through the reflection mechanism) to the AI service, and will allow dynamic attachment of callbacks to existing registry entries (otherwise callbacks are considered empty, which should cause warnings). It will also allow API entries (including the callback) to be added dynamically. The method of relaying such an API description to the AI service may be implementation-dependent: text prompts are used by default, but some implementations might instead produce a file with a static API description and upload it to an AI Assistant-like service.

The AI service will be instructed (by internally captured, configurable prompts) to call APIs within a specified text block, making parsing of its response straightforward. Most likely, JSON will be used to structure the API calls in text, which is a common approach in other contexts; see the libjson-rpc-cpp library as an example.
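As an illustration of that parsing step, a sketch follows, assuming a hypothetical JSON-RPC-like schema (the exact schema is not fixed by this RFC). It uses vanilla rapidjson for brevity; any JSON library would do.

```cpp
// Hypothetical sketch: the schema and helper below are illustrative only.
#include <rapidjson/document.h>
#include <string>
#include <vector>

// Example block extracted from the model response (between agreed-upon delimiters):
//   { "method": "SpawnPrefab", "params": ["assets/props/crate.prefab", "10", "0", "3"] }
struct ParsedApiCall
{
    std::string m_method;
    std::vector<std::string> m_arguments;
};

inline bool ParseApiCall(const std::string& jsonBlock, ParsedApiCall& out)
{
    rapidjson::Document document;
    document.Parse(jsonBlock.c_str());
    if (document.HasParseError() || !document.HasMember("method") || !document["method"].IsString())
    {
        return false; // Malformed call: report the problem back to the AI instead of executing anything.
    }
    out.m_method = document["method"].GetString();
    if (document.HasMember("params") && document["params"].IsArray())
    {
        for (const auto& param : document["params"].GetArray())
        {
            out.m_arguments.emplace_back(param.IsString() ? param.GetString() : "");
        }
    }
    return true;
}
```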

O3DE -> AI interface

Providing data to the AI service will be implementation-dependent. The AI Core Gem will include text prompting, and might include image prompting once it is available in popular open-source models. Other modalities might be included if standardized and implemented by most vendors. Until then, modalities other than text will be left to vendor-specific Gems.

In the context of robotics, modalities other than text will be especially important, for example images from a robot's camera sensors, or audio commands from its human co-workers.

What are the advantages of the feature?

Once this Gem is released, O3DE developers will be empowered to build AI-based features on top of it. This will bring to O3DE users looking to explore and develop AI applications for academic or industrial use cases.

What are the disadvantages of the feature?

Given that the AI space is extremely dynamic, this gem needs to be supported and updated continuously. It needs to stay relevant as the space expands and AI-empowered tools become commonplace.

There is also considerable effort involved in deciding which interfaces to expose and in understanding what is possible with the technology.

Are there any alternatives to this feature?

The main alternative is to treat AI as a set of external tools and to focus on developing a rich API for O3DE to interact with them, as opposed to the integrated approach that this proposal describes.

While the integrated approach involves writing some extra wrapper code and developing O3DE-side UI/UX, its advantages lie in a tailored approach to collaborative content creation and the ability to work better with Editor workflows. These are the main reasons for preferring the integrated approach.

Another alternative is not to have the AI Core Gem, but instead one Gem per vendor, including open source. However, this has the disadvantage of repeating the common parts, and it makes it harder to offer the same UX for AI users in O3DE, where a common use case will be to compare the performance of AI from several vendors.

How will users learn this feature?

The Gem will be part of the canonical set, documented and cross-referenced in the O3DE documentation. Publicity for the Gem is also planned, and a showcase demo will be released in 2024. The Gem will likely be presented alongside other AI Gem(s), as it focuses on core functionalities rather than user-facing features.

Are there any open questions?

arturkamieniecki commented 10 months ago

How to best protect against incurring unreasonable costs due to developer mistakes in automated jobs that use AI (such as infinite loops)?

Should we use the existing Behavior Reflection mechanism, which will require changes, or a more tailored approach?

  • I think that a more tailored approach would require two mechanisms:
    1. If a component is intended to be used with AI, it should be possible to add the reflection in the component's source code.
    2. Other functions (for example the Editor API for placing and moving objects) naturally will not have these reflections out of the box, and a method for reflecting "external" functions would be needed.

How to best serve the API documentation to avoid repetition (with Doxygen comments) as well as to respect versioning and closed-source Gems?

  • A custom reflection would allow the user to specify a "description" of the reflected function. This could be a string freely describing the parameters, return value, and function behavior, or a more restricted version where each parameter needs to be described separately. These "descriptions" would then be provided to the AI.