Browser AI API for Utilizing On-Device Models

kenzic commented 3 weeks ago

Introduction

The Browser AI API is a proposal for a new browser feature that makes AI models accessible directly on users' devices through the browser. By offering a simple API available on the window object, this approach would allow websites to leverage AI without sending data to the cloud, preserving privacy, reducing latency, and enabling offline functionality.

The goal is to let developers integrate advanced AI into web apps without heavy infrastructure, while giving users control over their data and the models they choose to run. On-device AI could then enhance web apps in real time, with no data leaving the device and no reliance on external servers.

The API would let developers:

Query available AI models on a user's device.
Request user permission to access specific models.
Create sessions with the models.
Perform common tasks like text generation, embeddings, and chat.

By running models directly on the user's hardware, we’re opening up new possibilities for AI-driven web apps while keeping things secure, private, and available offline.

Prototype

I created a prototype of this concept for review: https://github.com/kenzic/browser.ai

API Overview

The proposed API would be exposed on the window.ai object with the following high-level structure:

window.ai = {
  permissions: {
    models: () => Promise<AIModel[]>,
    request: (options: RequestOptions) => Promise<boolean>
  },
  model: {
    info: (options: ModelInfoOptions) => Promise<ModelInfo>,
    connect: (options: ConnectSessionOptions) => Promise<ModelSession>
  }
}
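
Since window.ai is only a proposal, a page would likely need to check for it before calling anything. The following is a minimal sketch of such a check, assuming the structure shown above; hasBrowserAI is a hypothetical helper name and not part of the proposal.

```javascript
// Hypothetical helper: returns true only if the given global object exposes
// the proposed shape (permissions.models/request and model.info/connect).
function hasBrowserAI(globalObject) {
  const ai = globalObject && globalObject.ai;
  return Boolean(
    ai &&
      ai.permissions &&
      typeof ai.permissions.models === "function" &&
      typeof ai.permissions.request === "function" &&
      ai.model &&
      typeof ai.model.info === "function" &&
      typeof ai.model.connect === "function"
  );
}
```

A page could call hasBrowserAI(window) once at startup and hide its AI features, or fall back to a server-side path, when the result is false.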

Permissions

Before using any models, websites must first query available models and request permission:

// Get list of available models
const models = await window.ai.permissions.models();

// Request permission for a specific model
const granted = await window.ai.permissions.request({
  model: "llama3.2"
});
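
The two calls above could be combined into a single flow that handles a missing model and a denied request. This is a sketch under assumptions: ensureModelAccess is a hypothetical helper, and it treats the enabled field on AIModel as "this site already has access", which the proposal does not spell out.

```javascript
// Hypothetical helper. `ai` is the proposed window.ai object.
// Resolves to { available, granted } for the named model.
async function ensureModelAccess(ai, modelName) {
  const models = await ai.permissions.models();
  const entry = models.find((m) => m.model === modelName);

  if (!entry) {
    // The model is not present on this device, so there is nothing to request.
    return { available: false, granted: false };
  }
  if (entry.enabled) {
    // Assumption: `enabled` means access was already granted to this site.
    return { available: true, granted: true };
  }
  // Otherwise ask the user; the promise resolves to a boolean.
  const granted = await ai.permissions.request({ model: modelName });
  return { available: true, granted };
}
```

In a page this would be called as ensureModelAccess(window.ai, "llama3.2"), with the caller choosing what to show when granted is false.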

Model Sessions

Once permission is granted, websites can create sessions to interact with models:

const session = await window.ai.model.connect({
  model: "llama3.2"
});
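
The overview also lists model.info, which has no example of its own. A sketch of how a page might use it to describe a model before connecting, assuming the ModelInfo and ModelDetails fields from the WebIDL section; describeModel is a hypothetical helper.

```javascript
// Hypothetical helper. `ai` is the proposed window.ai object.
// Builds a short human-readable label from the ModelInfo dictionary.
async function describeModel(ai, modelName) {
  const info = await ai.model.info({ model: modelName });
  const { family, parameter_size, quantization_level } = info.details;
  return `${info.model} (${family}, ${parameter_size}, ${quantization_level})`;
}
```

A label like this could be shown in the page's own UI so the user knows which model a feature is about to use.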

Chat

const session = await window.ai.model.connect({ model: 'llama3.2' });

const response = await session.chat({
  messages: [
    { role: 'user', content: 'Tell me a joke' }
  ],
  options: { temperature: 0.7 }
});

console.log(response.choices[0].message.content);
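
For a multi-turn conversation, one option is to keep the message history on the page and resend it with each call. The proposal does not say whether a session remembers earlier messages, so this sketch assumes it does not; createConversation is a hypothetical wrapper.

```javascript
// Hypothetical wrapper around a ModelSession.
// Assumption: session.chat() is stateless, so the full history is sent each time.
function createConversation(session, options = {}) {
  const messages = [];
  return {
    messages,
    async send(content) {
      messages.push({ role: "user", content });
      const response = await session.chat({ messages, options });
      const reply = response.choices[0].message;
      // Store the assistant reply so the next turn includes it as context.
      messages.push(reply);
      return reply.content;
    },
  };
}
```

Usage would look like const chat = createConversation(session, { temperature: 0.7 }) followed by await chat.send("Tell me a joke") and further send calls for follow-up questions.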

Embed

const session = await window.ai.model.connect({ model: 'llama3.2' });
const embedding = await session.embed({
  input: 'my text to encode',
});

console.log(embedding.embeddings);
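
A common use of embeddings is comparing two texts by cosine similarity. The sketch below assumes that embed accepts an array of inputs and returns one vector per input in the same order, which matches the EmbedOptions and EmbedResponse shapes in the WebIDL section but is not stated explicitly; both function names are hypothetical.

```javascript
// Cosine similarity of two equal-length numeric vectors.
// Returns 0 when either vector has zero length (norm), to avoid dividing by zero.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: embeds two texts in one call and compares them.
// Assumption: embeddings[i] corresponds to input[i].
async function textSimilarity(session, textA, textB) {
  const { embeddings } = await session.embed({ input: [textA, textB] });
  return cosineSimilarity(embeddings[0], embeddings[1]);
}
```

Scores close to 1 indicate texts the model considers similar; how to set a threshold depends on the model and the task.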

WebIDL

dictionary Message {
  required DOMString role;
  required DOMString content;
};

typedef DOMString ModelName;

dictionary ModelDetails {
  required DOMString parent_model;
  required DOMString format;
  required DOMString family;
  required sequence<DOMString> families;
  required DOMString parameter_size;
  required DOMString quantization_level;
};

dictionary ModelInfo {
  required ModelName model;
  required DOMString license;
  required ModelDetails details;
};

dictionary Options {
  double? temperature = null;
  unsigned long? stop = null;
  unsigned long? seed = null;
  double repeat_penalty;
  double presence_penalty;
  double frequency_penalty;
  unsigned long top_k;
  double top_p;
};

dictionary EmbedOptions {
  required DOMString model;
  required (DOMString or sequence<DOMString>) input;
  boolean truncate = false;
  (DOMString or unsigned long)? keep_alive = null;
  Options? options;
};

dictionary EmbedResponse {
  required DOMString model;
  required sequence<sequence<double>> embeddings;
};

dictionary ChatOptions {
  required ModelName model;
  required sequence<Message> messages;
  DOMString? format = null;
  Options? options;
};

dictionary ModelInfoOptions {
  required ModelName model;
};

dictionary ConnectSessionOptions {
  required ModelName model;
};

enum FinishReason {
  "stop",
  "length",
  "tool_calls",
  "content_filter",
  "function_call"
};

dictionary ChatChoice {
  required Message message;
  required FinishReason finish_reason;
};

dictionary ChatResponseUsage {
  required double total_duration;
  required double load_duration;
  required unsigned long prompt_eval_count;
  required double prompt_eval_duration;
  required unsigned long eval_count;
  required double eval_duration;
};

dictionary RequestOptions {
  required ModelName model;
  boolean silent = false;
};

dictionary ChatResponse {
  required DOMString id;
  required sequence<ChatChoice> choices;
  required DOMTimeStamp created;
  required ModelName model;
  required ChatResponseUsage usage;
};

interface ModelSession {
  Promise<ChatResponse> chat(ChatOptions options);
  Promise<EmbedResponse> embed(EmbedOptions options);
};

dictionary AIModel {
  required ModelName model;
  required boolean enabled;
};

interface Permissions {
  Promise<sequence<AIModel>> models();
  Promise<boolean> request(RequestOptions options);
};

interface Model {
  Promise<ModelInfo> info(ModelInfoOptions options);
  Promise<ModelSession> connect(ConnectSessionOptions options);
};

interface AIInterface {
  readonly attribute Permissions permissions;
  readonly attribute Model model;
};

// Expose the AIInterface on the window's ai property
partial interface Window {
  readonly attribute AIInterface ai;
};
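
To experiment with pages written against this interface without the prototype installed, an in-memory stand-in can be handy. The sketch below is hypothetical and runs no model: chat echoes the last message, embed returns zero vectors, and info returns placeholder strings. It also assumes that connect rejects when permission has not been granted, which the proposal implies but does not define.

```javascript
// Hypothetical test double for the proposed window.ai. Not a real implementation.
function createFakeAI(modelNames) {
  const granted = new Set();
  const known = (name) => modelNames.includes(name);

  return {
    permissions: {
      models: async () =>
        modelNames.map((model) => ({ model, enabled: granted.has(model) })),
      request: async ({ model }) => {
        if (!known(model)) return false;
        granted.add(model); // a real browser would prompt the user here
        return true;
      },
    },
    model: {
      info: async ({ model }) => {
        if (!known(model)) throw new Error(`Unknown model: ${model}`);
        // Placeholder values only; a real runtime would report actual details.
        const details = {
          parent_model: "", format: "fake", family: "fake", families: ["fake"],
          parameter_size: "0", quantization_level: "none",
        };
        return { model, license: "none", details };
      },
      connect: async ({ model }) => {
        // Assumption: connecting without permission rejects.
        if (!known(model)) throw new Error(`Unknown model: ${model}`);
        if (!granted.has(model)) throw new Error(`Permission not granted: ${model}`);
        return {
          chat: async ({ messages }) => ({
            id: "fake-1",
            created: Date.now(),
            model,
            choices: [{
              message: {
                role: "assistant",
                content: `echo: ${messages[messages.length - 1].content}`,
              },
              finish_reason: "stop",
            }],
            usage: {
              total_duration: 0, load_duration: 0, prompt_eval_count: 0,
              prompt_eval_duration: 0, eval_count: 0, eval_duration: 0,
            },
          }),
          embed: async ({ input }) => {
            const inputs = Array.isArray(input) ? input : [input];
            return { model, embeddings: inputs.map(() => [0, 0, 0]) };
          },
        };
      },
    },
  };
}
```

In a unit test, code that normally receives window.ai could be handed createFakeAI(["llama3.2"]) instead.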

Key Benefits

Privacy: User data stays on their device—nothing gets sent to remote servers.
Low Latency: No server round-trips mean faster responses.
Offline Capability: AI apps work even without an internet connection.
Reduced Costs: Developers don’t need expensive infrastructure to serve models.
User Control: Users can decide which models to enable, and they have the power to revoke permissions at any time.

Use Cases

This API opens the door for all kinds of innovative web applications:

AI-powered text editors that assist with writing without sacrificing privacy.
Language translation tools that run locally.
Intelligent form auto-completion to streamline data entry.
Creative tools that help users generate images, music, or video without needing a connection.
Offline chatbots or virtual assistants that don’t depend on cloud services.

Technical Considerations

Model Distribution: Models could be distributed at the OS level or through the browser itself. There are pros and cons to both approaches. OS-level distribution would allow broader access and updates, while browser-based distribution would be easier to roll out without coordination with OS teams.
Security: We need to prevent malicious sites from misusing models, and ensure permission requests are transparent and easy to manage for users.
Performance: Running models in the browser has its challenges, especially on low-power devices. The API should support fallback, so that a site can switch to a smaller model when its preferred one is unavailable or too demanding for the device.
Cross-Browser Support: This API needs to work consistently across all major browsers.
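
The fallback idea under Performance could also be handled on the page side with the API as proposed. This is a rough sketch, not part of the proposal: connectWithFallback is a hypothetical helper, and it assumes connect rejects when a model cannot be loaded.

```javascript
// Hypothetical helper. `ai` is the proposed window.ai object.
// Tries candidate models in order of preference and returns the first
// one that is installed, permitted, and connects; resolves to null otherwise.
async function connectWithFallback(ai, candidates) {
  const available = await ai.permissions.models();
  const installed = new Set(available.map((m) => m.model));

  for (const name of candidates) {
    if (!installed.has(name)) continue;

    const granted = await ai.permissions.request({ model: name });
    if (!granted) continue;

    try {
      const session = await ai.model.connect({ model: name });
      return { model: name, session };
    } catch (err) {
      // Assumption: connect rejects if the device cannot load the model.
      // Move on to the next, presumably smaller, candidate.
    }
  }
  return null;
}
```

A page would pass its candidates from largest to smallest, for example connectWithFallback(window.ai, ["llama3.2", "some-smaller-model"]), where the second name is a placeholder for whatever smaller model the site supports.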

Other Considerations

There are similar proposals, such as the Prompt API proposal.

I see this as an alternative approach to implementing AI APIs, one that makes them more open and flexible. This proposal isn’t about focusing on specific tasks like prompt generation or summarization. Instead, it’s about creating a bridge to a model runtime that gives access to models suited to each specific use case. I think building separate APIs for each task—like prompt-specific or translation-specific APIs—is the wrong direction. It’s better to have an API that lets the developer or user choose the model with the right capabilities for their needs.

Feedback

Please provide all feedback below.

tomayac commented 3 weeks ago

(Meta comment: Are you aware of the Prompt API proposal?)

tomayac commented 3 weeks ago

(Also, did you see https://github.com/WICG/proposals/issues/147 and https://github.com/WICG/proposals/issues/163 (both accepted) for more concrete tasks like translation, language detection, summarization, writing, and rewriting?)

kenzic commented 3 weeks ago

Yes, I saw those and have worked with the Prompt API, which is what inspired this proposal. I see this as an alternative approach to implementing AI APIs, one that makes them more open and flexible. This proposal isn't about focusing on specific tasks like prompt generation or summarization. Instead, it's about creating a bridge to a model runtime that gives access to models suited to each specific use case. I think building separate APIs for each task, like prompt-specific or translation-specific APIs, is the wrong direction. It's better to have an API that lets the developer or user choose the model with the right capabilities for their needs.

tomayac commented 3 weeks ago

(Great, was just wondering, since your proposal didn't mention these previous efforts.)

kenzic commented 3 weeks ago

Thanks for the feedback. I updated the description to include "Other Considerations" which discusses this.

WICG / proposals