explainers-by-googlers / prompt-api

A proposal for a web API for prompting browser-provided language models
Creative Commons Attribution 4.0 International

Choose model #8

Open niutech opened 4 months ago

niutech commented 4 months ago

There could be more than one LLM in a web browser (built-in or added as a web extension). Let's show users the list of available LLMs (using their IDs) and allow them to optionally choose a model when creating a session.

For example:

const models = await ai.listModels(); // ['gemini-nano', 'phi-3-mini']
const session = await ai.createTextSession({
  model: models[1] // 'phi-3-mini'
});
const modelInfo = await ai.textModelInfo(models[1]); // {id: 'phi-3-mini', version: '3.0', defaultTemperature: 0.5, defaultTopK: 3, maxTopK: 10}
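
Since not every browser would implement these methods, a page would presumably feature-detect and fall back to the default model. A minimal sketch, assuming only the proposed listModels()/createTextSession() above (not a shipped API):

// Sketch: listModels()/createTextSession() are from the proposal
// above, not a shipped API. Fall back to the default model when
// model selection is unavailable or the preferred model is missing.
async function createSessionPreferring(preferredId) {
  if ('listModels' in ai) {
    const available = await ai.listModels();
    if (available.includes(preferredId)) {
      return ai.createTextSession({ model: preferredId });
    }
  }
  return ai.createTextSession(); // browser default

}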
captainbrosset commented 4 months ago

> Let's show users the list of available LLMs

I guess you meant developer here, not user, right?

niutech commented 4 months ago

Users of the API, i.e. developers.

christianliebel commented 4 months ago

I assume this could be problematic as it would create a fingerprinting vector, compromising user privacy. Additionally, this approach might lack forward compatibility, as models are likely to evolve and change over time. A more robust solution could be to expose metadata about each model, such as context window size, number of parameters, supported languages, and relevant capabilities (translation, etc.). This way, developers can make informed decisions based on the features and performance characteristics they need without directly exposing model IDs.
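
One possible shape, purely as an illustrative sketch (the method name listTextModels() and every field name below are hypothetical, not specified anywhere):

// Illustrative sketch: listTextModels() and all field names are
// hypothetical; selection is by capabilities rather than model IDs.
const candidates = await ai.listTextModels();
// e.g. [{contextWindow: 8192, languages: ['en', 'de'], capabilities: ['translation']}, ...]
const match = candidates.find(m =>
  m.contextWindow >= 4096 && m.capabilities.includes('translation'));
const session = await ai.createTextSession({ model: match });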

niutech commented 4 months ago

@christianliebel How would exposing the model ID be any more problematic in terms of fingerprinting when the user-agent name and version are already available, as is textModelInfo, from which the built-in LLM can easily be deduced (Google Chrome -> Gemini Nano)? I'm proposing to return even more detailed model metadata:

const modelInfo = await ai.textModelInfo('gemini-nano'); // {id: 'gemini-nano', version: '1.0', defaultTemperature: 0.8, defaultTopK: 3, maxTopK: 10}

This would allow web developers to choose the best-fitting local model for their use case (e.g. math, reasoning, poetry), as sketched below. There should also be a way to add custom models as web extensions (#11).
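
For instance, a page could query the metadata of every available model and pick one; a minimal sketch on top of the proposed listModels()/textModelInfo() above:

// Sketch using the proposed listModels()/textModelInfo(); picking the
// model with the largest maxTopK stands in for real fitness criteria.
const ids = await ai.listModels();
const infos = await Promise.all(ids.map(id => ai.textModelInfo(id)));
const best = infos.reduce((a, b) => (b.maxTopK > a.maxTopK ? b : a));
const session = await ai.createTextSession({ model: best.id });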

christianliebel commented 4 months ago

The set of installed models (especially when you register custom ones) could be pretty unique, similar to installed fonts.

niutech commented 4 months ago

@christianliebel It's the same level of uniqueness as detecting which web extensions are installed (e.g. with extension-detector). Even ad blockers can be detected. I think the ability to choose among multiple local LLMs justifies a slightly bigger fingerprinting surface. If you care about privacy, you just won't install any additional LLMs.

ToonTalk commented 4 months ago

I was going to create an issue about choosing versions, but I see the suggestion that the version be part of textModelInfo. I imagine developers may want to give the user the choice of proceeding with the currently downloaded model or downloading a newer version. It would also be nice if the user could somehow make an informed decision about how big the download is and how significant the upgrade would be.
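
A rough sketch of that flow, where latestVersion, downloadSizeBytes, and updateModel() are all invented for illustration (only version appears in the proposal above):

// Hypothetical sketch: 'version' is from the proposal above;
// 'latestVersion', 'downloadSizeBytes', and updateModel() are invented.
const info = await ai.textModelInfo('gemini-nano');
if (info.latestVersion && info.latestVersion !== info.version) {
  const mb = Math.round(info.downloadSizeBytes / 1e6);
  const upgrade = confirm(`A newer model (v${info.latestVersion}, ~${mb} MB) is available. Download it?`);
  if (upgrade) await ai.updateModel('gemini-nano');
}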

mmccool commented 2 months ago

I've proposed a breakout session to discuss some of the privacy tradeoffs of AI model management at W3C TPAC in two weeks; see https://github.com/w3c/tpac2024-breakouts/issues/15. Indeed, the set of available models (if shared cross-site, and if the models can be updated independently of the browser version) does create some fingerprinting risk. On the other hand, if the models are tied to APIs that are updated and packaged along with the browser version, so that the set of available models can be predicted exactly from the browser version (which, as has been pointed out above, is already known), then that knowledge adds no new fingerprinting information. What I am interested in is the middle ground, with potential solutions like shared foundation models (distributed with the browser and tied to the browser version, so no new information) plus same-origin-cached adapters (which can be small). But that is only one of several options with different tradeoffs, and there are a bunch of missing bits in the specifications right now.