Clarification on supportsLanguage API #29

Status: Open · sushraja-msft opened 1 month ago

sushraja-msft commented 1 month ago

Topic 1: This API feels like it would suffer from a conflation of intents.

It might be better to have this API be a pure feature-detection API:

```
boolean supportsLanguage(languageTag)
```

This way, the UA is free to apply heuristics to determine whether a language has been requested enough times to trigger the download of a specific model.
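
For illustration, usage under this boolean shape might look like the following sketch (the `ai.assistant.capabilities()` entry point is taken from the explainer; the boolean return is this proposal, not the current API):

```js
const capabilities = await ai.assistant.capabilities();

// Proposed: a pure feature-detection check that says nothing about download state.
if (capabilities.supportsLanguage("ja")) {
  // The UA could treat repeated checks and/or the create() call below as a
  // demand signal for Japanese, and download supporting assets when it sees fit.
  const session = await ai.assistant.create();
  // ... prompt the session in Japanese ...
}
```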

Topic 2: It is going to be challenging for interop if we cannot quantify what "support" means. We would need to think about test suites that can help validate the level of support if a UA claims that `supportsLanguage` is true. Any thoughts on how to manage this?

domenic commented 1 month ago
> Or is it for developers to trigger the download of another model for language support?

Can you explain how you got the impression that an API with the name "supports" would trigger a download? The explainer tried to be pretty clear that only `create()` can cause downloads, and the entire point of the capabilities API is to probe capabilities without causing downloads.

sushraja-msft commented 1 month ago

Oh, it's because of this line:

> `supportsLanguage(languageTag)`, which returns "no", "after-download", or "readily" to indicate whether the model supports conversing in a given human language.

domenic commented 1 month ago

Right, and right above that:

"after-download", indicating the device or browser supports prompting a language model, but it needs to be downloaded before it can be used.

I guess we can clarify with a cross-link to explain how downloads happen, i.e. via `create()`.
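
That is, checking capabilities never downloads anything; only `create()` does, and (per the explainer) a `monitor` option can be used to observe download progress. A sketch:

```js
// Probing capabilities never triggers a download.
const capabilities = await ai.assistant.capabilities();

if (capabilities.supportsLanguage("ja") === "after-download") {
  // Only create() can kick off the download; monitor() lets us observe it.
  const session = await ai.assistant.create({
    monitor(m) {
      m.addEventListener("downloadprogress", e => {
        console.log(`Downloaded ${e.loaded} of ${e.total} bytes.`);
      });
    }
  });
}
```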

> Topic 2: It is going to be challenging for interop if we cannot quantify what "support" means. We would need to think about test suites that can help validate the level of support if a UA claims that `supportsLanguage` is true. Any thoughts on how to manage this?

In our experience, model developers are relatively clear about which languages their models have been trained on and officially "support"; e.g., it is usually listed in the model card. That was our initial thinking, but maybe there's more subtlety here.

sushraja-msft commented 1 month ago

> I guess we can clarify with a cross-link to explain how downloads happen, i.e. via `create()`.

It is still odd that `supportsLanguage` returns an `AICapabilityAvailability`. A simple boolean feels more appropriate.

> In our experience, model developers are relatively clear about which languages their models have been trained on and officially "support"; e.g., it is usually listed in the model card. That was our initial thinking, but maybe there's more subtlety here.

The model card lists which languages were present before quantization and which the model was trained on. Is there a language benchmark that should be used to validate final support? If web developers are going to rely on the language capability, I wonder whether the spec should mandate a minimum score on a benchmark; otherwise developers would need to know the model or UA.

domenic commented 1 month ago

> It is still odd that `supportsLanguage` returns an `AICapabilityAvailability`. A simple boolean feels more appropriate.

I don't understand why it's odd. The idea is to signal whether the language is not available, is available but using it might require an expensive download (e.g. of a LoRA or other fine-tuning), or is readily available. Developers might make different decisions based on those three possibilities.
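
For example, a site might branch on all three values (a sketch; the specific fallbacks are of course up to the developer):

```js
const capabilities = await ai.assistant.capabilities();

switch (capabilities.supportsLanguage("ja")) {
  case "no":
    // Hide the feature, or fall back to a server-side model.
    break;
  case "after-download":
    // Usable, but creating a session may trigger a large download:
    // perhaps ask the user first, or defer until they opt in.
    break;
  case "readily":
    // Safe to create a session immediately.
    break;
}
```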

How would you signal those three possibilities with just a boolean?

domenic commented 1 month ago

> otherwise developers would need to know the model or UA.

The idea of the API is that they wouldn't need to know the model or UA. They'd only need to know the return value of the API. In other words, they can rely on language X being present if `supportsLanguage("x")` returns "after-download" or "readily".

sushraja-msft commented 1 month ago

Thanks for the explanation about per-language LoRAs.

When a developer checks language support for multiple languages with `supportsLanguage` and receives "after-download", is the expectation that the developer should proceed with `create()` and then check `supportsLanguage` again to see which of the languages they were interested in are now ready?

Regarding my question about benchmarks for language support, I see that the position is "We do not intend to provide guarantees of language model quality, stability, or interoperability between browsers... These are left as quality-of-implementation issues". This means that a developer would need to check the UA string to decide whether they trust a given browser's model, which is problematic, but I understand the position.

domenic commented 1 month ago

> When a developer checks language support for multiple languages with `supportsLanguage` and receives "after-download", is the expectation that the developer should proceed with `create()` and then check `supportsLanguage` again to see which of the languages they were interested in are now ready?

The intention is that for any capability, if you get "after-download", you should not have to check again. "after-download" means that once you create a model with options that use that capability, the capability will be downloaded, and will work. So there's no need to check the capabilities again; you know it will work.

For example:

```js
const capabilities = await ai.assistant.capabilities();

if (capabilities.supportsLanguage("ja") === "after-download") {
  const session = await ai.assistant.create({
    systemPrompt: "あなたは優しいくまです。"
  });

  // Now chat with the friendly bear. No need to check `capabilities` again. It'll definitely work.
}
```

In practice this means code will look more like:

```js
const capabilities = await ai.assistant.capabilities();

if (capabilities.supportsLanguage("ja") !== "no") {
  // This might download or might be created readily. If we care we can check in more detail,
  // but many pieces of code might not care.
  const session = await ai.assistant.create({
    systemPrompt: "あなたは優しいくまです。"
  });

  // As before.
}
```

However, I think you've highlighted a weakness in the API. Unlike with the various options for the writing assistance APIs (e.g. tone, length, etc.), there's no very direct way to provide the expected input language as an option at creation time. In my example above I've used the system prompt, but of course it might not be convenient to do that. And it's a bit magical for the web browser to run language detection on the system prompt/initial prompts and then use that to decide which LoRAs to download.

So, probably we should add a more explicit `inputLanguages` or `expectedInputLanguages` option, which would provide such a signal.
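
For example (a sketch: `expectedInputLanguages` here is the hypothetical option being floated, not something in the current explainer):

```js
// Hypothetical: an explicit creation-time signal telling the UA which
// per-language assets (e.g. LoRAs) to download, instead of making it infer
// the language from the system prompt.
const session = await ai.assistant.create({
  expectedInputLanguages: ["ja"],
  systemPrompt: "あなたは優しいくまです。"
});
```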

> Regarding my question about benchmarks for language support, I see that the position is "We do not intend to provide guarantees of language model quality, stability, or interoperability between browsers... These are left as quality-of-implementation issues". This means that a developer would need to check the UA string to decide whether they trust a given browser's model, which is problematic, but I understand the position.

This is not what I was trying to communicate. Let me try a different phrasing.

Let's say a browser company wants to bundle model X with their browser. Model X's system card says that it's been trained on English, Japanese, and French. The company is happy with the model's performance on these languages and is happy for people to use it for those languages.

So, the implementation of the prompt API in that browser, which uses model X, will return "readily" (or maybe "after-download") for those three languages. It will return "no" for the others. The browser has no incentive to return a positive value for other languages. It has not tested those other languages. It did not train on them. Doing so would just lead to web developers getting wrong information about the model from the capabilities API.

From the web developer side, there is no need for user agent sniffing. If they ask `capabilities.supportsLanguage("ja")`, they get a correct answer to the question: does the browser company believe their model is capable of taking in Japanese input? Since there is no incentive for browser companies to lie, this is all they need in 99% of cases. A very specialized site that wonders whether the model understands a certain high level of Japanese will likely need to run its own benchmarks, by creating a benchmarking session and testing it on various Japanese grammar and vocabulary questions. But even then there's no reason to use UA sniffing. In particular, UA sniffing is likely to work poorly, given how models might update out of band from browser versions, or vary by factors like device video memory that are not exposed through UA-sniffing APIs.
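
For that specialized case, the self-run benchmark might look roughly like this (a sketch; the probes and the pass criterion are purely illustrative):

```js
const session = await ai.assistant.create();

// Site-specific probes of the capability the site actually cares about.
const probes = [
  { prompt: "「走る」の可能形は？", expected: "走れる" },  // potential form of "to run"
  { prompt: "「食べる」の過去形は？", expected: "食べた" }  // past tense of "to eat"
];

let passed = 0;
for (const { prompt, expected } of probes) {
  const answer = await session.prompt(prompt);
  if (answer.includes(expected)) passed++;
}

const qualityIsSufficient = passed === probes.length;
```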

Really, this capabilities API is just like any other capabilities API on the web. Web developers prefer using `RTCRtpReceiver.getCapabilities()` over UA sniffing, because it's more useful and browser vendors accurately convey their capabilities through it.