WICG / proposals

A home for well-formed proposed incubations for the web platform. All proposals welcome.
https://wicg.io/

AI Assistants and Web Standards #140

Closed AdamSobieski closed 6 months ago

AdamSobieski commented 9 months ago

Introduction

Today, AI assistants in Web browsers can provide end-users with new and useful features including, but not limited to: (page-content-contextual) natural-language question-answering, summarization, and search.

In the near future, Web Standards could enable broader interoperation between webpages and AI assistants, providing end-users with a larger set of functionalities and capabilities.

In these regards, this proposal outlines some preliminary ideas. I hope to find collaborators interested in these forefront topics at the intersection of AI and Web Standards to discuss, to brainstorm, and to create fuller documents with which to spur innovation and to seek consensus from the community and stakeholders.

Discussion

Assistant Detection

Should webpages be able to detect the existence of or the presence of available AI assistants? If so, should this require user permission?

Assistant Identification

Should webpages be able to obtain descriptive text strings, like user-agent strings, identifying the interoperable AI assistants on a system? If so, should this require user permission?

Assistant Capabilities Inspection

Should webpages be able to inspect the capabilities of detected AI assistants, e.g., which cross-domain and domain-specific "plugins" are installed?

Assistant Attachment and Detachment

Should webpages be able to listen for events when an AI assistant attaches itself to or detaches itself from a webpage? If so, should these capabilities require user permissions?

Should these events' data include user-agent strings and/or the capabilities of the AI assistant attaching or detaching?
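As a purely illustrative sketch of what listening for such events could look like, the snippet below simulates it with the standard `EventTarget` and `Event` classes. The event names `assistantattach` and `assistantdetach`, the `AssistantEvent` class, the `assistantHost` object, and the `assistantId` and `capabilities` fields are all hypothetical; no browser exposes anything like them today. The sketch assumes identifying data is only disclosed after the user agrees, which is one possible answer to the questions above, not a settled one.

```javascript
// Hypothetical event type: nothing like this exists in any browser today.
class AssistantEvent extends Event {
  constructor(type, { assistantId = null, capabilities = [] } = {}) {
    super(type);
    // Whether to expose these fields at all is the open privacy question.
    this.assistantId = assistantId;
    this.capabilities = capabilities;
  }
}

// Stand-in for whatever object a page would listen on (assumed, not specified).
const assistantHost = new EventTarget();

const log = [];
assistantHost.addEventListener("assistantattach", (event) => {
  log.push(`attached: ${event.assistantId ?? "(undisclosed)"}`);
});
assistantHost.addEventListener("assistantdetach", () => {
  log.push("detached");
});

// Simulate a user agent that discloses an identifier only after the user agrees.
function simulateAttach(userGrantedPermission) {
  const detail = userGrantedPermission
    ? { assistantId: "ExampleAssistant/1.0", capabilities: ["summarize"] }
    : {};
  assistantHost.dispatchEvent(new AssistantEvent("assistantattach", detail));
}

simulateAttach(false);
simulateAttach(true);
assistantHost.dispatchEvent(new AssistantEvent("assistantdetach"));
console.log(log);
```

In this shape the page learns that something attached but, without permission, not what it was.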

Bidirectional Content Sharing

Bidirectional content sharing between webpages and AI assistants can enable new features and functionalities for end-users. Pertinent topics include uses of metadata to ensure that AI-generated content can be readily identified as such.

Relevant standardization-related topics include: Web sharing, clipboarding, and dragging-and-dropping content between webpages and AI assistants.

For an example, a table of data from a webpage could be dragged-and-dropped to an AI assistant, and, after some interaction with the assistant, an AI-generated chart could be dragged-and-dropped back into a document-authoring webpage.

Installing and Uninstalling Domain-specific "Plugins"

Should users or websites be able to install and uninstall one or more domain-specific "plugins" for browser-integrated AI assistants? If so, should this require user permission?

Exporting JavaScript Functions

Should webpages be able to make available and describe JavaScript functions for attached AI assistants to consume and utilize?

Some JavaScript functions might implement functionalities which are common across webpages. Perhaps descriptions of functions could include URIs to express that the described functions align with standard functionalities. Standard sets of functionalities could enable new scenarios for end-users, e.g., accessibility scenarios.
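To make the idea more concrete, here is a minimal sketch of how a page might describe and expose a function. Everything in it is an assumption made for illustration: the `exportFunction`, `describeExports`, and `callExport` helpers, the shape of the description object, the `conformsTo` field, and the placeholder URI are hypothetical and not part of any existing or proposed API.

```javascript
// Hypothetical registry a page might fill in; not part of any existing API.
const exportedFunctions = new Map();

function exportFunction({ name, description, conformsTo = null, parameters = {}, run }) {
  exportedFunctions.set(name, { name, description, conformsTo, parameters, run });
}

// What an attached assistant might be shown: descriptions only, never the code.
function describeExports() {
  return [...exportedFunctions.values()].map(({ run, ...publicFields }) => publicFields);
}

// How an assistant might invoke a described function by name.
function callExport(name, args) {
  const entry = exportedFunctions.get(name);
  if (!entry) throw new Error(`No exported function named "${name}"`);
  return entry.run(args);
}

exportFunction({
  name: "setFontScale",
  description: "Scale the page's text size by a factor between 0.5 and 3.",
  // Placeholder URI standing in for a "standard functionality" identifier.
  conformsTo: "https://example.org/functions/set-font-scale",
  parameters: { factor: "number" },
  run: ({ factor }) => {
    if (typeof factor !== "number" || factor < 0.5 || factor > 3) {
      throw new RangeError("factor must be a number between 0.5 and 3");
    }
    return `font scale set to ${factor}`;
  },
});

console.log(describeExports());
console.log(callExport("setFontScale", { factor: 1.5 }));
```

An assistant that recognizes the `conformsTo` URI could treat the function as a known capability on any page that exports it, which is the accessibility scenario hinted at above.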

Multi-agent Systems

Some websites may provide their own chatbots. In the near future, these could function as agents capable of interacting with end-users and browser-integrated AI assistants operating in view of users on their behalf. Should webpages be able to interface as agents with AI assistants for multi-agent dialogue scenarios?

Conclusion

Thank you. For reference, the WICG proposal process is:

  1. Submit a proposal outlining your idea.
  2. Get feedback and improve your proposal.
  3. Find collaborators and create a GitHub repository.
  4. Work on your proposal and seek consensus from the community.
  5. Advocate for adoption of your proposal to the W3C or the WHATWG for standardization.

I am looking forward to discussing and improving this preliminary proposal with your feedback and to finding interested collaborators to create fuller documents with which to spur innovation and to seek consensus from the community and stakeholders.

marcoscaceres commented 8 months ago

Really appreciate you sparking this discussion on AI and web standards - and, with full disclosure, I am coincidentally writing this using an AI assistant :) It's a hot topic, but diving into it, we've got to be super clear about one thing: history's shown us that messing with assistive tech can go south real fast.

Drawing the Line – Maybe Not: The whole idea might seem like a straightforward way to manage interactions on the surface, but let's not sugarcoat it—it's a minefield. It's akin to setting up a "No Glasses Allowed" sign at the entrance of a museum. You're not just controlling access; you're potentially denying someone the chance to see the art altogether. When we've seen sites try to control how users engage with content (blocking copy-paste, remember?), it didn't just irk people; it sparked backlash because it broke the fundamental web principle of openness and accessibility.

Privacy Isn’t Just a Buzzword: Deep-diving into AI assistant detection feels eerily similar to the dark arts of user tracking. Just like the uproar over browser fingerprinting, outing a user's choice of AI could not only breach their privacy but also signal to them (and the world) that their needs are being monitored and cataloged. That's not just a slippery slope; it's a cliff edge we're talking about.

The Collaboration Dance – Stepping on Toes Isn’t Pretty: Your proposal to sync AI assistants more closely with web content might sound harmonious in theory, but in practice? It's a dance fraught with missteps. We've seen attempts to "improve" accessibility through well-meaning interventions end up hampering the experience for those relying on screen readers. It's not just a matter of getting it wrong; it's about the real harm and exclusion that can happen when we do. This isn't a new tune we're learning—it's an old one we've been trying to get right, and yet, here we are, still stepping on toes.

Plugins and the Pandora’s Box: Venturing into "plugins" for AI assistants, are we ready to relive the toolbar nightmares? Those add-ons were supposed to make life easier but ended up as digital clutter, sometimes veering into security hazards. Innovation? Sure. But let's not forget the cleanup costs post-party.

In Conclusion – Let’s Not Break the Web: In all, while exploring AI's role in the future web is thrilling, let's not do it at the expense of user privacy, autonomy, or accessibility. Remember, the web is their space as much as ours. Implementing any feature that limits how people can interact with it or makes assumptions about their needs is, frankly, a step backward. We've been down this road before, and it's time we learn from those journeys.

Looking forward to hashing this out further. It’s clear we’re all aiming to make the web a better place, but let’s ensure it’s inclusive and respectful of every user’s rights and needs. Thanks for kicking off this essential debate.

marcoscaceres commented 8 months ago

Just also noting that robots.txt might already technically cover blocking AIs acting as search engines.

cwilso commented 8 months ago

I wanted to draw the connection between this issue and the recently-published document from the W3C Team at https://w3c.github.io/ai-web-impact/. Paging @dontcallmedom ...

AdamSobieski commented 8 months ago

Page-granular Permissions

A concrete example that argues for page-granular permissions and document-metadata solutions is that of a popular banking website.

On a banking website, many pages would have features for interoperability with AI assistants. For instance, Q&A and dialogue-driven workflows could be provided for end-users to be able to learn more about opening new accounts.

However, certain other pages might contain private data, e.g., account balances, ledgers, and transaction histories. Page-granular permissions (e.g., document-metadata-based ones) could be useful for preventing browser-based AI assistants from accessing these other pages and their contents, including when end-users hover over AI assistant icons (which could open the assistants) and when AI assistants are already open in a browsing context.

Use Cases and Scenarios

I, for one, am very interested in participating in brainstorming use cases and scenarios. Collaborating to form a lengthy list of use cases and scenarios could be useful for subsequently formulating requirements with respect to any potential APIs.

For an example of such a use-case scenario, end-users might want to be able to ask their AI assistant where in an opened and attached Web document some described content is. Note that end-users wouldn't need to quote precise keywords or text strings occurring in documents. End-users would be able to ask questions involving natural-language understanding like: "where in this document does it discuss..." or "where in this document is there content about...".

As envisioned, in response, AI assistants would highlight one or more relevant document excerpts. Some means would then be provided for end-users to be able to iterate through these or to scroll to these highlighted excerpts. Applications of this use-case scenario include new forms of interaction with digital textbooks and with scholarly and scientific publications.

Brainstorming, it might become possible to interact thusly not only with respect to single Web documents, but also with respect to tab groups of Web documents.

"Plugins"

It appears that there are two types of "plugins": domain-specific and domain-independent (by "domain", I mean website domain).

For an example of a domain-specific "plugin", a popular hotel chain website might want a "plugin" to be available so that end-users' AI assistants could better perform reservation-related and concierge-related tasks and workflows with them - but only when at their website or, perhaps, only when logged in and staying at their hotel.

For an example of a domain-independent "plugin", a mathematics research organization might want to develop a domain-independent "plugin" so that an AI assistant could answer more mathematical questions more accurately regardless of which websites the end-users were at.

While "plugins" appear to be a server-side topic, it might be desirable for end-users to be involved in "installing" these "plugins" or otherwise consenting to their activation and connectivity. One reason is that end-users would then better understand the multiple vendors and components involved in providing the features that they want at the websites that they use.

This could be phrased to end-users in terms of connectivities. "Do you allow [AI assistant name] to connect to [hotel chain plugin name] when you are at [hotel chain website domain]?" With possible answers including: (1) no, (2) not this time, ask again later, (3) yes, just this once, ask again next time, and (4) yes.
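The four answers differ only in whether they grant the connection now and whether they are remembered. A minimal sketch of that logic follows; the `Answer` values, the `mayConnect` function, and the `askUser` callback are hypothetical names chosen for illustration, and the choice to remember only (1) and (4) is my reading of the wording above.

```javascript
// The four answers listed above, as stored values.
const Answer = Object.freeze({
  NO: "no",
  NOT_NOW: "not-this-time",
  ONCE: "yes-once",
  YES: "yes",
});

// Only "no" and "yes" are remembered; the other two mean "ask again".
const remembered = new Map();

function keyFor(assistant, plugin, domain) {
  return JSON.stringify([assistant, plugin, domain]);
}

// askUser is a callback that shows the prompt and returns one of the Answer values.
function mayConnect(assistant, plugin, domain, askUser) {
  const key = keyFor(assistant, plugin, domain);
  if (remembered.has(key)) return remembered.get(key) === Answer.YES;

  const answer = askUser(
    `Do you allow ${assistant} to connect to ${plugin} when you are at ${domain}?`
  );
  if (answer === Answer.YES || answer === Answer.NO) remembered.set(key, answer);
  return answer === Answer.YES || answer === Answer.ONCE;
}
```

For instance, answering "yes, just this once" allows the connection but leaves nothing stored, so the next visit prompts again; answering "yes" suppresses all later prompts for that assistant, plugin, and domain.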

Re: Privacy Isn't Just a Buzzword

Those are good points about a need to consider privacy with respect to AI assistant detection, user-agent strings, and capabilities querying.

I was thinking of an analogy with the rapid advancement of Web technologies: developers are eager to deliver the latest features to their customers and end-users, and the resulting complexities (multiple vendors, multiple services, potential tiers of services) are expressed in the form of dynamic compatibility charts (e.g., https://caniuse.com/).

Likewise, looking forward to participating in hashing out these and any other related topics interesting to you or to any other participants.

AdamSobieski commented 7 months ago

Here is an interesting example of an agent-based approach to AI assistants interoperating with Web browsers: https://github.com/EmergenceAI/Agent-E .

Agent-E is an agent-based system that aims to automate actions on the user's computer. At the moment it focuses on automation within the browser. The system is based on the AutoGen agent framework.

This provides a natural-language way of interacting with a web browser:

  • Fill out forms (web forms not PDF yet) using information about you or from another site
  • Search and sort products on e-commerce sites like Amazon based on various criteria, such as bestsellers or price.
  • Locate specific content and details on websites, from sports scores on ESPN to contact information on university pages.
  • Navigate to and interact with web-based media, including playing YouTube videos and managing playback settings like full-screen and mute.
  • Perform comprehensive web searches to gather information on a wide array of topics, from historical sites to top local restaurants.
  • Manage and automate tasks on project management platforms (like JIRA) by filtering issues, easing the workflow for users.
  • Provide personal shopping assistance, suggesting products based on the user's needs, such as storage options for game cards.

While Agent-E is growing, it is already equipped to handle a versatile range of tasks, but the best task is the one that you come up with. So, take it for a spin and tell us what you were able to do with it. For more information see our blog article.

marcoscaceres commented 7 months ago

"However, certain other pages might contain private data, e.g., account balances, ledgers, and transaction histories. Page-granular permissions (e.g., using document-metadata-based) could be useful for preventing browser-based AI assistants from accessing these certain other pages and their contents including when end-users hovered over AI assistant icons, which could open them, and when AI assistants were already opened in a browsing context."

Right, but this takes a backwards view: the data is the user's private data. It's not up to the website to say what the user does with it or to prevent them from running AI tools over it.

Again, see what I said about this being akin to blocking a11y tools or sites preventing copy/paste. What is being proposed would be user hostile (or just ignored).

I guess what I'm trying to say is that there is no use case for ever blocking an AI tool from running over a web page. It would be akin to saying there is a use case for blocking a screen reader.

AdamSobieski commented 7 months ago

@marcoscaceres, that makes sense. Perhaps, instead, page metadata could be utilized to describe that a page has a user's private data in it and the AI assistant or screen reader system could, if interoperable with this metadata, decide what to do, e.g., alert a user or request a permission?

    <meta name="contains-private-data" content="true" />
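A rough sketch of how an assistant might consume such a hint is below. The `decideAction` function and its return values are hypothetical; the only assumption about input is that each entry has `name` and `content` properties, as `<meta>` elements do in the DOM. The point of the sketch is that the page only signals, and the decision stays with the user.

```javascript
// Possible outcomes are illustrative labels, not part of any API.
function decideAction(metaTags, userHasGrantedAccess) {
  const flag = metaTags.find((tag) => tag.name === "contains-private-data");
  const pageClaimsPrivateData = flag !== undefined && flag.content === "true";

  // The page's claim never blocks the assistant; at most it triggers a prompt.
  if (!pageClaimsPrivateData) return "proceed";
  return userHasGrantedAccess ? "proceed" : "ask-user";
}

// In a browser, the list could come from document.querySelectorAll("meta").
const flaggedPage = [{ name: "contains-private-data", content: "true" }];
console.log(decideAction(flaggedPage, false));
console.log(decideAction(flaggedPage, true));
console.log(decideAction([], false));
```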

Also, any thoughts on bidirectional JavaScript interoperability between pages and AI assistants, e.g., exporting natural-language-described JavaScript functions to AI assistants?

marcoscaceres commented 6 months ago

That would violate the principle of "the data is the metadata". Imagine:

    <meta name="contains-private-data" content="true" />
    <p>No actual user data.</p>

The way to actually include private data (and to signal that to the AI) is to literally include the private data. That's already covered by HTML's semantic elements, and the content on the page itself. For example:

    <table>
        <caption>Marcos' Account Summary</caption>
        <thead>
            <tr>
                <th>Account Name</th>
                <th>Account Number</th>
                <th>Balance</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>Savings Account</td>
                <td>123456789</td>
                <td>$5,000</td>
            </tr>
            <tr>
                <td>Checking Account</td>
                <td>987654321</td>
                <td>$1,250</td>
            </tr>
        </tbody>
    </table>

That already gives the AI a bunch of semantic content. No need for a meta tag.
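As a rough illustration of how much structure a tool can recover from markup like this, the sketch below turns a shortened copy of the table into labelled records. The `tableToRecords` and `cells` helpers are hypothetical, and regular expressions are used only to keep the example dependency-free; a real tool would walk the parsed DOM instead.

```javascript
// Shortened copy of the account table from the comment above.
const tableHtml = `
<table>
  <caption>Marcos' Account Summary</caption>
  <thead>
    <tr><th>Account Name</th><th>Account Number</th><th>Balance</th></tr>
  </thead>
  <tbody>
    <tr><td>Savings Account</td><td>123456789</td><td>$5,000</td></tr>
    <tr><td>Checking Account</td><td>987654321</td><td>$1,250</td></tr>
  </tbody>
</table>`;

// Text content of every <th> or <td> inside one row.
function cells(rowHtml, tag) {
  const pattern = new RegExp(`<${tag}(?:\\s[^>]*)?>(.*?)</${tag}>`, "gs");
  return [...rowHtml.matchAll(pattern)].map((match) => match[1].trim());
}

// Pair each body cell with its column header.
function tableToRecords(html) {
  const rows = [...html.matchAll(/<tr(?:\s[^>]*)?>(.*?)<\/tr>/gs)].map((m) => m[1]);
  const headers = cells(rows[0], "th");
  return rows.slice(1).map((row) => {
    const values = cells(row, "td");
    return Object.fromEntries(headers.map((header, i) => [header, values[i]]));
  });
}

const records = tableToRecords(tableHtml);
console.log(records);
```

Each record carries its own labels ("Account Number", "Balance"), so the sensitivity of the data is evident from the content itself.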

marcoscaceres commented 6 months ago

@AdamSobieski are you ok with us closing this proposal? Have I convinced you that HTML already supports this?