WICG / proposals

A home for well-formed proposed incubations for the web platform. All proposals welcome.
https://wicg.io/

Advancing Web Metadata #141

Open AdamSobieski opened 8 months ago

AdamSobieski commented 8 months ago

Introduction

The ideas below aim to let Web developers use CSS selectors to select elements based on their metadata. The metadata-related scenarios considered thus far are:

  1. Document Metadata - metadata included in the <head> sections of documents using <meta> and <link> elements,
  2. Attribute-based Metadata - metadata stored in or referenced by elements' attributes, e.g. msrc attributes,
  3. Media Resource Metadata - metadata embedded in media resources (images, audio, and video),
  4. RDFa Metadata - metadata attached to documents using RDFa.

JavaScript

Here is a WebIDL sketch describing a new method on Element, getMetadata(), for obtaining metadata:

partial interface Element
{
    Graph getMetadata(optional object args);
};

Here are JavaScript examples for the metadata-related scenarios thus far considered:

var graph1 = document.documentElement.getMetadata({ kind: 'document' });
var graph2 = document.getElementById('math123').getMetadata({ kind: 'attribute' });
var graph3 = document.getElementById('img123').getMetadata({ kind: 'c2pa' });
var graph4 = document.getElementById('span123').getMetadata({ kind: 'rdfa' });
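
The shape of the Graph object returned by getMetadata() is not specified above. As a minimal sketch, assuming a graph is simply a collection of subject-predicate-object triples with a pattern-matching method, it might look like the following. The MetadataGraph class and its add and match methods are hypothetical names for illustration and are not part of the proposal.

```javascript
// Hypothetical sketch of a triple-based metadata graph; not part of the proposal.
class MetadataGraph {
  constructor() {
    this.triples = [];
  }

  // Store one statement as a subject/predicate/object record.
  add(subject, predicate, object) {
    this.triples.push({ subject, predicate, object });
    return this;
  }

  // Return every triple matching the pattern; undefined acts as a wildcard.
  match(subject, predicate, object) {
    return this.triples.filter(t =>
      (subject === undefined || t.subject === subject) &&
      (predicate === undefined || t.predicate === predicate) &&
      (object === undefined || t.object === object));
  }
}

const DC11_CONTRIBUTOR = 'http://purl.org/dc/elements/1.1/contributor';
const sketchGraph = new MetadataGraph()
  .add('http://www.example.com#p123', DC11_CONTRIBUTOR, 'Alice');

// Number of contributor statements, whatever their subject or object.
console.log(sketchGraph.match(undefined, DC11_CONTRIBUTOR, undefined).length);
```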

Accessibility

ARIA in HTML data are currently extracted into accessibility trees: trees of accessibility objects that assistive technology can query for attributes and properties and perform actions upon. Should extraction of these data into accessibility graphs be considered in the future, the following could be of use:

var graph5 = document.getElementById('widget123').getMetadata({ kind: 'aria' });

Cascading Style Sheets

With respect to selecting graph-based metadata with CSS selectors, the syntax presented here introduces a new pseudo-class, :meta(s, p, o), whose three arguments match the subject, predicate, and object of a semantic triple or statement.

With such a new pseudo-class, the following logical operations can be expressed:

:meta(s, p, o)
:meta(s1, p1, o1):meta(s2, p2, o2)
:is(:meta(s1, p1, o1), :meta(s2, p2, o2))
:not(:meta(s, p, o))
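
To make the intended logic concrete, here is a minimal sketch of how these four forms could be evaluated against an element's triples. It assumes that a compound selector means conjunction, :is() means disjunction, and :not() means negation, as with existing selectors. The meta function and the sample triples are hypothetical.

```javascript
// Hypothetical evaluation of :meta() patterns over an element's triples.
// '*' stands for the wildcard used in the selector examples.
const WILDCARD = '*';

// True if at least one triple matches the (s, p, o) pattern.
function meta(triples, s, p, o) {
  return triples.some(([ts, tp, to]) =>
    (s === WILDCARD || s === ts) &&
    (p === WILDCARD || p === tp) &&
    (o === WILDCARD || o === to));
}

const sampleTriples = [
  ['doc', 'og:type', 'image'],
  ['doc', 'og:image:type', 'image/png'],
];

// :meta(s1, p1, o1):meta(s2, p2, o2) -> both patterns must match (AND)
const both = meta(sampleTriples, '*', 'og:type', 'image') &&
  meta(sampleTriples, '*', 'og:image:type', 'image/png');

// :is(:meta(...), :meta(...)) -> either pattern may match (OR)
const either = meta(sampleTriples, '*', 'og:type', 'video') ||
  meta(sampleTriples, '*', 'og:type', 'image');

// :not(:meta(...)) -> the pattern must not match
const none = !meta(sampleTriples, '*', 'og:type', 'video');

console.log(both, either, none);
```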

1. Document Metadata

With respect to document-based metadata, utilizing <meta> and <link> elements in documents' <head> sections, the metadata could be considered as being attached to the html element.

Here is an example document:

<html prefix="og: https://ogp.me/ns#">
  <head>
    <meta property="og:type" content="image" />
    <meta property="og:image:type" content="image/png" />
    ...
  </head>
  <body>
    <article>
      <section class="description">
        <h1>...</h1>
        ...
      </section>
    </article>
  </body>
</html>

Here is an illustration of metadata-based selection:

@namespace og url(https://ogp.me/ns#);

html:meta(*, og|type, 'image') section.description > h1 { color: blue; }
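
As a rough sketch of how the <meta> elements in the example document could become triples attached to the html element, the following expands each property through the prefix attribute and uses the document's URL as the subject. The function names and the https://www.example.com/page URL are assumptions, and plain objects stand in for the DOM elements.

```javascript
// Hypothetical: turn <meta property content> pairs into triples about the document.

// "og: https://ogp.me/ns#" -> Map { 'og' => 'https://ogp.me/ns#' }
function parsePrefixes(prefixAttr) {
  const tokens = prefixAttr.trim().split(/\s+/);
  const prefixes = new Map();
  for (let i = 0; i + 1 < tokens.length; i += 2) {
    prefixes.set(tokens[i].replace(/:$/, ''), tokens[i + 1]);
  }
  return prefixes;
}

// "og:type" -> "https://ogp.me/ns#type"; unknown prefixes are left untouched.
function expand(curie, prefixes) {
  const i = curie.indexOf(':');
  if (i < 0) return curie;
  const prefix = curie.slice(0, i);
  return prefixes.has(prefix) ? prefixes.get(prefix) + curie.slice(i + 1) : curie;
}

function documentTriples(documentUri, prefixAttr, metas) {
  const prefixes = parsePrefixes(prefixAttr);
  return metas.map(m => [documentUri, expand(m.property, prefixes), m.content]);
}

const docTriples = documentTriples('https://www.example.com/page', 'og: https://ogp.me/ns#', [
  { property: 'og:type', content: 'image' },
  { property: 'og:image:type', content: 'image/png' },
]);

console.log(docTriples);
```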

2. Attribute-based Metadata

The following examples show attribute-based metadata using a new attribute: msrc.

The following example shows a usage of an msrc attribute with absolute URL:

<p id="p123" msrc="https://www.example.com/m/p123metadata.n3">...</p>

The following example shows a usage of an msrc attribute with a relative URL:

<p id="p123" msrc="p123metadata.n3">...</p>

The following example shows a usage of an msrc attribute with a local document element reference:

<html>
  <head>
    <script id="ref" type="text/n3">...</script>
  </head>
  <body>
    <p id="p123" msrc="#ref">...</p>
  </body>
</html>

The following example shows a usage of an msrc attribute with a data URL:

<p id="p123" msrc="data:text/n3;base64,QHByZWZpeCBkYzExOiA8aHR0cDovL3B1cmwub3JnL2RjL2VsZW1lbnRzLzEuMS8+IC4NCg0KPGh0dHA6Ly93d3cuZXhhbXBsZS5jb20jcDEyMz4gZGMxMTpjb250cmlidXRvciAiQWxpY2UiIC4=">...</p>

which encodes the following N3 content:

@prefix dc11: <http://purl.org/dc/elements/1.1/> .

<http://www.example.com#p123> dc11:contributor "Alice" .
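
For readers who want to check the encoding, here is a small sketch that decodes the data URL back into its media type and text. It assumes a Node.js environment (Buffer); decodeDataUrl is a hypothetical helper name.

```javascript
// Decode a data URL into its media type and text (Node.js Buffer assumed).
function decodeDataUrl(dataUrl) {
  const comma = dataUrl.indexOf(',');
  const header = dataUrl.slice('data:'.length, comma); // e.g. "text/n3;base64"
  const payload = dataUrl.slice(comma + 1);
  const [mediaType, ...params] = header.split(';');
  const text = params.includes('base64')
    ? Buffer.from(payload, 'base64').toString('utf8')
    : decodeURIComponent(payload);
  return { mediaType, text };
}

// The msrc value from the example above.
const msrc = 'data:text/n3;base64,QHByZWZpeCBkYzExOiA8aHR0cDovL3B1cmwub3JnL2RjL2VsZW1lbnRzLzEuMS8+IC4NCg0KPGh0dHA6Ly93d3cuZXhhbXBsZS5jb20jcDEyMz4gZGMxMTpjb250cmlidXRvciAiQWxpY2UiIC4=';

const decoded = decodeDataUrl(msrc);
console.log(decoded.mediaType);
console.log(decoded.text);
```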

Here is a corresponding metadata-based selector:

@namespace dc11 url(http://purl.org/dc/elements/1.1/);

p:meta(*, dc11|contributor, 'Alice') { color: blue; }

Beyond matching any subject with the wildcard or universal selector, *, Web developers might want a way to specify that the subject must be the id or URI of the element being considered for selection, e.g., <http://www.example.com#p123>.

This could resemble something like:

@namespace dc11 url(http://purl.org/dc/elements/1.1/);

p:meta(this, dc11|contributor, 'Alice') { color: blue; }

Sharing, Clipboarding, and Dragging-and-Dropping

For media resources (images, audio, and video), metadata accompanies content through sharing, clipboarding, and dragging-and-dropping operations. Similarly, XML-based document element metadata should accompany content. This suggests that, at least for the option where references to local document elements are used (msrc="#ref"), referenced metadata could be snapshotted into a data URL (msrc="data:...") when content is shared, clipboarded, or dragged-and-dropped.
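
A minimal sketch of that snapshotting step, assuming Node.js (Buffer) and a lookup function that stands in for resolving the local reference in the document. The names snapshotMsrc, toDataUrl, and localMetadata are hypothetical.

```javascript
// Hypothetical snapshot step: inline referenced metadata text as a data URL.
function toDataUrl(text, mediaType) {
  return `data:${mediaType};base64,${Buffer.from(text, 'utf8').toString('base64')}`;
}

// Rewrites msrc="#ref" style references; other msrc values pass through unchanged.
function snapshotMsrc(msrcValue, lookupLocal) {
  if (!msrcValue.startsWith('#')) return msrcValue;
  const { text, mediaType } = lookupLocal(msrcValue.slice(1));
  return toDataUrl(text, mediaType);
}

// Stand-in for looking up a <script id="ref" type="text/n3"> element in the document.
const localMetadata = {
  ref: {
    mediaType: 'text/n3',
    text: '<http://www.example.com#p123> <http://purl.org/dc/elements/1.1/contributor> "Alice" .',
  },
};

const snapshot = snapshotMsrc('#ref', id => localMetadata[id]);
console.log(snapshot);
```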

3. Media Resource Metadata

The following examples use images' C2PA metadata to select and style images based on whether they are AI-generated. As C2PA manifests use JSON, the following examples make use of a transformation from JSON to RDF.

For the following examples, the following pertinent content is utilized from a C2PA JSON manifest:

{
  "actions": [
    {
      "action": "c2pa.created",
      "digitalSourceType": "trainedAlgorithmicMedia"
    }
  ]
}

The following semantic graph is obtained from a transformation:

/* triples as < subject, predicate, object > */

<root, actions, 0>
<0, action, c2pa.created>
<0, digitalSourceType, trainedAlgorithmicMedia>
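
One possible reading of that transformation, as a sketch: each nested object gets a numeric node id, array items hang directly off the property that holds the array, and scalar values become objects of triples. This is only an assumption that reproduces the three triples above; it is not claimed to be the algorithm of any particular JSON-to-RDF mapping, and jsonToTriples is a hypothetical name.

```javascript
// Hypothetical JSON-to-triples walk; one possible reading of the transformation.
function jsonToTriples(json) {
  const triples = [];
  let nextId = 0;

  function visit(subject, predicate, value) {
    if (Array.isArray(value)) {
      // Each array item hangs off the same subject and predicate.
      value.forEach(item => visit(subject, predicate, item));
    } else if (value !== null && typeof value === 'object') {
      // Nested objects become new nodes identified by a counter.
      const id = String(nextId++);
      triples.push([subject, predicate, id]);
      for (const [key, child] of Object.entries(value)) visit(id, key, child);
    } else {
      triples.push([subject, predicate, String(value)]);
    }
  }

  for (const [key, child] of Object.entries(json)) visit('root', key, child);
  return triples;
}

const manifest = {
  actions: [
    { action: 'c2pa.created', digitalSourceType: 'trainedAlgorithmicMedia' },
  ],
};

console.log(jsonToTriples(manifest));
```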

The following CSS selects images that are AI-generated according to C2PA metadata:

img:meta(*, actions, bind('x')):meta(bind('x'), action, 'c2pa.created'):meta(bind('x'), digitalSourceType, 'trainedAlgorithmicMedia')
{ border-color: blue; }
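
The bind('x') arguments behave like query variables: the first :meta() binds x to a node, and the later ones must match that same node. A minimal sketch of such matching, under the assumption that a chain matches when some consistent set of bindings satisfies every pattern (the bind, unify, and matchAll helpers are hypothetical):

```javascript
// Hypothetical matcher for chained :meta() patterns that share bind() variables.
const bind = name => ({ variable: name });
const isVar = term => typeof term === 'object' && term !== null && 'variable' in term;

// Try to unify one pattern term with a concrete value under the current bindings.
// Returns the (possibly extended) bindings, or null when the term cannot match.
function unify(term, value, bindings) {
  if (term === '*') return bindings;
  if (isVar(term)) {
    if (term.variable in bindings) {
      return bindings[term.variable] === value ? bindings : null;
    }
    return { ...bindings, [term.variable]: value };
  }
  return term === value ? bindings : null;
}

// True if every pattern can be satisfied by some triple with consistent bindings.
function matchAll(triples, patterns, bindings = {}) {
  if (patterns.length === 0) return true;
  const [[s, p, o], ...rest] = patterns;
  return triples.some(([ts, tp, to]) => {
    let b = unify(s, ts, bindings);
    b = b && unify(p, tp, b);
    b = b && unify(o, to, b);
    return b !== null && matchAll(triples, rest, b);
  });
}

const c2paTriples = [
  ['root', 'actions', '0'],
  ['0', 'action', 'c2pa.created'],
  ['0', 'digitalSourceType', 'trainedAlgorithmicMedia'],
];

const aiGenerated = matchAll(c2paTriples, [
  ['*', 'actions', bind('x')],
  [bind('x'), 'action', 'c2pa.created'],
  [bind('x'), 'digitalSourceType', 'trainedAlgorithmicMedia'],
]);

console.log(aiGenerated);
```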

4. RDFa Metadata

Ideas are broached, below, regarding interoperability between theorized metadata-based selectors and HTML+RDFa.

For the following examples, here is an HTML5+RDFa1.1 document:

<html prefix="dc11: http://purl.org/dc/elements/1.1/" lang="en">
  <head>
    <title>John's Home Page</title>
    <link rel="profile" href="http://www.w3.org/1999/xhtml/vocab" />
    <base href="http://example.org/john-d/" />
    <meta property="dc11:creator" content="Jonathan Doe" />
    <link rel="foaf:primaryTopic" href="http://example.org/john-d/#me" />
  </head>
  <body about="http://example.org/john-d/#me">
    <h1>John's Home Page</h1>
    <p>My name is <span property="foaf:nick">John D</span> and I like
      <a href="http://www.neubauten.org/" rel="foaf:interest"
        lang="de">Einstürzende Neubauten</a>.
    </p>
    <p>
      My <span rel="foaf:interest" resource="urn:ISBN:0752820907">favorite
      book is the inspiring <span id="span123" about="urn:ISBN:0752820907">
      <cite property="dc11:title">Weaving the Web</cite> by
      <span property="dc11:creator">Tim Berners-Lee</span></span></span>.
    </p>
  </body>
</html>

which contains the following content expressed in Turtle:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://example.org/john-d/>
   <http://purl.org/dc/elements/1.1/creator> "Jonathan Doe"@en;
   foaf:primaryTopic <http://example.org/john-d/#me> .
<http://example.org/john-d/#me>
   foaf:nick "John D"@en;
   foaf:interest <http://www.neubauten.org/>;
   foaf:interest <urn:ISBN:0752820907> .
<urn:ISBN:0752820907>
   <http://purl.org/dc/elements/1.1/title> "Weaving the Web"@en;
   <http://purl.org/dc/elements/1.1/creator> "Tim Berners-Lee"@en .

One approach to selecting document elements based on interwoven RDFa metadata involves using the presence of about attributes to connect or to bridge document elements and semantic graphs.

As envisioned, something like the following could style the text color of the span with id span123 to blue:

@namespace dc11 url(http://purl.org/dc/elements/1.1/);

span:meta(uri('urn:ISBN:0752820907'), dc11|title, literal('Weaving the Web', 'en')) { color: blue; }
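
As a sketch of how uri() and literal() terms might be compared, the following treats a literal as a value plus a language tag and ties an element to the graph through its about attribute. The matching rule, the helper names, and the plain object standing in for the span are all assumptions for illustration.

```javascript
// Hypothetical term constructors mirroring uri() and literal() in the selector above.
const uri = value => ({ kind: 'uri', value });
const literal = (value, lang = '') => ({ kind: 'literal', value, lang: lang.toLowerCase() });

// Terms are equal when kind, value, and (for literals) language tag all agree.
const sameTerm = (a, b) => a.kind === b.kind && a.value === b.value && a.lang === b.lang;

const DC11 = 'http://purl.org/dc/elements/1.1/';

// Two of the triples from the Turtle listing above.
const rdfaGraph = [
  [uri('urn:ISBN:0752820907'), uri(DC11 + 'title'), literal('Weaving the Web', 'en')],
  [uri('urn:ISBN:0752820907'), uri(DC11 + 'creator'), literal('Tim Berners-Lee', 'en')],
];

// Assumption: an element matches only when its about attribute names the pattern's
// subject and the graph holds the corresponding triple.
function matchesMeta(element, graph, [s, p, o]) {
  return element.about === s.value &&
    graph.some(([gs, gp, go]) => sameTerm(gs, s) && sameTerm(gp, p) && sameTerm(go, o));
}

const span123 = { id: 'span123', about: 'urn:ISBN:0752820907' }; // stand-in for the DOM element
const titlePattern = [uri('urn:ISBN:0752820907'), uri(DC11 + 'title'), literal('Weaving the Web', 'en')];

console.log(matchesMeta(span123, rdfaGraph, titlePattern));
```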

SPARQL Comparison

Here is a SPARQL ASK query with a corresponding metadata-based selector example:

PREFIX  foaf: <http://xmlns.com/foaf/0.1/>
ASK
WHERE
{
  ?person  a          foaf:Person ;
           foaf:name  ?name ;
           foaf:mbox  ?email
}

@namespace rdf url(http://www.w3.org/1999/02/22-rdf-syntax-ns#);
@namespace foaf url(http://xmlns.com/foaf/0.1/);

element:meta(bind('person'), rdf|type, foaf|Person):meta(bind('person'), foaf|name, bind('name')):meta(bind('person'), foaf|mbox, bind('email'))
{ color: blue; }

Conclusion

Thank you. Per the WICG proposal process,

  1. Submit a proposal outlining your idea.
  2. Get feedback and improve your proposal.
  3. Find collaborators and create a GitHub repository.
  4. Work on your proposal and seek consensus from the community.
  5. Advocate for adoption of your proposal to the W3C or the WHATWG for standardization.

I am looking forward to discussing and improving this preliminary proposal with your feedback and to finding interested collaborators to create fuller documents with which to spur innovation and to seek consensus from the community and stakeholders.

Crissov commented 8 months ago

The following examples utilize images' C2PA metadata to select and style an image based on whether they are AI-generated. As C2PA manifests utilize JSON, the following examples make use of implicit transformations from JSON to XML.

Did you just casually postulate a way for Selectors to apply to JSON documents? It seems there would have to be a specification for that, independent of the rest of your suggestion.

It would have helped to understand your examples quicker and better if you had provided the respective JSON (or XML) metadata sample.

Note that both embedded and linked metadata comes in a variety of data structures. There's tree-like JSON, RDF, XML etc., but also flat key–value lists for instance. Sometimes one is nested within the other.

Your proposed pseudo-class :meta() would not match with the element it is attached to, but with its external replaced content, which therefore needs to be loaded and parsed successfully. This might pose a problem (or more than one). A proper home for this could be Non-Element Selectors.

However, I think the overall use case is sound. It just should start simpler. For some scenarios, existing features may even get you there already, or could with minor alterations.

img:has(::shadow svg > metadata foo|bar) {
  border: thick solid green;
}

It may be reasonable to specify an abstraction layer for the properties of external resources used in replaced elements.

img::replaced[:name=foo] {
  border: thick solid green;
}

AdamSobieski commented 8 months ago

@Crissov, I found some JSON to XML services online (e.g., https://www.convertjson.com/json-to-xml.htm) and I see your points about standardization with respect to such mappings and transformations.

With respect to a JSON C2PA manifest example, I found an example online resembling:

{
  "actions": [
    {
      "action": "c2pa.created",
      "when": "2023-02-11T09:00:00Z",
      "softwareAgent" : {
          "name": "Joe's Photo Editor",
          "version": "2.0",
          "schema.org.SoftwareApplication.operatingSystem": "Windows 10"
      },
      "digitalSourceType": "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia",
      "parameters" : {
        "ingredients" : [
          {
            "url": "self#jumbf=c2pa/joe-ed:urn:uuid:ABCD/c2pa.assertions/c2pa.ingredient__1",
            "alg": "sha256",
            "hash" : "..."
          },
          {
            "url": "self#jumbf=c2pa/joe-ed:urn:uuid:EFGH/c2pa.assertions/c2pa.ingredient__2",
            "alg": "sha256",
            "hash" : "..."
          }
        ]
      }
    }
  ]
}

and then I read that the value of digitalSourceType would instead be trainedAlgorithmicMedia:

{
  "actions": [
    {
      "action": "c2pa.created",
      "when": "2023-02-11T09:00:00Z",
      "softwareAgent" : {
          "name": "Joe's Photo Editor",
          "version": "2.0",
          "schema.org.SoftwareApplication.operatingSystem": "Windows 10"
      },
      "digitalSourceType": "trainedAlgorithmicMedia",
      "parameters" : {
        "ingredients" : [
          {
            "url": "self#jumbf=c2pa/joe-ed:urn:uuid:ABCD/c2pa.assertions/c2pa.ingredient__1",
            "alg": "sha256",
            "hash" : "..."
          },
          {
            "url": "self#jumbf=c2pa/joe-ed:urn:uuid:EFGH/c2pa.assertions/c2pa.ingredient__2",
            "alg": "sha256",
            "hash" : "..."
          }
        ]
      }
    }
  ]
}

I'm still reading the latest version of the C2PA specification in this regard. Per your feedback, I created a clarifying C2PA JSON manifest example, showing the relevant content and structure, and edited it into the proposal. I also updated the proposal to use a JSON to RDF transformation (https://www.w3.org/2016/01/json2rdf.html) and added examples.

marcoscaceres commented 8 months ago

I'm a bit confused... shouldn't the metadata be inside the thing being shared/used/accessed/rendered? It's only in exceptional cases (e.g., text strings, URL, etc.) that metadata would be external.

Additionally, I'm wondering how this relates to https://ogp.me ? Open graph does a lot of this already, no? (apart from the metadata-based selection, but that seems to be somewhat challenging for the reasons mentioned above).

AdamSobieski commented 8 months ago

@marcoscaceres, to your question about how these ideas relate to OpenGraph or Schema.org, in addition to those vocabularies being useful for document metadata in <link> and <meta> elements in HTML5 documents' <head> portions, one could use RDFa or microdata to place metadata, e.g., using those vocabularies, inside of document elements.

With the data URL option for an msrc attribute, shown below, arbitrary metadata, including using those vocabularies, could be attached to document elements, effectively being inside them, accompanying them through operations such as sharing, clipboarding, and dragging-and-dropping.

<p msrc="data:application/rdf+xml,...">...</p>

Also, with respect to securing portable document elements and their metadata, possibilities include, but are not limited to, providing hashes of the described elements' outerHTML (without the msrc attribute).

<p msrc="data:application/rdf+xml,... includes a hash of outerHTML minus this attribute? ...">...</p>
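
A rough sketch of such a hash, assuming Node.js (the crypto module) and SHA-256; the proposal does not name an algorithm. The regex-based stripping is for illustration only: in a browser one would more likely clone the element, call removeAttribute('msrc'), and serialize the clone. The function name hashWithoutMsrc is hypothetical.

```javascript
const { createHash } = require('crypto');

// Illustration only: strip msrc="..." from serialized markup, then hash what remains.
function hashWithoutMsrc(outerHtml) {
  const stripped = outerHtml.replace(/\smsrc="[^"]*"/, '');
  return createHash('sha256').update(stripped, 'utf8').digest('hex');
}

// The same element with two different msrc values yields the same hash.
const hashA = hashWithoutMsrc('<p id="p123" msrc="data:application/rdf+xml,A">Hello</p>');
const hashB = hashWithoutMsrc('<p id="p123" msrc="data:application/rdf+xml,B">Hello</p>');

console.log(hashA === hashB);
```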

Use cases for external metadata include those pertaining to mathematical markup and mathematical knowledge management. With the metadata external, it could be revised independently of any published HTML or EPUB documents containing the content.

<math id="expr123" msrc="https://www.example.com/proof.php?item=expr123">...</math>

Similarly with argumentation for natural-language claims.

<span id="claim123" msrc="https://www.example.com/argumentation.php?item=claim123">...</span>

With respect to referenced external metadata, theorized metadata-based selection would be challenging for reasons including: (1) it would involve loading, parsing, and verifying external resources, checking hashes and digital signatures, and (2) these resources would be in a variety of formats requiring, where possible, transformations into trees or graphs, or other techniques for using selectors. As @Crissov indicated, the formats for metadata inside of media resources (images, audio, and video) include JSON, RDF, XML, flat key–value lists, and these nested in one another.

Meanwhile, Web developers might want metadata-based selection to be simultaneously interoperable with:

  1. document metadata expressed using <meta> and <link> elements in documents' <head> sections,
  2. attribute-based metadata (using something like an msrc attribute),
  3. metadata embedded in media resources,
  4. RDFa and microdata technologies.

I added an RDFa section to the proposal with a sketch of these ideas.

AdamSobieski commented 8 months ago

Also, with respect to existing document-based metadata techniques which utilize <meta> and <link> elements in documents' <head> sections, the metadata can be considered as being attached to the html element.

I updated the proposal by adding a section on document-based metadata.

AdamSobieski commented 8 months ago

@marcoscaceres, @Crissov, thank you both for the feedback thus far. The proposal has improved as a result of it.

For discussion,

JavaScript

Here is an API sketch showing the four scenarios:

var graph1 = document.documentElement.getMetadata({ kind: 'document' });
var graph2 = document.getElementById('math123').getMetadata({ kind: 'attribute' });
var graph3 = document.getElementById('img123').getMetadata({ kind: 'c2pa' });
var graph4 = document.getElementById('span123').getMetadata({ kind: 'rdfa' });

Alternative Metadata-based Selector Syntaxes

I updated the proposal to use the following syntax option which, in particular with query variable binding, is more expressive:

:meta(s, p, o)
:meta(s1, p1, o1):meta(s2, p2, o2)
:is(:meta(s1, p1, o1), :meta(s2, p2, o2))
:not(:meta(s, p, o))

There is at least one other syntax possibility:

:meta(triple(s1, p1, o1), triple(s2, p2, o2), triple(s3, p3, o3), ...)

Metadata-based Selectors, Scripting Functions, and Remote Services

What if Web developers could invoke JavaScript functions in a manner interoperable with CSS selectors?

In the following example, a JavaScript function, fun, would receive a document element, elem, and return a Boolean.

p:call('javascript:fun') { color: green; }

function fun(elem)
{
  ...
}

What if one could do so with elements and their metadata graphs?

In the following example, a JavaScript function, fun, would receive a document element, elem, a metadata graph, graph, and return a Boolean.

p:call-meta('javascript:fun') { color: green; }

function fun(elem, graph)
{
  ...
}

Then, one could express concepts like:

math[msrc]:call-meta('javascript:isProofValid') { color: green; }
math:not([msrc]) { color: yellow; }
math[msrc]:not(:call-meta('javascript:isProofValid')) { color: red; }
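
For comparison, the three rules above can be emulated in script. The sketch below uses plain objects in place of math elements and an assumed isProofValid that looks for a 'status' / 'verified' triple; that vocabulary and the proofColor helper are hypothetical.

```javascript
// Hypothetical emulation of the three rules above; plain objects stand in for <math> elements.
// isProofValid receives the element and its metadata graph and returns a Boolean.
function proofColor(elem, graph, isProofValid) {
  if (elem.msrc === undefined) return 'yellow';        // math:not([msrc])
  return isProofValid(elem, graph) ? 'green' : 'red';  // :call-meta(...) vs :not(:call-meta(...))
}

// Assumed validator: a proof counts as valid when its graph carries a 'status' / 'verified' triple.
const isProofValid = (elem, graph) =>
  graph.some(([, p, o]) => p === 'status' && o === 'verified');

const verified = { id: 'expr123', msrc: 'https://www.example.com/proof.php?item=expr123' };
const unverified = { id: 'expr456', msrc: 'https://www.example.com/proof.php?item=expr456' };
const noMetadata = { id: 'expr789' };

console.log(proofColor(verified, [['expr123', 'status', 'verified']], isProofValid));
console.log(proofColor(unverified, [['expr456', 'status', 'pending']], isProofValid));
console.log(proofColor(noMetadata, [], isProofValid));
```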

And, perhaps, concepts like:

math:call-meta('https://www.example.com/proofvalidator.php') { color: green; }