Content Provenance and Authenticity

AdamSobieski commented 1 year ago

Introduction

This is a proposal for adding content-authenticity-related features to the informational panels which appear when end-users click upon HTTPS lock icons or newer icons, e.g., "tune" icons, in their Web browsers' address bars.

The Problem

Misinformation and disinformation are prevalent online, acutely so during election seasons. At the same time, the growing popularity and availability of generative artificial intelligence technologies have raised related concerns.

A Solution

A 2022 report by the Royal Society of the United Kingdom [1] identifies several countermeasures for mitigating misinformation and disinformation online:

automated detection systems (e.g., to flag or add context and resources to content)
emerging anti-misinformation sector (e.g., organizations combating scientific misinformation)
provenance-enhancing technology (i.e., better enabling people to determine the veracity of a claim, image, or video)
APIs for research (i.e., for usage to detect, understand, and counter misinformation)
active bystanders (e.g., corrective commenting)
community moderation (usually by unpaid, untrained, and often independent volunteers)
anti-virals (e.g., limiting the number of times a message can be forwarded in privacy-respecting encrypted chats)
collective intelligence (e.g., Wikipedia, where multiple editors refine encyclopedic articles, and question-and-answer sites, where answers are evaluated by others in a manner similar to peer review)
trustworthy institutions and data
media literacy (increasing citizens' ability to use information and communication technologies to find, evaluate, create, and communicate information, an essential skill for citizens of all ages)

Informational panels that open when end-users click upon address-bar icons have so far contained only connection-security-related information and site controls. The proposed solution would add automated detection, anti-misinformation, provenance-enhancing, and fact-checking features to a more general-purpose informational panel.

More general-purpose informational panels could present content-authenticity-related information to end-users including, but not limited to, whether:

a webpage describes itself as being a news article (e.g., using Web schema)
news articles were digitally signed
publisher-signers were news organizations
contributor-signers were journalists
the images and videos in webpages were described as being authentic (e.g., not deepfakes)
images and videos could be digitally verified as being authentic
any important factual claims made could be checked, verified, or described
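Here is a rough sketch of the first check in the list above. A browser or extension could look for schema.org `NewsArticle` types in a page's JSON-LD. The Python below makes several assumptions. It reads only JSON-LD and ignores Microdata and RDFa. It matches short type names only, not expanded IRIs such as `https://schema.org/NewsArticle`. The subtype set is an illustrative subset, and `describes_itself_as_news` is a hypothetical name. Note that this only tells you what a page *claims* to be.

```python
import json
from html.parser import HTMLParser

# Illustrative subset of schema.org NewsArticle and its subtypes (assumption).
NEWS_TYPES = {"NewsArticle", "ReportageNewsArticle", "AnalysisNewsArticle"}


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data  # data may arrive in several chunks


def _types(node):
    """Yield every string @type in a JSON-LD node, list of nodes, or @graph."""
    if isinstance(node, list):
        for item in node:
            yield from _types(item)
    elif isinstance(node, dict):
        t = node.get("@type")
        if isinstance(t, str):
            yield t
        elif isinstance(t, list):
            yield from (x for x in t if isinstance(x, str))
        yield from _types(node.get("@graph", []))


def describes_itself_as_news(html: str) -> bool:
    """True if any JSON-LD block on the page declares a news-article type."""
    parser = JsonLdExtractor()
    parser.feed(html)
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is skipped, never trusted
        if any(t in NEWS_TYPES for t in _types(data)):
            return True
    return False
```

This check runs entirely on-device and needs no network access.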

This proposed solution would equip end-users to better detect and mitigate misinformation and disinformation as they encounter it and, more broadly, would enhance collective intelligence and media literacy.

These features could be provided to end-users via their browser software or via browser extensions. Making address-bar informational panels, or certain menu subtrees thereof, extensible would support the emerging anti-misinformation sector and consolidate user experiences. That is, end-users would learn to click upon the icons in their address bars to obtain connection-security information, content-authenticity-related information, and access to site controls.

Cryptography

This proposal involves public key infrastructures, digital signatures, and existing solutions for multimedia provenance and authenticity that utilize these technologies, e.g., Coalition for Content Provenance and Authenticity (C2PA) metadata.

Also, the related technologies of decentralized identifiers and verifiable credentials are potentially applicable.
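A digital signature attests to content only if the signed metadata is bound to the exact bytes the user sees. C2PA manifests can provide this binding by recording a cryptographic hash of the asset. The Python sketch below shows only that comparison step. It assumes a hypothetical, already signature-verified manifest of the form `{"alg": ..., "hash": ...}`, which is not the actual C2PA serialization. It omits signature and certificate-chain validation, which Python's standard library does not provide.

```python
import hashlib

# Digest algorithms accepted by this sketch (assumption; fail closed otherwise).
ALLOWED_ALGS = {"sha256", "sha384", "sha512"}


def content_matches_manifest(content: bytes, manifest: dict) -> bool:
    """Return True if `content` hashes to the hex digest recorded in `manifest`.

    `manifest` is a hypothetical record such as {"alg": "sha256", "hash": "<hex>"}
    whose own signature is assumed to have been verified already.
    """
    alg = manifest.get("alg")
    if alg not in ALLOWED_ALGS:
        return False  # unknown or weak algorithm
    try:
        expected = bytes.fromhex(manifest.get("hash", ""))
    except (TypeError, ValueError):
        return False  # missing or malformed digest
    actual = hashlib.new(alg, content).digest()
    return actual == expected  # any edit to the bytes changes the digest
```

If a single byte of an image changes after signing, the comparison fails, and the panel could report the media as unverified.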

Sociology

Public key certificate issuers could determine their own policies for issuing certificates to, and revoking certificates from, news organizations and journalists, e.g., "digital press passes".

A certificate issuer could, for instance, require certificate holders to agree to abide by a professional code of conduct in order for their certificates to remain valid.
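During verification, such a policy could show up as ordinary certificate-status checks. The Python sketch below treats a "digital press pass" as a hypothetical `PressPass` record. It accepts a pass only if three conditions hold. The issuer must be in the user agent's trusted set. The current time must fall within the pass's validity window. The issuer must not have revoked the pass, for example after a code-of-conduct violation. Real X.509 or verifiable-credential processing would also verify signatures and consult revocation mechanisms such as CRLs or OCSP. None of that is shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class PressPass:
    """Hypothetical "digital press pass"; field names are illustrative only."""
    holder: str
    issuer: str
    role: str             # e.g. "news-organization" or "journalist"
    serial: str
    not_before: datetime  # timezone-aware
    not_after: datetime   # timezone-aware


def press_pass_is_valid(p: PressPass,
                        trusted_issuers: set[str],
                        revoked: dict[str, set[str]],
                        now: Optional[datetime] = None) -> bool:
    """Check issuer trust, validity window, and revocation (signatures NOT checked)."""
    now = now or datetime.now(timezone.utc)
    if p.issuer not in trusted_issuers:
        return False  # issuer not trusted by this user agent
    if not (p.not_before <= now <= p.not_after):
        return False  # expired or not yet valid
    # Serial numbers are scoped per issuer, so revocations are keyed by issuer.
    return p.serial not in revoked.get(p.issuer, set())
```

Revocation is what gives a code-of-conduct policy force. The issuer adds the serial number to its list, and every verifier that consults the list stops accepting the pass.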

Issuers' policies would contribute to the assurance of quality that their digital certificates signify and would help to ensure trustworthy institutions and data. As envisioned, news organizations and independent journalists would be able to choose among a number of competing certificate issuers, based in part on those issuers' policies.

Incentives

Why would news organizations and independent journalists choose to adopt and utilize solutions like the one proposed?

Search Engine Optimization

News organizations and independent journalists could be encouraged to adopt new best practices via search engine optimization. That is, verifiably journalistic content (as opposed to news satire or other types of news-like content) with verifiably authentic multimedia (as opposed to potential deepfakes) could be prioritized over other content by search engines and news aggregation services.

Social Media

When webpages are shared, social media websites download and process their contents to obtain metadata, e.g., OpenGraph, with which to produce rich objects for display. Social media websites could additionally process shared webpages' contents (at least for popular content) for content authenticity.

Social media websites would then be able to visually distinguish verifiably journalistic content from other forms of Web content for end-users, e.g., by decorating the rich objects of verifiably journalistic content with "news checkmarks".
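Here is a minimal Python sketch of that preview step. It assumes the page carries OpenGraph `og:` meta tags. It also assumes a separate content-authenticity check supplies a boolean `verified_journalism`. The `rich_object` function and its `news_checkmark` field are hypothetical.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        prop, content = a.get("property") or "", a.get("content")
        if prop.startswith("og:") and content is not None:
            self.og.setdefault(prop[3:], content)  # keep the first value seen


def rich_object(html: str, verified_journalism: bool) -> dict:
    """Build a link preview; the badge reflects an externally computed result."""
    parser = OpenGraphParser()
    parser.feed(html)
    return {
        "title": parser.og.get("title"),
        "image": parser.og.get("image"),
        "url": parser.og.get("url"),
        "news_checkmark": verified_journalism,
    }
```

The design keeps preview generation and verification separate. A site can reuse its existing OpenGraph pipeline and add only the authenticity result.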

Privacy

As envisioned, content-authenticity-related components would process the contents of end-users' documents on demand: when end-users click upon address-bar icons to open the informational panels or, perhaps, when they navigate to a certain menu subtree thereof.

However, automated detection systems might use address-bar icons to visually indicate informational, warning, or error messages to end-users (e.g., by changing color to yellow or red). In that case, content-authenticity-related components might instead process the contents of end-users' documents after webpages are loaded and displayed.

Some of these checks would not reveal any information to third parties. These are determining whether webpages utilize news article Web schemas, whether webpages' contents are digitally signed, and whether webpages' multimedia resources are declared to be authentic. Other checks could reveal information to third parties. These are verifying digital signatures, determining whether publisher-signers are news organizations and whether contributor-signers are journalists, verifying multimedia content as authentic, and fact-checking.

In these scenarios, the data potentially revealed to third-party service providers are the publishers of consumed articles, the journalist contributors to consumed articles, the consumed articles themselves, and some of their contents, e.g., factual claims.

End-users should be able to toggle whether content-authenticity-related features are enabled. Settings could include "off", "only when icon clicked", and "on". For in-private browsing, these features could default to a setting other than "on".
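The settings and the third-party analysis above could be combined into a small policy function. In this Python sketch, `CHECKS` records which checks may contact an outside service, following the paragraph on third parties. The check names, the `allow_network` switch, and the choice of "only when icon clicked" as the in-private default are all assumptions made for illustration.

```python
from enum import Enum


class Mode(Enum):
    OFF = "off"
    ON_CLICK = "only when icon clicked"
    ON = "on"


# True = the check may reveal browsing data to a third-party service;
# False = the check can run entirely on-device.
CHECKS = {
    "news_schema": False,
    "signature_present": False,
    "signature_verification": True,
    "multimedia_declared_authentic": False,
    "multimedia_verification": True,
    "fact_check": True,
}


def default_mode(in_private: bool) -> Mode:
    # Assumption: in-private windows default to ON_CLICK rather than ON.
    return Mode.ON_CLICK if in_private else Mode.ON


def checks_to_run(mode: Mode, icon_clicked: bool, allow_network: bool = True) -> list[str]:
    """Names of the checks to run for this event, in alphabetical order."""
    if mode is Mode.OFF or (mode is Mode.ON_CLICK and not icon_clicked):
        return []
    return sorted(name for name, remote in CHECKS.items()
                  if allow_network or not remote)
```

In `Mode.ON`, checks run after page load even without a click, which is what colored warning icons would require. In `Mode.ON_CLICK`, nothing runs until the user opens the panel.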

Conclusion

This was a proposal for adding content-authenticity-related features to the informational panels which appear when end-users click upon HTTPS lock icons or newer icons, e.g., "tune" icons, in their Web browsers' address bars.

Thank you. I look forward to discussing these ideas with you!

References

[1] The Royal Society. The online information environment: Understanding how the internet shapes people's engagement with scientific information. 2022.

lrosenthol commented 1 year ago

@AdamSobieski thanks for raising this.

As chair of the Technical Working Group, let me add a few notes...

The C2PA has always believed that coordination with the W3C (and other standards bodies) on the solution to the problems of misinformation/disinformation is necessary. That is one reason I have been actively working with numerous WGs/CGs for almost four years now to ensure that our technology leverages the standards and that folks are aware of our work and integrate it accordingly.

Our specification, currently at version 1.3, uses existing technology throughout (as opposed to the introduction of something new and different). Identity is centered around Verifiable Credentials (which may or may not include DIDs), metadata uses existing standards (XMP, IPTC, EXIF, Schema.org, etc.) and our trust model is the same as the web (X.509 certs).

I look forward to continued coordination of our work and that of the Web community to make sure that users are able to see the provenance of the assets that they consume online and make informed decisions about whether they choose to trust them.

Pandapip1 commented 1 year ago

I strongly oppose the name C2PA, as I feel that the use of the word "authenticity" here misleads users. What this really provides is cryptographically secure attestations, and the name should reflect that.
