Discussion: Provider Removal of Content & Transparency

LibertyDSNP / spec

The DSNP Spec and Website

https://spec.dsnp.org

Discussion: Provider Removal of Content & Transparency #184

Open wilwade opened 2 years ago

wilwade commented 2 years ago

Situation: A delegated service has previously announced some content that it now determines it does not want to continue hosting. What happens?

Previous related discussions: #80 (more about user requests than service requirements)

Off-chain announced content has three responsible parties (these could be the same entity or be split out, such as a host paid for by someone else):

End-user
Host of the data
Service that submits the Announcement to the network

HTTP Status codes work in some situations:

Server shut down: 404
Moved Content: 30x
Removed for legal reasons: 451

Difficult situation: What about just terms of service violations?

404 feels like not enough data. Not very transparent
451 feels too strong and not a good match for ToS violations

General Approaches

Some form of DSNP level announcement
Something in the spec, but off-chain in the response such as in the body of the 404 response
Nothing but a normal 404

Some form of DSNP level announcement

Why?

Transparency
Use by others
Reputation

Nothing but a normal 404

This isn't controlled by the spec, but the hash would fail if it is removed.
Effectively the same as off-chain in the response such as in the body of the 404 response (just not in the spec)

Summary from 2022-04-28:

404 or 404-like (aka hash failing) responses are fine for now.
Services may choose to add additional data into the 404 body
Later we can reconsider either the need for a DSNP Announcement or standardizing a 404 body message
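
To make the "additional data in the 404 body" idea concrete, here is a minimal sketch of what a host might return. Nothing in it comes from the spec: the field names `reason` and `policyUrl`, the reason codes, and the helper name `removal_body` are all hypothetical.

```python
import json

# Hypothetical reason codes; DSNP does not define any of these.
REASONS = {"tos_violation", "user_request", "host_shutdown", "unspecified"}


def removal_body(reason="unspecified", policy_url=None):
    """Build a JSON body a host might send alongside an HTTP 404."""
    if reason not in REASONS:
        raise ValueError(f"unknown reason: {reason}")
    body = {"status": 404, "reason": reason}
    if policy_url is not None:
        # Assumed optional link to the policy the content fell under.
        body["policyUrl"] = policy_url
    return json.dumps(body, sort_keys=True)
```

A consumer that does not understand the body can still treat the response as a plain 404, so this stays compatible with the "404 is fine for now" position.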

wesbiggs commented 1 year ago

I think to solve this problem we need to take a step back and look at the centralization chokepoint caused by requiring the HTTPS scheme for URLs for Activity Content. This takes control of user-generated content out of the hands of an individual user and into that of the entity (probably the provider) that controls the hostname in question. A way of addressing this is to enable consumers to attempt to retrieve content from a decentralized storage protocol such as IPFS.

If the spec is changed to enable IPFS as a top-level scheme for content retrieval, then the user (or any other entity) has the ability to host content in any location they choose. Any given provider/host may cease to make the content available at any time (in general, they should be encouraged not to, but services shut down, or they might find content that violates their terms of service, or they might receive legal notice requiring them to take down content, and so on). The only reliable safeguard from a user point of view is to keep a backup of their content so that it can be made available elsewhere (ideally redundantly).

In this scenario, provider removal of content would simply mean unpinning the content in question. This would likely be coupled with a blocklist that is either provider-specific or shared between providers. (nb. Attestation—via DSNP Content Attribute Set Announcements—provides an in-protocol means of flagging content publicly, but blocklists could also be private or out-of-band.) The blocklist would tell providers not to retrieve or show the content in question, even if it exists (i.e. is pinned) elsewhere in the network. An unfiltered view of the content stream would still show the content, provided it is still pinned somewhere.
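
The filtered versus unfiltered view described above could look roughly like the following. This is only a sketch: the `Announcement` shape and the choice to key the blocklist on the content hash are assumptions, not something DSNP specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    # Simplified, hypothetical stand-in for a DSNP content announcement.
    from_id: int
    content_hash: str
    url: str


def filtered_view(announcements, blocklist):
    """Return only the announcements whose content hash is not blocklisted."""
    blocked = set(blocklist)
    return [a for a in announcements if a.content_hash not in blocked]
```

The unfiltered stream is simply `announcements` itself; the blocklist changes what one provider shows and has no effect on what is pinned elsewhere.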

This brings up a second concern, which is a provider's ability to post Tombstone Announcements in such a scenario. Because DSNP treats tombstoning as final (which is consistent with the intended semantics), there exists a situation where a provider, however trustworthy, might be compelled (i.e. by force of law) to publish a Tombstone Announcement for user content. Note that this is a step further than unpinning, as it would have the effect of disallowing any protocol-compliant application from accessing the content. (A non-compliant application could potentially find the content, but this would get rather confusing.)

The only way to mitigate the threat of unsanctioned provider tombstoning of content is to not delegate permission to publish Tombstone Announcements. This is eminently possible with DSNP as is, but if we imagine an application that does not have this permission, and a user that wants to delete (tombstone) their content, we have now introduced a cost to the user for removal of content, even though (typically) the provider would bear the costs for content publishing. So we find that delegation of tombstoning is convenient but dangerous.

To propose a (slightly ridiculous) solution, the user could perform the following sequence of actions when they want to remove content: 1) Delegate Tombstone permission to the provider (in the provider app if they provide a user interface to do so, or in a separate wallet app); this requires a signature. 2) Delete the content from the provider app. 3) Remove the delegation (again, potentially in another app). Another signature is required.

This is both a horrible user experience and (however briefly) delegates more power than needed to the provider (the latter is probably not a major concern). An alternative is to change the tombstone announcement structure to always require the user's signature. This is still a potentially interruptive user experience, but hopefully the ability to request signatures from a user's control key becomes more standardized over time. There may not be a solution which maximizes both ease of use and user control.

wesbiggs commented 1 year ago

To summarize the above, the questions are:

Should we make content addressing core to announcing and retrieving content to avoid provider control of published content? If so, how?
Should we require an additional user signature for Tombstone Announcements, so that providers can only tombstone content if they have active user cooperation?

sbendar commented 1 year ago

I guess I viewed content as slightly different than the social graph. If you use an app to publish your content and they are paying to publish and host it, it is more of a partnership in that either party should be able to decide they no longer want to have the content published.

The user still owns their content from an IP point of view and could repost it somewhere else if they desired or host it themselves even.

wesbiggs commented 1 year ago

The user still owns their content from an IP point of view and could repost it somewhere else if they desired or host it themselves even.

They would have to re-announce it though, and the new timestamp/block number of the content wouldn't match the original, which I see as a key trust mechanism when reading a thread. (It would also be weird to see the gaps caused by the original provider making the content unavailable.)

But maybe I'm overthinking this, and very few services will go to the originally announced URL to get content, preferring to sync with a cache somewhere else. I just find it troubling that we can't guarantee content permanence.

shannonwells commented 1 year ago

To clarify for readers, it's not up to DSNP to guarantee content permanence**; only to provide a mechanism that allows a hosted URL to change out from under a decentralized indicator, to handle content that gets moved.

I'm in favor of an option for IPFS or a Torrent or something like that.

However, Providers need the option to Tombstone user content without user permission, in the cases you mention, to more effectively combat abuse and illegal activity and potentially to be able to comply with federal laws. Perhaps certain other limits could be discussed, such as (brainstorming):

a Provider may not Tombstone unless they are a current delegate, or were a delegate at the time of posting
Content may not be Tombstoned without user signature without providing some publicly accessible proof of or description of a violation of law (quoted/linked) or Terms of Service (quoted/linked)
- this hints at adding a field to Tombstone that requires an additional link to the reason(s) for Tombstoning when not delegated

This could first of all be used to demonstrate compliance with the law, and secondly it partly addresses a frequent complaint of users having content removed with no explanation as to why and without any recourse.
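
The brainstormed limits can be written down as a simple acceptance check. This is purely illustrative: `TombstoneRequest`, its fields, and `is_acceptable` are hypothetical names, the current Tombstone Announcement has no reason field, and a real check would verify the signature rather than just test for its presence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TombstoneRequest:
    # Hypothetical structure; not the DSNP Tombstone Announcement.
    target_hash: str
    is_current_delegate: bool
    was_delegate_at_posting: bool
    user_signature: Optional[bytes] = None  # presence only, not verified here
    reason_url: Optional[str] = None  # link to the quoted law or ToS clause


def is_acceptable(req: TombstoneRequest) -> bool:
    """Apply the two brainstormed limits to a provider tombstone."""
    # Limit 1: current delegate, or delegate at the time of posting.
    if not (req.is_current_delegate or req.was_delegate_at_posting):
        return False
    # Limit 2: without a user signature, a public reason link is required.
    if req.user_signature is not None:
        return True
    return bool(req.reason_url)
```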

** as clearly such guarantees are completely undesirable for illegal content, such as CSAM

wesbiggs commented 1 year ago

To clarify for readers, it's not up to DSNP to guarantee content permanence**; only to provide a mechanism that allows a hosted URL to change out from under a decentralized indicator, to handle content that gets moved.

Yes, sorry if that wasn't clear. It should be possible (for example) for a user to switch their provider and take their content with them.

I'm in favor of an option for IPFS or a Torrent or something like that.

The nice thing about IPFS is that the contentHash field in the DSNP announcement should be directly convertible to an IPFS CID in most instances.
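
As a rough illustration of that conversion, assume the content hash is a SHA-256 digest and the file was added to IPFS as a single raw block (both are assumptions; larger, chunked files get a different CID). A CIDv1 is then the digest prefixed with the multihash header, the CID version, and the `raw` codec, encoded as multibase base32. `sha256_to_cid_v1` is a hypothetical helper name.

```python
import base64
import hashlib


def sha256_to_cid_v1(digest: bytes) -> str:
    """Wrap a raw SHA-256 digest as a CIDv1 string (raw codec, base32)."""
    if len(digest) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    multihash = bytes([0x12, 0x20]) + digest  # 0x12 = sha2-256, 0x20 = 32 bytes
    cid_bytes = bytes([0x01, 0x55]) + multihash  # 0x01 = CIDv1, 0x55 = raw codec
    b32 = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + b32  # "b" is the multibase prefix for lowercase base32


cid = sha256_to_cid_v1(hashlib.sha256(b"hello").digest())
```

CIDs built this way always begin with `bafkrei`, because the first four bytes are fixed.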

However, Providers need the option to Tombstone user content without user permission, in the cases you mention, to more effectively combat abuse and illegal activity and potentially to be able to comply with federal laws.

Which country's laws? If DSNP is to be used globally, it is probable that some content that is legal in one jurisdiction is illegal in another. I think "collaborative unpinning" should be used instead of tombstoning. If content is illegal, the feds (or whatever powers that be) can go after the hosting providers in their jurisdiction.

Perhaps certain other limits could be discussed, such as, (brainstorming)

a Provider may not Tombstone unless they are a current delegate, or were a delegate at the time of posting

I think the second part is going to be true by definition (except in rare cases where no provider is used?). The first part is the current specification. Maybe "and" instead of "or" would be interesting to consider (provider must be both the publisher of the announcement in question and a current delegate), but I still think this gives too much control to the provider. It's the user's content; the provider has agreed to host it (possibly for a limited time); the provider can stop hosting it, but it doesn't become the provider's content.

Content may not be Tombstoned without user signature without providing some publicly accessible proof of or description of a violation of law (quoted/linked) or Terms of Service (quoted/linked)

this hints at adding a field to Tombstone that requires an additional link to the reason(s) for Tombstoning when not delegated

This could first of all be used to demonstrate compliance with the law, and secondly it partly addresses a frequent complaint of users having content removed with no explanation as to why and without any recourse.

I think attestations would be a way of providing this metadata (and suggesting to others who share the same jurisdiction or terms that the content should be unpinned and blocked). But anything that allows a provider to unilaterally tombstone content assumes that all providers will agree on their ruling. This might be true in certain circumstances (CSAM, perhaps, though even there, I'm not sure there is a single universal definition that all jurisdictions would agree on) but it is little more than an opinion in others.

To take a different example, let's say discussion of gender-affirming care becomes illegal in Alice's provider's jurisdiction, but not in Bob's, but both Alice and Bob have posted extensively about it in the past. Both providers have chosen to pin both parties' content, perhaps fearing this day would come. Alice's provider must unpin all this content to be legally compliant. But should Alice's provider be able to (effectively) force Bob's provider to not show Alice's prior posts, even though they are legal in Bob's jurisdiction and hosted on Bob's provider's server?

wilwade commented 1 year ago

A few adjacent notes specifically to separate out the two issues.

Issue 1: How much authority do delegates have to tombstone or remove content?
Issue 2: How does a user who wants another delegate or system to host their content do that?

For Issue 2, if Alice wants to have their content hosted elsewhere, there are several levels required:

The Batch Layer: The Frequency DSNP System requires that batches be published via IPFS. The original delegate application may want to continue pinning its batches, but it could be shutting down, or it could have a batch containing only Alice's content.
The Activity Content Layer: This content can be hosted on IPFS via the ipfs.io URL structure, but might not be.
The Attachment Layer (aka images, media, etc.): This can also be hosted on IPFS as above, but again might not be.

Now while something such as an IPFS setup can be suggested and built into tools, some delegate applications may feel they cannot participate in the posting of such complete content (as opposed to just metadata in the Announcements) to IPFS.

I do not think that requiring IPFS (or other decentralized storage) top to bottom is a viable option at this time.

So we have a few alternatives that have been discussed at various times:

(Possible now) Use an Update Announcement that republishes as an Update instead of as a complete republishing which then maintains the originally posted block time. (Note this currently requires a new hash) This could also be done in one single large Batch.
Create a new Announcement or a v2 or Update specifically to cover the updates of this type
Allow there to be an Announcement that provides a translation from Delegate A to Delegate B urls
Services could use the DSNP Content URI instead to cache and make available the data via that identifier
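
The third alternative (a translation from Delegate A to Delegate B URLs) could be applied by a consumer along these lines. No such Announcement exists today; the prefix-to-prefix mapping and the name `translate_url` are assumptions made only for illustration.

```python
def translate_url(url, translations):
    """Rewrite a URL using old-prefix -> new-prefix mappings.

    `translations` is a dict such as one a consumer might build from
    hypothetical translation Announcements. The longest matching prefix wins.
    """
    for old in sorted(translations, key=len, reverse=True):
        if url.startswith(old):
            return translations[old] + url[len(old):]
    return url  # no translation applies; keep the original URL
```

Because the content hash in the original Announcement is unchanged, a consumer can still verify whatever it fetches from the translated URL.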

I'm sure there are others, but this is a different issue than the delegate authority issue.

For issue 1, I believe it is reasonable that a delegate who has published something to the world (on behalf of the user) can have it removed. Currently that is via a tombstone (if the user has delegated it) or via refusing to continue to pin/host it.

I'm sure different legal jurisdictions have different requirements around this, but I ask if the permission to tombstone that a delegate might have is any greater than the permission to publish?

The primary thing that is lost is that one cannot untombstone. This is necessary for purposes of recursion limitation. While the spec could have a "Republish" or "Untombstone" announcement, I ask if the added complexity is worth the benefit?

harry-evans commented 1 year ago

I think this points to several distinct “states” content could be in when it is “removed”:

The post is tombstoned. With a tombstone, content probably should also no longer be hosted, though those are separate operations.
The post is not tombstoned, but the provider has decided to no longer host the content. However, the content is hosted by someone else (for example, still pinned by someone else in IPFS)
The post is not tombstoned, and either no one else can (in the http(s) case) or didn’t (in the ipfs case) host it somewhere else.
The post is not tombstoned, is still hosted somewhere (by the provider or someone else) but is part of a deny list used by one or more clients.

I am sure there are many more variations.
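
The four states listed above can be modeled as a small classifier. This is a sketch only: the enum names, the boolean inputs, and the precedence between overlapping cases (deny-listing is checked before "hosted elsewhere") are choices made here for illustration.

```python
from enum import Enum


class ContentState(Enum):
    TOMBSTONED = "tombstoned"
    HOSTED_ELSEWHERE = "hosted_elsewhere"  # provider dropped it, others host it
    UNAVAILABLE = "unavailable"  # nobody hosts it
    DENY_LISTED = "deny_listed"  # hosted, but filtered by a client
    AVAILABLE = "available"  # none of the "removed" states apply


def classify(tombstoned, provider_hosts, others_host, deny_listed):
    """Map four observations about a post onto one of the states above."""
    if tombstoned:
        return ContentState.TOMBSTONED
    if not provider_hosts and not others_host:
        return ContentState.UNAVAILABLE
    if deny_listed:
        return ContentState.DENY_LISTED
    if not provider_hosts:
        return ContentState.HOSTED_ELSEWHERE
    return ContentState.AVAILABLE
```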

The most important elements from my perspective:

a. Tombstoning is a “semantic” deletion. It doesn’t modify the batch that announced the content, and does not necessarily cause the content to stop being available (though that is often the case). It also doesn’t “hide” content. If anything, it calls attention to the post. This also leads me to conclude the “posting” permission is less sensitive to grant than the “tombstoning” permission, since it represents the user censoring themself, not external moderation, TOS issues or regulatory concerns.
b. I can’t tell why a provider would tombstone content that they had a policy or regulatory issue with. They would simply add it to an (internal) deny list, not serve it in feeds and stop hosting the content. Why would they explicitly tombstone it, given that that involves them putting it in a second provider-hosted batch file?
c. Given the above, is there a reason for “normal” providers to have the tombstoning permission? I could see a wallet-like interface offering it maybe (similar to delegation revocation). Perhaps apps have this permission, but it’s a “should not” use it except per user instruction? That seems weak, but avoids the signing dance for deleting a post, though I think that deletion is uncommon enough that having to actually sign the request is ok.

This leads me to ask whether we should (perhaps) always have a content hash in the announcement, with an optional provider url? If the provider url is blank or invalid, the fallback would be to check ipfs, which would give us a robust “reposting” mechanism. Alternatively maybe the original user (directly or via delegation) can post a special “reply”/“repost” to the message that acts as a redirect to a new location (requiring content to pass the security checks of the original announcement)? If the user controls their content (as we claim), then being able to declare the (same) content is somewhere else (given we aren’t using content addressed storage) is the right approach?

Feels like a good discussion for the next DSNP Spec meeting, maybe?

wesbiggs commented 1 year ago

(Possible now) Use an Update Announcement that republishes as an Update instead of as a complete republishing which then maintains the originally posted block time. (Note this currently requires a new hash) This could also be done in one single large Batch.

(emphasis mine) The new hash requirement isn't spelled out in the DSNP spec. Is it necessary? If updates with the same hash are allowed, I think it would be reasonable for applications to conclude that content had been moved and not edited, which is important for transparency in the UX. This is a reasonable solution but doesn't scale particularly well.

I also think there's a substantive difference between a user proactively vs. reactively replicating their content (I would like to solve for the proactive case). If moving content requires an Update Announcement, there are scenarios that leave a lot of question marks. For example, if someone dies and their control key is not recovered by their estate, and at some later point their provider goes out of business, there is no protocol-native means of preserving their content within the DSNP content corpus. I know these are edge cases, but there are a lot of scenarios where user-driven reactive updates are a poor fit for continuity of access to "public square" communication.

Services could use the DSNP Content URI instead to cache and make available the data via that identifier

I like this conceptually, but from a consuming application's point of view, there should be a clear algorithm for (attempting to) access a content item, and this feels very hand-wavy—try the centralized URL, but if that doesn't work, use whatever knowledge and protocols you may or may not have at your disposal to try to locate the content by its hash. This seems likely to lead to inconsistent behavior, where content is visible within one application but not in another (which is allowed from a filtering/terms of service point of view, but that's not really the case here).

Now while something such as an IPFS setup can be suggested and built into tools, some delegate applications may feel they cannot participate in the posting of such complete content (as opposed to just metadata in the Announcements) to IPFS.

I get that IPFS is still a pretty exotic concept to providers in the Web 2.0 world, but is pinning on IPFS really that much different from hosting that same file on a public web server? I'm not suggesting that a provider be required to pin any other content (though of course they can if they want, and many will likely cache broad swathes of content as a matter of course). And as you point out, they're required to speak IPFS for batch files anyway (on Frequency at least).

Is there a best of both worlds solution? What if we propose the following algorithm for content retrieval:

A Consumer may attempt to load content either by its HTTP URL, or via IPFS by converting the DSNP Content URI into a CID.
If initial retrieval via HTTP fails, IPFS retrieval must be attempted. And vice versa: if initial retrieval by IPFS fails, HTTP retrieval must be attempted.

This approach requires batch consumers to be able to retrieve content over both protocols. But it enables a user to provide redundancy at any time by pinning the file in IPFS (most likely through a relationship with a storage service that needn't be DSNP-aware, such as a Filecoin-backed solution).

This would then take us back to the need to define what constitutes a retrieval failure in HTTP. I would submit that any response other than a redirect (within a reasonable number of non-circular hops) or a successful hash match should be counted as a failure. One can easily imagine a case where a defunct provider's domain is still serving HTTP 200s for every URL but the actual content is now a domain parking page.
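
A minimal sketch of the proposed retrieval algorithm, under some assumptions: the expected hash is a SHA-256 hex digest, and the two fetchers are caller-supplied functions (hypothetical here) that return bytes or raise. Redirect following, including the hop limit, is assumed to happen inside the HTTP fetcher.

```python
import hashlib


def retrieve(expected_sha256_hex, fetch_http, fetch_ipfs, prefer="http"):
    """Try one protocol, then the other; succeed only on a hash match.

    Returns the content bytes, or None if both attempts fail.
    """
    if prefer == "http":
        order = [fetch_http, fetch_ipfs]
    else:
        order = [fetch_ipfs, fetch_http]
    for fetch in order:
        try:
            data = fetch()
        except Exception:
            continue  # network or protocol error: fall back to the other source
        # A 200 serving a parking page fails here because its hash differs.
        if data is not None and hashlib.sha256(data).hexdigest() == expected_sha256_hex:
            return data
    return None
```

Treating a hash mismatch as a failure is what handles the parked-domain case: the status code is never trusted on its own.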

I also think we should shift the IPFS requirement for batches to the DSNP spec (not just DSNP over Frequency).

I agree that provider tombstoning is technically an orthogonal topic, so I'll make a separate discussion issue for that.

Postscript: I wrote this in a parallel timeline to Harry's reply (i.e. on an airplane without an internet connection) and it looks like he made some of the same points, so apologies for the redundancy.