Open whyrusleeping opened 5 years ago
ref #179
Filecoin miners may at times accept a storage deal from a client for data that is deemed illegal in their jurisdiction. In order to comply with the law, a miner who agreed to store the bad data must not serve it, and must delete it from their hard drives as soon as possible.
One approach is to have miners put forth a set of entities that are allowed to serve as arbitrators of which data is considered bad. If the client agrees to such a deal, and then later the arbitrator says that the data is bad, then the miner may remove the data without penalty.
The actual mechanisms for how this might work are tricky. If the miner removes the data during a PoSt, then the proof would need input from the arbitrator to prove that the data was removed in accordance with the agreements made in the deal. This significantly complicates the proofs.
Another approach would be for miners to simply specify removed sectors at the end of a proving period like any other sector removal. Then the client responsible for the bad data (or any other actor acting on their behalf) could try to slash the miner for those sectors. Since early deal removal penalization is (by necessity) a non-interactive protocol, this won't work. So instead of having the miners remove the bad data in the standard way, we could add an extra parameter to the PoSt submission for 'arbitrated removals'. Any sectors specified as removed by this list would have to be looked up on chain, meaning the arbitrators would have to keep up-to-date information about their block lists on chain. This maintenance could potentially be very expensive. It also serves to keep information that could be used to look up the bad bits online for all time. The volume of these block list updates could be really high, leading to scalability problems for the chain and also significant cost for the arbitrators maintaining these lists.
In addition, looking up sectors in these lists may be an expensive operation, increasing already high gas costs for PoSt submission.
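To make the 'arbitrated removals' idea concrete, here is a minimal sketch of what the on-chain validation step might look like. All names here (`ArbitratorState`, `validateArbitratedRemovals`, the sector-ID set) are hypothetical and not from the spec; the point is that each removed sector costs at least one lookup against the arbitrator's on-chain blocklist at PoSt submission time.

```go
package main

import "fmt"

// ArbitratorState is a hypothetical on-chain record: each arbitrator
// maintains a set of sector IDs it has declared bad.
type ArbitratorState struct {
	Blocked map[uint64]bool // sector ID -> blocked
}

// validateArbitratedRemovals checks that every sector a miner reports
// as an arbitrated removal actually appears on the blocklist of the
// arbitrator the miner claims. This is the lookup the thread worries
// about: one probe per removed sector, paid for in gas.
func validateArbitratedRemovals(arbs map[string]*ArbitratorState, minerArb string, removed []uint64) error {
	state, ok := arbs[minerArb]
	if !ok {
		return fmt.Errorf("unknown arbitrator %q", minerArb)
	}
	for _, sector := range removed {
		if !state.Blocked[sector] {
			return fmt.Errorf("sector %d not on arbitrator %q blocklist", sector, minerArb)
		}
	}
	return nil
}

func main() {
	arbs := map[string]*ArbitratorState{
		"arb-1": {Blocked: map[uint64]bool{42: true}},
	}
	fmt.Println(validateArbitratedRemovals(arbs, "arb-1", []uint64{42})) // <nil>
	fmt.Println(validateArbitratedRemovals(arbs, "arb-1", []uint64{7}))  // prints the mismatch error
}
```

Even in this toy form, the cost structure is visible: the arbitrator pays to keep `Blocked` current on chain, and the miner pays gas for every lookup.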
To offset costs for the arbitrators, miners could pay fees to them each time they do an arbitrated sector removal (though, that fee must be below the collateral fine for sector removal, otherwise miners will just drop the sectors outright and pay that smaller fee).
Another issue is around changing the arbitrator set used by a miner. Since clients agree to use a particular arbitrator, miners must use only that arbitrator for as long as they have deals with that client. Since these deals aren't actually on chain, we either have to prevent miners from changing their selected arbitrators, or have some way for clients to complain when the arbitrator set changes. A different approach could be to allow miners to select different arbitrators, but require they specify which arbitrator was used when reporting bad data removal. Clients then need to be able to slash miners for removing data with respect to an arbitrator that they didn't agree to. This is complicated.
Another approach would be to have miners sign up for ‘fraud insurance’ from some insurance provider. This insurance provider would pay out to the miner if the miner ends up storing bad data for a client. In exchange, the insurer gets to mandate which clients the miner is allowed to accept deals from. The miner will also need to pay some sort of subscription fees to the insurance entity.
The upside to this approach is that it does not require any on-chain action aside from the payments involved.
The (somewhat obvious) downside, is that the insurance entities will likely require some form of KYC on their customers. This may come in the traditional form of identity verification, or alternate forms, such as a slashable deposit from the client (increasing client costs), or a trust graph of some sort (quite complicated to build).
Miners could also choose to operate without any insurance, absorbing the risk onto themselves, for the potential extra profit.
This approach is also unfortunate because it makes it more difficult to run a reputation system for miners. Differentiating between genuine data loss, and data being removed for legal reasons seems pretty important for determining which miners might be the most trustworthy.
The ‘insurance’ approach is drastically more scalable, but presents a worse UX for storage clients. It also protects against miners being able to arbitrarily drop data for free.
The blocklist/arbiter approach shifts the cost to miners, either directly or indirectly. The client experience is identical (no potential KYC involved). Several scalability challenges crop up however, as well as some issues around slashing and arbiter agreements. In addition, it may create a market for illicit data. Since the block lists must be posted to the chain, and they must reference sectors directly, it becomes easy to build a list of the bad data, attempt to fetch it, and offer it up for retrieval (if, for example, the data is not illegal in your jurisdiction, or you don’t care). This is especially problematic if data is added to a blocklist that is not respected by any of the miners with that data.
@ianjdarrow a few questions:
There's a piece I don't understand well: you mention that "arbitrators would have to keep up to date information about their blocklists on chain." But there's no meaningful review of arbitrators' additions or removals to those lists. Strictly for purposes of permitting removals, doesn't a signed message from an arbiter get it done with a lot less on-chain overhead?
(There are definitely other good reasons to preserve lists of "deleteable" material with respect to a given arbiter, but we may be able to separate that piece in a way that makes scalability easier.)
Responding to your questions:
I think it's OK, but not desirable. Trying to think of a situation where bad content would become "not bad," or otherwise stale, over time, and coming up blank. It does seem like there are cases where blocklist size could grow really large over time – if that's a related concern, let me know if I can help think through it more.
In practice I would expect the number of block lists & the range of topics covered to be dramatically less if it's not possible to be paid to maintain one. I don't think we should expect governments in particular to provide this service, especially early – governments are not early adopters – but we expect nonprofits and other interest groups to maintain a small number of really important ones.
The answer to this would vary a lot between jurisdictions. My theory is that, by giving folks a pretty easy way to remove content they don't want on their machines (obviously while being fair to storage purchasers), we can help keep miners safe by moving the process closer to a routine DMCA takedown.
I think a blocklist maintainer is performing a fundamentally economic service, and it's miners' jobs to pick a reliable blocklist maintainer. The miner is still ultimately responsible for deleting bad data from their own machine – the blocklist maintainer provides a service that might make it less economically painful. (It's clearly impossible for a blocklist to ever be 100% complete, and the improvements we get will be relative, not absolute.)
@ianjdarrow
Strictly for purposes of permitting removals, doesn't a signed message from an arbiter get it done with a lot less on-chain overhead?
you mean a signed message directly from the arbiter to the miner in question? You could do that, but it still has to be tracked in a list of some sort on that miner after the message is submitted. Also, if the bad client stored the data in question on multiple miners, then the arbiter has to send one message per miner that it was stored with, which is potentially expensive (especially if the client knows their data might get removed this way, they could store the exact same data with as many miners as possible as a way to 'attack' the arbiter). Plus, if no list of items is kept, what's to prevent the miner from immediately re-accepting a deal for that data?
But, in any case, one message to the chain per removed item has the potential to get pricey.
It does seem like there are cases where blocklist size could grow really large over time – if that's a related concern, let me know if I can help think through it more.
Yeah, my main concern here is what can we do to prevent this list from growing infinitely?
you mean a signed message directly from the arbiter to the miner in question?
This may reveal my limited knowledge of the spec, but what about something like:
The arbiter attestations happen at the content level, not the miner/contract level. Maintenance of these attestations can happen off chain. Miners should maintain a list of bad content off-chain as well – in practice I think the lists would be retrieved from arbiters. Miners can filter storage bids through these lists during the bid/ask negotiation process.
It's actually not crucial that a list exists at all (though I think one is still desirable); in any case, list size should not be a significant issue if this approach works. A high-level, incomplete fix: the "persistent" list filters only the [xx]% most common bad content, but you can still get a signed attestation for one-off bad content to remove it. An imperfect but working & scalable fix might be perfectly acceptable here.
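The off-chain filtering step during bid/ask negotiation could look something like this sketch. The `DealProposal` shape and `filterBids` name are illustrative only; how the miner actually fetches lists from arbiters is out of scope here.

```go
package main

import "fmt"

// DealProposal is a hypothetical storage bid: a client offering to
// store content identified by a payload CID.
type DealProposal struct {
	Client     string
	PayloadCID string
}

// filterBids drops any bid whose content appears in the miner's local
// blocklist (fetched from its chosen arbiters out of band). This all
// happens off chain, before any deal is accepted.
func filterBids(blocked map[string]bool, bids []DealProposal) (accepted []DealProposal) {
	for _, b := range bids {
		if blocked[b.PayloadCID] {
			continue // reject bids referencing blocklisted content
		}
		accepted = append(accepted, b)
	}
	return accepted
}

func main() {
	blocked := map[string]bool{"QmBadExample": true}
	bids := []DealProposal{
		{Client: "c1", PayloadCID: "QmBadExample"},
		{Client: "c2", PayloadCID: "QmFineExample"},
	}
	fmt.Println(filterBids(blocked, bids)) // only c2's bid survives
}
```

Because the filter runs before a deal exists, nothing needs to touch the chain, which is the scalability win being argued for.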
I assume this approach may have some incorrect assumptions about how filecoin proofs work – if that's right, happy to discuss, or feel free to point me to things I should read to understand better.
At the time the storage contract is entered, we record an association between the contract and the arbiter.
Record on-chain? We cannot do anything per deal on-chain due to scaling concerns.
To satisfy proof requirements, any miner can provide either...
Yeah... technically that's borderline impossible. In order for the proofs to still validate, they would need to include the information about the removed content on-chain so that everyone else could verify. Basically, anything that is required to tell that the proof is valid must go on-chain, or be 'proved' in some way. We could hypothetically do something with a SNARK here, but the additional complexity there would be fairly absurd. We're already working pretty hard to bring the complexity of that particular proof down to the point of being practical.
@ianjdarrow on a scale from 1-10, how badly do we need to launch with this functionality?
@whyrusleeping let's set up a time to quickly chat sync – I want to match my 1-10 scale with yours in the context of other launch functionality that's on the bubble. Can you drop something on my calendar? I'll work around you.
@ianjdarrow I chatted with Marvin yesterday, and we decided that if there is not time to come up with a good scalable solution to storage miners removing data, then it should be sufficient (in the short term) for retrieval miners to respect an offchain double-hashed blocklist.
Making it so that miners can choose to not serve data deemed illegal in their jurisdictions is a pretty reasonable safeguard for now.
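One plausible reading of the "double-hashed blocklist" (the exact entry format isn't specified in this thread, so this is an assumption): the published list stores sha256 of the CID rather than the CID itself. Since a CID is already a hash of the content, each entry is a hash of a hash, so readers of the list cannot recover the CIDs to go fetch the bad data, but a retrieval miner can hash any incoming request and refuse to serve matches.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// entry derives a blocklist entry from a CID. Hashing the (already
// hashed) CID means the published list doesn't reveal what's on it.
func entry(cid string) string {
	sum := sha256.Sum256([]byte(cid))
	return hex.EncodeToString(sum[:])
}

// shouldServe lets a retrieval miner check a request against the
// double-hashed list without the list ever naming the bad content.
func shouldServe(blocklist map[string]bool, requestedCID string) bool {
	return !blocklist[entry(requestedCID)]
}

func main() {
	blocklist := map[string]bool{entry("QmBadContentExample"): true}
	fmt.Println(shouldServe(blocklist, "QmBadContentExample")) // false
	fmt.Println(shouldServe(blocklist, "QmHarmlessContent"))   // true
}
```

This also addresses the "market for illicit data" concern raised earlier: unlike an on-chain list of sectors, the double-hashed list is useless as a shopping catalogue.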
Thoughts?
I think this is promising, especially as a short-medium term fix. The tooling would also be similar or identical to the tooling for storage miners to conduct pre-contract screens, which is great and will reduce duplicative work.
Two questions – a "yes" to the first one would be really helpful; the second would be nice but we might be able to make a good argument around it.
Storage miners can absolutely decline retrieval requests, at any time, for any reason.
Storage miners have to be able to access the data themselves for the purposes of the PoSt, but since the PoSt is a zero-knowledge proof, none of that data ever leaves the miner. The PoSt simply proves to everyone else that they actually have the data, without revealing the data (that's how we keep it compact).
OK, great. Let's proceed assuming this can work.
NB: it's still imperfect and we will have to make a little bit of a subtle argument around storing, but not serving, undesirable content. But in the context of a file storage network, the social harm of bad content comes from distributing it, so I think we are OK from a commonsense perspective. I'll think more about what non-breaking refinements we can make to further improve.
@ianjdarrow Would you also be able to add this in an "Implementation Guidance" section?
If content needs to be removed from the network, miners should have a way to comply that is better than just 'drop the data and lose your collateral'.