filecoin-project / specs

The Filecoin protocol specification
https://spec.filecoin.io

Rationale against burning pledge collateral for storage faults #407

Closed anorth closed 4 years ago

anorth commented 4 years ago

I believe it is desirable for us to develop proof and storage market mechanisms that avoid penalising pledge collateral in the case of plausibly-accidental failures of storage by miners. I think a consensus around this is now emerging (e.g. in https://github.com/filecoin-project/specs/pull/403) but I'm writing this up so we have something to refer to in the future when/if this is questioned.

I am assuming that pledge collateral represents a significant financial lock-up: in the early life of the network, and depending on FIL price, potentially much higher than the cost of hardware for a given storage capacity. Having this at risk in the case of operational failures, which are inevitable, represents a very large risk to a prospective miner, maybe enough to make participation too risky a financial proposition.

A miner may fail to prove storage of one or more sectors for a large array of real-world reasons, from drive and machine failures, down to cosmic ray bit-flips, and up to datacentre network and power outages and Filecoin network partitions. These are all events that will happen to some miners.

In the case of permanent loss, e.g. of a hard drive, if the costs of that loss are too high, miners will replicate their data to mitigate the risk of losing more than the value of that drive. This will force storage prices up and proven storage down from where they might otherwise be.
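To make the replication argument concrete, here is a rough sketch of the decision a miner faces. Everything in it is an assumption for illustration: the function name, the single-period model, the independence of drive failures, and all the numbers. It only shows the shape of the trade-off: once the penalty is large relative to the cost of a spare copy, keeping a replica becomes the rational choice.

```python
def replication_pays_off(p_fail: float, penalty: float, replica_cost: float) -> bool:
    """Return True if keeping one extra copy lowers the miner's expected cost.

    Hypothetical single-period model (not from the spec):
    - p_fail: probability that a drive is lost during the period
    - penalty: amount lost if the sector cannot be proven
    - replica_cost: cost of storing one extra copy for the period
    Drive failures are assumed independent.
    """
    # One copy: the penalty is paid whenever that single drive fails.
    cost_single = p_fail * penalty
    # Two copies: the penalty is paid only if both drives fail,
    # but the miner always pays for the extra copy.
    cost_replicated = (p_fail ** 2) * penalty + replica_cost
    return cost_replicated < cost_single


# Illustrative numbers only.
print(replication_pays_off(p_fail=0.02, penalty=1_000, replica_cost=100))
print(replication_pays_off(p_fail=0.02, penalty=10_000, replica_cost=100))
```

With the small penalty the spare copy costs more than it saves; with the large one it pays for itself, and that extra cost is what would push storage prices up.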

At large scale, the risk becomes more about getting a PoSt message onto the network, even when all storage remains available. A datacentre network cable cut or power outage could easily take a datacentre offline for a whole proving period, and the idea that this could cost the miner more than the capital cost of that datacentre, despite the miner actually maintaining petabytes of committed storage, is a bit outlandish. This too could drive very complex cross-DC replication strategies that suffer diseconomies of scale.

From a market perspective, we'd prefer storage clients to make choices about reliability vs cost of storage (through replication and/or negotiated storage collateral (#386)), but if miners have strong non-market reasons to offer only extremely-high reliability, then other options will disappear from the market.

Burning pledge collateral for faults introduces a very sharp edge between a miner deciding ahead of time to decommission a sector, and doing so involuntarily due to a hardware failure. It also raises short-term incentives to censor PoSts from other miners, especially as they arrive close to the deadline. If too extreme, it might even incentivise physical sabotage of competing miners' operations. It's a sharp divergence from the familiar mental model of PoW mining, where the worst outcome of an operational failure is lost block rewards, with no potential to lose more than the up-front capital investment in a single moment.

The chain can't tell the difference between malicious and non-malicious failure to prove storage, but treating all failures as malicious and penalising them heavily makes participation extremely high risk for miners. They'll either need to invest heavily in expensive security, replication and disaster-recovery mechanisms (raising both prices and barriers to entry) or just not participate; neither is a good outcome. Miners need to know that pledge collateral is safe: they will never lose it unless they deliberately misbehave. This is doubly true if we expect large FIL holders to lend FIL to miner operators as a stake.


There remains an unresolved desire to incentivise long-term stability of committed storage, especially when it is holding client data. I don't think penalisation of pledge collateral achieves that: it only introduces a 1-proving-period delay for a miner to declare a sector done. IMO market-based storage deal collateral should be a primary incentive here, though I could be convinced we need more.

sternhenri commented 4 years ago

There is a lot above, and it may be worth a conversation to cover it point by point, but here is a quick answer on my end wrt my own understanding of the protocol, under the guidance of @whyrusleeping and @dignifiedquire.

As is, a miner's consensus power is cut immediately after a storage fault, and their pledge collateral is cut (with an ability to recover at first); however, the pledge collateral itself is only tied to provable faults dealing with Nothing-at-Stake, though the ramifications of #403 and the surrounding conversations may need to be finalized.

zixuanzh commented 4 years ago

Done, pledge collateral is only slashed for consensus faults. Miners need to pay a TemporaryFault fee for storage faults.
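As a rough illustration of that split, a minimal sketch might look like the following. The type names, the `apply_fault` helper, the choice to slash the full pledge, and the fee amount are all assumptions made for the example and are not taken from the spec; only the rule itself (slash pledge collateral for consensus faults, charge a TemporaryFault fee for storage faults) comes from the thread.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Fault(Enum):
    CONSENSUS = auto()  # provable consensus misbehaviour
    STORAGE = auto()    # missed or failed storage proof


@dataclass
class Miner:
    pledge_collateral: int
    balance: int


def apply_fault(miner: Miner, fault: Fault, temporary_fault_fee: int) -> None:
    """Penalise a miner according to the kind of fault (hypothetical helper)."""
    if fault is Fault.CONSENSUS:
        # Assumption: the whole pledge is slashed; the real amount may differ.
        miner.pledge_collateral = 0
    else:
        # Storage faults leave the pledge untouched and only cost a fee.
        miner.balance -= temporary_fault_fee


miner = Miner(pledge_collateral=1_000, balance=50)
apply_fault(miner, Fault.STORAGE, temporary_fault_fee=5)
print(miner)
```

The point of the design is visible in the `else` branch: an operational failure has a bounded, known cost and never touches the locked pledge.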