filecoin-project / FIPs

The Filecoin Improvement Proposal repository

Support non-deal data in sectors (off-chain deals part 1) #57

Closed magik6k closed 2 years ago

magik6k commented 3 years ago

Problem

Currently Filecoin sectors can only store deal data referenced by on-chain deals. CC sectors and non-deal areas of sectors must store null (\0) bytes to be verifiable on-chain. This limitation is caused by how the UnsealedCID (CommD) is computed in the miner actor.

Unfortunately, publishing deals is a very expensive on-chain operation, especially for smaller pieces, where this cost can be the main part of the storage fee. Off-chain deal support and alternative markets (e.g. a more scalable storage market on ETH2.0 with data stored on Filecoin) seem to be good solutions to this problem, at least for non-FIL+ deals.

This proposal is only the first, but major, part of what needs to be done to support off-chain deals. It's likely that some off-chain deal protocols will require additional actor methods (for example, a method to check whether a sector was terminated early, for settling payment channels).

Current state

Currently, when precommitting sectors, a number of DealIDs can be specified with SectorPreCommitInfo:

// Information provided by a miner when pre-committing a sector.
type SectorPreCommitInfo struct {
    SealProof       abi.RegisteredSealProof
    SectorNumber    abi.SectorNumber
    SealedCID       cid.Cid `checked:"true"`
    SealRandEpoch   abi.ChainEpoch
    DealIDs         []abi.DealID    // deals whose pieces make up this sector's data
    Expiration      abi.ChainEpoch
    ReplaceCapacity bool
    ReplaceSectorDeadline  uint64
    ReplaceSectorPartition uint64
    ReplaceSectorNumber    abi.SectorNumber
}

This information is then stored in the miner actor state and used to compute the UnsealedCID when verifying PoRep: the specified DealIDs are turned into []abi.PieceInfo through the storage market actor, which then calls the ComputeUnsealedSectorCID syscall.

// in package abi
type PieceInfo struct {
    Size     PaddedPieceSize // padded size of the piece
    PieceCID cid.Cid         // CommP
}

// Syscall available to actors through the runtime
ComputeUnsealedSectorCID(reg abi.RegisteredSealProof, pieces []abi.PieceInfo) (cid.Cid, error)

Proposed solution

We can change the SectorPreCommitInfo miner actor struct (or create a new version of it), replacing the DealIDs array with a new array that also allows specifying non-deal PieceCIDs:

+type SectorPieceInfo struct{
+   PieceCID  *cid.Cid `checked:"true"` // CommP
+   PieceSize abi.PaddedPieceSize
+
+   DealID abi.DealID
+}

// Information provided by a miner when pre-committing a sector.
type SectorPreCommitInfo struct {
    SealProof       abi.RegisteredSealProof
    SectorNumber    abi.SectorNumber
    SealedCID       cid.Cid `checked:"true"` // CommR
    SealRandEpoch   abi.ChainEpoch
-   DealIDs         []abi.DealID
+   Pieces          []SectorPieceInfo

    Expiration      abi.ChainEpoch
    ReplaceCapacity bool // Whether to replace a "committed capacity" no-deal sector (requires non-empty DealIDs)
    // The committed capacity sector to replace, and its deadline/partition location
    ReplaceSectorDeadline  uint64
    ReplaceSectorPartition uint64
    ReplaceSectorNumber    abi.SectorNumber
}

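When computing the UnsealedCID for PoRep verification, []SectorPieceInfo would be turned into []abi.PieceInfo for the ComputeUnsealedSectorCID actor syscall: entries with a non-null PieceCID would be used directly together with their PieceSize, while entries without a PieceCID would be resolved through their DealID in the storage market actor, as is done today.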
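The conversion rule above could look roughly like the following Go sketch. It is not actor code: Cid, PaddedPieceSize and DealID are simplified stand-ins for cid.Cid, abi.PaddedPieceSize and abi.DealID, and dealLookup and toPieceInfos are hypothetical names for "read the deal proposal from market state" and "build the syscall input".

```go
package main

import "fmt"

// Simplified stand-ins for cid.Cid, abi.PaddedPieceSize and abi.DealID.
type (
	Cid             string
	PaddedPieceSize uint64
	DealID          uint64
)

// PieceInfo mirrors abi.PieceInfo, the input to ComputeUnsealedSectorCID.
type PieceInfo struct {
	Size     PaddedPieceSize
	PieceCID Cid
}

// SectorPieceInfo mirrors the proposed struct: a non-nil PieceCID marks a
// non-deal piece; otherwise DealID refers to an on-chain deal.
type SectorPieceInfo struct {
	PieceCID  *Cid
	PieceSize PaddedPieceSize
	DealID    DealID
}

// dealLookup is a hypothetical stand-in for reading a deal proposal's piece
// CID and size from storage market actor state.
type dealLookup func(DealID) (PieceInfo, bool)

// toPieceInfos converts precommitted pieces into the []PieceInfo passed to
// ComputeUnsealedSectorCID, preserving the order of pieces in the sector.
func toPieceInfos(pieces []SectorPieceInfo, lookup dealLookup) ([]PieceInfo, error) {
	out := make([]PieceInfo, 0, len(pieces))
	for i, p := range pieces {
		if p.PieceCID != nil {
			// Non-deal piece: CID and size come straight from the precommit.
			out = append(out, PieceInfo{Size: p.PieceSize, PieceCID: *p.PieceCID})
			continue
		}
		// Deal piece: CID and size come from the market actor's deal proposal.
		info, ok := lookup(p.DealID)
		if !ok {
			return nil, fmt.Errorf("piece %d: deal %d not found", i, p.DealID)
		}
		out = append(out, info)
	}
	return out, nil
}
```

Keeping a single ordered list (rather than separate deal and non-deal lists) matters because CommD depends on the order in which pieces are laid out in the sector.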

(It's possible to save 1 byte per entry by merging PieceSize and DealID into a single uint64 value (DealIdOrPieceSize) and interpreting it based on whether PieceCID is null.)
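A minimal sketch of that compact variant in Go follows. CompactPieceInfo and decode are hypothetical names, and PieceCID is a *string stand-in for *cid.Cid; the point is only that the nil-ness of PieceCID decides how the shared field is read.

```go
package main

// CompactPieceInfo is a hypothetical compact form of SectorPieceInfo: one
// uint64 is a padded piece size when PieceCID is set, and a deal ID otherwise.
type CompactPieceInfo struct {
	PieceCID          *string // stand-in for *cid.Cid
	DealIdOrPieceSize uint64
}

// decode splits the shared field back into (pieceSize, dealID, isDeal).
func (p CompactPieceInfo) decode() (pieceSize uint64, dealID uint64, isDeal bool) {
	if p.PieceCID != nil {
		// Non-deal piece: the field carries the padded piece size.
		return p.DealIdOrPieceSize, 0, false
	}
	// Deal piece: the field carries the deal ID; size comes from the deal.
	return 0, p.DealIdOrPieceSize, true
}
```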

Discussion

State migration

Depending on implementation details, this proposal may involve a relatively major state migration. We should look into ways of limiting that.

Added overhead for deal data

Publishing on-chain storage deals already incurs multiple kilobytes of read/write overhead. Each abi.DealID entry is 5B (a 1B CBOR header plus 4B of integer data). Depending on the implementation, a deal-ID SectorPieceInfo entry will be 7 or 8 bytes, which is a negligible difference.
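The byte counts above can be reproduced with a small Go sketch that sizes the CBOR encoding by hand. It assumes the structs are encoded as CBOR tuples (arrays), as cbor-gen does for actor types, and that the deal ID needs 4 bytes of integer data; the function names are illustrative only.

```go
package main

// cborUintLen returns the encoded length in bytes of an unsigned integer in
// CBOR (RFC 8949, major type 0): a 1-byte head plus 0, 1, 2, 4 or 8 bytes.
func cborUintLen(v uint64) int {
	switch {
	case v < 24:
		return 1
	case v <= 0xff:
		return 2
	case v <= 0xffff:
		return 3
	case v <= 0xffffffff:
		return 5
	default:
		return 9
	}
}

const (
	arrayHeadLen = 1 // CBOR array head for fewer than 24 elements
	nullLen      = 1 // CBOR null (0xf6), used for a nil PieceCID
)

// dealIDEntryLen: a bare abi.DealID in today's DealIDs array.
func dealIDEntryLen(id uint64) int { return cborUintLen(id) }

// pieceInfoDealEntryLen: a deal entry of the proposed struct, encoded as the
// tuple [PieceCID=null, PieceSize=0, DealID].
func pieceInfoDealEntryLen(id uint64) int {
	return arrayHeadLen + nullLen + cborUintLen(0) + cborUintLen(id)
}

// compactDealEntryLen: the merged-field variant, [PieceCID=null, DealIdOrPieceSize].
func compactDealEntryLen(id uint64) int {
	return arrayHeadLen + nullLen + cborUintLen(id)
}
```

For a deal ID such as 1,000,000 this gives 5 bytes today, 8 bytes with separate PieceSize and DealID fields, and 7 bytes with the merged field, which is where the "7 or 8 bytes" figure and the 1-byte saving come from.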

Related FIPs

pooja commented 3 years ago

I think this FIP seems like a good idea on its own! Although I have reservations about off-chain deals: many use cases (and observability of the network) require deals to be on-chain. I think debating the specifics of on- vs off-chain deals should probably happen in another issue (I don't want to clog this FIP issue up with unrelated discussion), but are there other solutions we can investigate that help with the deal publishing cost concerns? E.g. I've seen some ideas around batching deals, which would definitely help.

anorth commented 3 years ago

This seems promising. Some questions which would help further analysis.

Re FIP-0008: yes, it makes a lot of sense to group those changes together, unless we want the chain throughput gains from it more urgently. Currently I think that is blocked on batching prove-commits first, which in turn probably requires proof aggregation, unless we figure out a way to regain parallel proof verification during chain evaluation.

Re FIP-0007, the PreCommittedSectors HAMTs are pretty small in the scheme of things and not difficult to migrate. There's not much to be gained by bundling there. There would be a much stronger case for bundling migration of the SectorOnChainInfo (see question above).

magik6k commented 3 years ago

some ideas around batching deals, which would definitely help

Batching only removes the flat cost of the publish message. Judging by the publish messages already on-chain, that only makes things ~70% better, which means that even when batching many deals, the remaining flat per-deal cost is still very significant.
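One way to read the ~70% figure is with a simple amortization model: a per-message cost that batching spreads over n deals, plus a fixed per-deal cost that it cannot remove. The Go sketch below assumes that model and normalizes a single-deal publish to 10,000 basis points; the 7,000/3,000 split is an interpretation of the comment, not a measurement.

```go
package main

// costPerDealBP returns the cost of publishing one deal, in basis points of
// the cost of publishing that deal alone, when n deals share one message.
// flatBP is the per-message cost that batching amortizes; perDealBP is the
// part every deal still pays regardless of batch size.
func costPerDealBP(flatBP, perDealBP, n uint64) uint64 {
	if n == 0 {
		n = 1 // treat an empty batch as a single, unbatched publish
	}
	return flatBP/n + perDealBP
}
```

With flatBP = 7000 and perDealBP = 3000, a batch of 100 deals costs 3070 basis points per deal, and no batch size gets below 3000, i.e. roughly 30% of the unbatched per-deal cost remains.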

many use cases (and observability of the network) require deals being on-chain

There are other ways to achieve those things: (incentivized/authenticated) DHTs, putting indexes in Filecoin sectors, etc. It's just not possible to meaningfully index exabytes of data in a few GB of on-chain state.

What are the other parts needed for off-chain deals going to look like? It's hard to analyse this as a piece of that solution without knowing a bit more about the rest of that solution. If that's the primary driver for this change, it might be best to bundle it together with the rest of the solution in order to avoid any unnecessary churn, especially in state schemas

Agreed about bundling with other bits of the off-chain protocol. The protocol would be loosely based on the old spec and use payment channels. Initially, vouchers would be contingent on the piece being included in a sector (using piece inclusion proofs). After the sector is precommitted/proven, those vouchers can be exchanged for vouchers contingent on the sector not having expired. After the deal is finished, those vouchers can be renegotiated for a voucher contingent on nothing, or merged with vouchers from other deals.
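The voucher lifecycle described above can be sketched as a small state machine in Go. All type, constant and function names are hypothetical, and the sketch only models which condition a voucher may be renegotiated into after each event; it does not model payment channels, signatures or inclusion proofs.

```go
package main

// VoucherCondition is a hypothetical label for what a payment-channel
// voucher is contingent on at each stage of an off-chain deal.
type VoucherCondition int

const (
	PieceInSector    VoucherCondition = iota // redeemable with a piece inclusion proof
	SectorNotExpired                         // redeemable only if the sector has not expired
	Unconditional                            // settled; can be merged with other deals' vouchers
)

// Event is a hypothetical lifecycle event seen by client and provider.
type Event int

const (
	SectorProven Event = iota // sector holding the piece is precommitted/proven
	DealFinished              // the agreed deal term has ended
)

// renegotiate returns the condition a voucher can be exchanged for after the
// event; ok is false if the event does not unlock a new condition.
func renegotiate(c VoucherCondition, e Event) (next VoucherCondition, ok bool) {
	switch {
	case c == PieceInSector && e == SectorProven:
		return SectorNotExpired, true
	case c == SectorNotExpired && e == DealFinished:
		return Unconditional, true
	default:
		return c, false
	}
}
```

In the optimistic case both renegotiations happen off-chain between client and provider, which is what keeps per-deal on-chain interaction out of the picture.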

Not counting the need to put sectors on-chain, this reduces the required on-chain interactions from O(nDeals) to O(1) in the optimistic case, possibly making deals in the MB range feasible.

Will the piece CID need to be stored in the SectorOnChainInfo too? If not, why not?

I'd say no, because it doesn't need to be (for querying it's easy enough to go back in chain history to get that from precommit info).

Off-chain deals probably require piece inclusion proofs. Is anything else required to support them?

That, and another method to check whether a sector had not expired before epoch X, with a guarantee that this information remains available for some hours after the sector has expired.
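A hypothetical shape for that second method is sketched below in Go. The record layout, the function name and the 480-epoch retention window (about 4 hours at 30-second epochs) are assumptions for illustration; the comment above only asks for "some hours".

```go
package main

// ChainEpoch stands in for abi.ChainEpoch.
type ChainEpoch int64

// sectorRecord is a hypothetical record kept by the miner actor. End is the
// epoch at which the sector expired or was terminated, or its scheduled
// expiration while it is still active.
type sectorRecord struct {
	End ChainEpoch
}

// retentionEpochs is a hypothetical window: 480 epochs x 30s = 4 hours.
const retentionEpochs ChainEpoch = 480

// notExpiredBefore reports whether the sector had not expired before epoch x.
// answerable is false when x is still in the future, or when the record has
// been pruned because the retention window after End has passed.
func notExpiredBefore(r sectorRecord, x, now ChainEpoch) (ok, answerable bool) {
	if x > now {
		return false, false // the sector could still terminate early
	}
	if now > r.End+retentionEpochs {
		return false, false // record no longer kept
	}
	return r.End >= x, true
}
```

A payment channel could then make a voucher redeemable only if this check returns (true, true) for the deal's end epoch, which is why the retention window has to outlast the settlement delay.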

whyrusleeping commented 3 years ago

On the added overhead bit, the cid and size fields can be put into a nullable subfield and then we would only be adding a single byte of overhead.

If that's the primary driver for this change, it might be best to bundle it together with the rest of the solution

I think this is useful on its own, even without off-chain deals. It allows miners to store arbitrary data in their sectors, so people who already have a lot of 'in use' storage can commit that space to Filecoin by putting their existing data inside sectors. (Granted, it's unlikely anyone would do this at the moment given the costs involved, but improvements to PoRep will make this an attractive option down the road.)

In general, though, strong +1 to moving in this direction. More flexibility here will enable a lot of use cases.

zixuanzh commented 3 years ago

Agree with what has been said. The direction of this FIP is good on its own, and Committed Capacity was never meant to be just randomness. On the economic design level, miners may have private utility in storing their own data in CC, and that is totally fine since they are merely committing capacity to the network. There could also be some off-chain arrangement.

I would just be careful not to lose CIDs on chain as much as possible. They are useful for indexing, retrieval, and potentially proving to other chains that some CIDs exist on Filecoin (which will be very useful for emergent Web3 behavior).

anorth commented 3 years ago

We've had some time to consider this more now, with related proposals and discussion from miners and other community. I am going to present a counterargument: why we should not attempt this.

Supporting non-deal data in committed-capacity sectors sidesteps the Filecoin deal market. Whether or not the goal is stated up front as enabling an off-chain deal market, that is what it will do. Any miner who commits non-deal data into a sector is doing an off-chain deal of some sort, even if with themselves. It's not committed-capacity any more, it's a storage market that is outside the Filecoin network's cryptoeconomic island and would probably be settled in fiat/stablecoin. The long-term health of the Filecoin economy depends on the innovation and growth happening within that economy, data about economic activity being transparent to the participants, and trade being settled in the native token.

It is very likely that any off-chain or alternative deal market could be more cost-efficient than the "official" one, at least in the short term. No one disputes that deal-making is currently expensive, or that developing more efficient marketplaces is important for the Filecoin network. In the short term we can mitigate costs with off-chain aggregation of deals, and the Filecoin Plus program provides a significant subsidy to early providers as we bootstrap this economy. In the mid and long term we need to develop much more scalable representation and upkeep of markets. Much of the work could indeed move off-chain, with techniques like state channels, rollups, ZK-proofs etc., but be tied back to a settlement layer in FIL.

As a supporting point, from a more technical point of view, the enforced zero data in CC sectors presents a valuable constraint that future development work can depend on. For example:

CC within Filecoin needs to remain that: capacity that is intended to be replaced/upgraded into useful storage deals. Sidestepping the on-chain storage market in the short term seems attractive on the surface, but in the long term undermines Filecoin development and the economy.

jbenet commented 3 years ago
ozhtdong commented 3 years ago

I agree that Filecoin as a whole package, including proposals, deals, settlement and proving, would be better in the long term; with that consideration, I have no problem closing FIP16. And Neutron is truly a super cool thing to pursue.

Meanwhile, I still believe that bringing in more real data should be the priority for both the PL team and the community, rather than simply scaling up the network with zero-filled sectors. After HyperDrive we raised the daily limit to between 500 PiB and 1 EiB, but actual growth is still about 30 PiB. This low growth might be temporary due to other factors, but it is still foreseeable that it will take a very long time to reach the current limit.

With the "quality over quantity" mindset I have right now, I would be excited to see more details on upgrading CC without resealing, and more joint effort to push forward the post verification proposal I added in another issue.

ozhtdong commented 3 years ago

One quick update regarding gas usage for CC sectors with data pieces:
1. CC with data (precommit + commit) consumes only 20% to 46% of the gas used when going through a deal (publish storage deal + precommit + commit).
2. Compared to zero-value CC gas usage, precommit gas usage increases by 48% with 1 piece, 71% with 2 pieces and 136% with 6 pieces. Commit message gas usage increases by between 3% and 51%.
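The reported precommit percentages translate into absolute figures as in this Go sketch. The baseline (what a zero-value CC precommit costs on a given network version) is a hypothetical input; only the 48%, 71% and 136% increases come from the measurements above, and other piece counts are deliberately left unanswered rather than interpolated.

```go
package main

// precommitIncreasePct holds the reported precommit gas increase for CC
// sectors with data, relative to a zero-value CC precommit, by piece count.
var precommitIncreasePct = map[int]uint64{1: 48, 2: 71, 6: 136}

// precommitGasWithPieces scales a zero-value CC precommit gas figure by the
// reported increase; ok is false for piece counts that were not measured.
func precommitGasWithPieces(baseline uint64, pieces int) (gas uint64, ok bool) {
	pct, ok := precommitIncreasePct[pieces]
	if !ok {
		return 0, false
	}
	return baseline * (100 + pct) / 100, true
}
```

For example, a hypothetical 1,000,000-gas baseline becomes 1,480,000 gas with one piece and 2,360,000 gas with six.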

arajasek commented 3 years ago

I find the arguments above persuasive, though I still support this proposal in principle. I find it hard to agree that the protocol should enforce that CC sectors be entirely "junk", though @anorth's point about this rule supporting future innovation is well-taken.

I think with recent developments and ongoing research towards reducing the cost of deal-making, we can probably afford to wait on this proposal and see how necessary it becomes. I'm also wary of adopting anything that might increase the size of the state tree, since the recent HyperDrive upgrade could cause the state tree to grow very quickly (see, for instance, the Lotus release notes).

Also, it's worth noting that if miners want to do this right now, they can do so by making one sector-sized unverified self-deal. It's not very expensive in gas, and it amounts to a small payment to the Filecoin network for using the block reward subsidy outside the network.

As a non-consensus-critical change, we should suggest that Filecoin implementations make this easier. A single-command workflow for a storage provider that, given some file(s) as input, does the work of:

would be helpful. I'm not sure if any implementation currently has that ability (I know Lotus doesn't).

zixuanzh commented 3 years ago

Agreed with the arguments that we should hold off on enabling arbitrary data in CC but let me spell out the reasoning as I see it.

From the point of view of emergence, enabling arbitrary data in CC opens the door to new interactions and patterns from this "new" building block. In terms of impact on the economy, note that clients and miners can always do off-chain arrangement with or without arbitrary data in CC since deal fees can be paid out of band or be hedged with other currencies. Regardless, as long as they are using the Filecoin Protocol, some FIL tokens will be paid as Network Transaction Fee that benefits all participants. In that sense, the protocol does want more of its deal market on chain and having arbitrary data CC is suboptimal for the protocol.

Other convincing factors here are on-chain reporting, product, and the development roadmap. Having a deal market outside the main protocol undermines visibility and transparency into the economy. On-chain deal states and deal fees alone would no longer fully represent the amount of utilized storage on Filecoin, which is an important metric for some existing and prospective ecosystem participants. In addition, Filecoin deals are unique with their Deal IDs and storage states, which can be bridged with other smart contract ecosystems to build entirely new interactions. With all the bridges and ecosystem collaborations that the community is building, having an off-chain marketplace for deals (that is not visible to the protocol) undermines the value of the Filecoin network. The zero data in CC sectors is also a constraint that lightweight CC upgrades can depend on.

Having arbitrary data in CC is in some sense equivalent to making a sector-size deal. With HyperDrive, the gas cost has been reduced significantly. Similarly, with lightweight CC upgrades, CC sectors will become more useful and CC upgrades can become a rational choice for participants.

arajasek commented 3 years ago

Given the exciting work coming out of FIP-0017, I think I'm comfortable saying this proposal shouldn't be adopted.

kaitlin-beegle commented 2 years ago

Closing this issue due to lack of interest and the introduction of FIP-0017.