filecoin-project / specs

The Filecoin protocol specification
https://spec.filecoin.io

Sector set storage #116

Open whyrusleeping opened 5 years ago

whyrusleeping commented 5 years ago

We need a data structure to hold sectors on chain, inside the miner actor. Currently, we just have them all in a big map, which won’t scale well (every change to it creates a new object, meaning we would rewrite the entire thing with every change).

This issue is a place to brainstorm solutions.

Requirements

Nice to haves

Does the HAMT work?

We could potentially use the HAMT code, but we would need to adapt it to have efficient index lookups. To do this, we could annotate links to children with a count of how many values exist in and under that child, allowing us to know at each node which child the thing we are looking up is in.
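The count-annotated lookup described here can be pictured with a small sketch (names and layout are illustrative only; this is not the go-hamt-ipld format): each child link records how many values live in its subtree, so finding the i-th value is a matter of walking down and subtracting counts.

```python
# Hypothetical sketch of index lookup in a HAMT whose child links are
# annotated with subtree value counts. Node layout is illustrative.

class Leaf:
    def __init__(self, value):
        self.count = 1
        self.value = value

class Node:
    def __init__(self, children):
        self.children = children                 # ordered child links
        self.count = sum(c.count for c in children)

def lookup_index(node, i):
    """Return the i-th value (in tree order) by descending the counts."""
    if isinstance(node, Leaf):
        assert i == 0
        return node.value
    for child in node.children:
        if i < child.count:                      # target lives under this child
            return lookup_index(child, i)
        i -= child.count                         # skip this whole subtree
    raise IndexError("index out of range")
```

With counts cached on every link, lookup by index costs one pass down the tree rather than a scan of the whole set.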

whyrusleeping commented 5 years ago

One question here is what data we actually need to store. We currently store both commR and commD, but technically only commR needs to go on chain. Having commD on chain is nice because it makes client payments simpler: clients don't have to wait for the miner to seal their data before they can give a properly guarded payment voucher for the deal.

laser commented 5 years ago

The reason commD being on chain is nice is that it makes client payments simpler

A storage client will need CommD if they are to verify a piece inclusion proof provided by a storage miner claiming to have stored their piece. If CommD doesn't appear on chain, the storage client can get it by querying the miner for deal state. Can the storage client trust the CommD returned from the storage miner?

whyrusleeping commented 5 years ago

The client can trust the commD as long as the miner also gives the client the porep proof and the commR (and other inputs)


whyrusleeping commented 5 years ago

Thinking through this a bit, the HAMT won't work, as it won't put contiguous sectors next to each other in the index space. We need a data structure that puts sectors added at the same time next to each other, to make bitmasking over them in failure modes easier. If we used the HAMT and a miner lost a disk, they would have to select random bits across the entire set, because the HAMT would insert the sectors based on their hash.

So we need to build a new data structure... yay.

porcuquine commented 5 years ago

Jumping into the middle and addressing the comments. There is another reason to put commD on chain: third-party verification of data. The use case is that someone wants to verify that a given piece of content is stored in a certain number of sectors. (This is a real use case we are actively working toward supporting.) If commD is on chain, they can compute commD for the data themselves, then (somehow) query the chain for matching commRs and verify that they represent live sectors.

If commD is not on chain, the third party won't know which proofs to verify (passing commR and the known commD).

Come to think of it, how will anyone other than the original client verify those proofs without knowing commD?

whyrusleeping commented 5 years ago

@porcuquine Yeah, that’s a good point. It does add overhead to the chain, but it might be worth it.

My thinking for proof verification is that the commD is submitted with the initial CommitSector call, used for verification, but not stored beyond that. I think having the commD around also might make different potential payment schemes simpler, so for now, let’s move forward assuming that we will keep the commD as well as the commR.

——

Separately, thinking more on whether or not we can use the HAMT. It's worth considering how we want the sector IDs to be assigned. We could assign them during each new CommitSector call, monotonically, or we could even just say "the miner gets to pick the ID for every sector". This makes it even easier for miners to ensure that sectors likely to get lost at the same time have contiguous IDs.

If we do that, then it becomes feasible again to use the HAMT (though in a less efficient way than I was thinking before). We could simply have the ‘key’ be the sector ID, and have the value be the commR and commD (and maybe size? for multiple sector sizes). (This scheme could also work in a situation where the miner doesn’t choose the sector IDs, but the miner choosing the sector IDs requires something like this).

porcuquine commented 5 years ago
  1. I see — my point was only that commD needs to be on chain somewhere. As long as that's the case, I assume (perhaps wrongly) that a query mechanism could track down the relationship to the commR (but I may not be fully visualizing the chain of events).

  2. The problem with miner picks is that it might allow grinding the replicaID. We would need to make sure the protocol prevented that somehow. Otherwise, I think it would open a class of attacks on PoRep.

  3. I do agree that contiguous IDs of lost sectors would be virtuous. Can we arrange that in some other way?

  4. I haven't had time to think about this enough, but here's a starter idea for you to think about and/or comment on. Could we use a persistent vector, with indexes being sector ids? That would allow us to look up by index as before. This would mean we would need to avoid/penalize sparseness. But might this also be part of a strategy to encourage contiguous sectors/ids? Here's a reference for persistent vectors in Clojure: https://hypirion.com/musings/understanding-persistent-vector-pt-1 (The cited work there is about HAMT — but I believe this is not identical. I need to dig into the details of HAMT per se more in order to follow this discussion better.)
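For reference, the core index lookup in a bit-partitioned persistent vector of the kind the linked article describes is tiny. Here is a minimal sketch (Clojure-style 32-way branching; the structure and names are illustrative, and real implementations also keep a "tail" for fast appends):

```python
# Minimal sketch of indexed lookup in a bit-partitioned persistent vector.
# Each level of the trie consumes BITS bits of the index.

BITS = 5                        # 32-way branching, as in Clojure
MASK = (1 << BITS) - 1

def vec_lookup(root, size, shift, i):
    """Fetch element i by peeling BITS bits of the index per level.

    `shift` is the bit offset of the root level (0 for a one-level tree,
    BITS for two levels, and so on).
    """
    if not (0 <= i < size):
        raise IndexError(i)
    node = root
    level = shift
    while level > 0:
        node = node[(i >> level) & MASK]
        level -= BITS
    return node[i & MASK]
```

Because the index itself determines the path, lookup needs no stored keys, which is the property that makes this shape attractive for sector IDs.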

whyrusleeping commented 5 years ago

@porcuquine hrm... is the miner picking the replica ID going to be a problem? What is the attack there, I'd like to dig into it, because even the 'current' scheme allows some miner selection over the ID (though not 'full choice').

Persistent vectors are really cool. Reading through that doc, it looks like a really good structure to use here, more suited to the task than the HAMT (I was trying to avoid having to write a new merkle data structure, but maybe it's inevitable).

porcuquine commented 5 years ago

I can't quantify how problematic it is, but bear in mind that we don't know the real-world attacks on ZigZag yet. We need to place bounties and incent people to try to break it.

That said, here is the shape of my concern. Remember that sector ID is one component which is hashed to the replica ID. That means if I can freely choose sector ID, I can grind until I get an ID I like. In such a world, I can essentially perform proof of work, searching for a replica ID less than some number (i.e. with leading zeroes). This allows me to reduce the entropy provided by replica ID.

Taken to its logical extreme (admittedly implausible in practice), I could always select the same replica ID — regardless of the chain randomness contributed, just by treating sector ID as a random nonce. If I have the same replica ID for every sector, then I can reuse the same physical replica for multiple sectors.

Now assume I cannot freely choose the replica ID, but I can instead reduce it to a single bit. This still means I can share replicas. It just means I also have to have at least 2 real replicas. We can continue with this logic, expanding the number of real replicas I am willing to store. To make it concrete: if I am storing GiB sectors and I have a TiB of storage, then I will have 2^10 physical replicas. Therefore, if I could reduce the replica ID space to 10 bits, I could use this attack freely. If I am storing 256MiB sectors and have a PiB of storage, then I only need to reduce to 22 bits (which I realize is still vastly impractical).
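A quick back-of-envelope for the numbers above (this simplification is an editorial assumption, not from the thread): with n physical replicas and a replica-ID space effectively ground down to 2^b values, a collision with an existing replica is expected after roughly 2^b / n grinding attempts.

```python
# Back-of-envelope for the grinding argument: expected number of grinding
# attempts before hitting one of `replicas` stored replica IDs, when the
# effective ID space has been reduced to 2**bits values. Uniform-sampling
# approximation; illustrative only.

def expected_attempts(bits, replicas):
    return (2 ** bits) / replicas

# 1 TiB of 1 GiB sectors -> 2**10 replicas. If the ID space can be ground
# down to 10 bits, a collision is expected after about one attempt.
```

This matches the examples in the comment: 2^10 replicas cover a 10-bit space, and 2^22 replicas (a PiB of 256 MiB sectors) would cover a 22-bit space.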

Think about it differently, and you see that I don't need to have followed this strategy from the beginning. In fact, this attack is always available to me. I can always grind as much as I like in the hope of finding a collision with one of my existing replicas. The more replicas I have, the better my chance of winning this lottery.


On another front, generation attacks. Assume there exists a method for generating fake challenge responses which will actually succeed sometimes. Is this true? I don't know, but I don't think it has been proved that it cannot exist. Assuming this strategy exploits some property of the actual hash and encoding functions we use, it may only work with certain replica IDs. The likelihood of such an algorithm is certainly higher than that there exists one which works for all replica IDs. Therefore, if I know of such an algorithm, I can also (completely in addition to the previous attack) grind for replica IDs satisfying that constraint as well.

Admittedly, I do not have a candidate for this attack. But I think it would be worth thinking through over time.

Hopefully that makes clear why I'm more concerned about free choice of sector ID, than I am about limited ability to influence the ID. As long as I'm constrained to choose from a reasonably bounded set, then the amount I can grind is limited. If I have full control, then I can grind to my heart's content; and it does seem that at some scale (i.e. enough existing replicas) then it would occasionally pay off.


This might be an incentive for malicious (or even rational) mining pools or an ad-hoc market. Imagine a future in which Filecoin takes over, and there really are an enormous number of replicas out there. In that world, proof of space-time can be enhanced by proof of work for rational miners open to collusion.

Imagine I don't have any storage space at all, but I have an ASIC that can grind replica IDs. I do this, searching for collisions with any stored sector's replica ID. When it comes time for me to generate a PoSt, I am challenged for a random node in a sector owned by someone. I either pay them to ship me the entire sector, or better yet just to provide the single inclusion proof for the challenged node.

With enough hashing power thrown at the problem (in our imaginary future, where physical media are exhausted, but the demand for Filecoin is infinite) — this creates a natural secondary market for partial PoSts. As a storage miner, this is already my only product. Why not get paid for it directly?

In a sense, maybe this is completely fine. It dilutes a storage miner's chance of winning the block reward, but it increases his chance of receiving a serendipitous PoSt-on-demand windfall. So maybe this secondary proof-of-work market on top of Filecoin would actually simply be a way of recruiting surplus POW resources into the Filecoin network. These windfall rewards would still be tied to total storage, since presumably the price for larger sectors would be proportionally larger. At the same time, since the price would not be mandated, perhaps this could actually end up acting as a positive market force providing some extra arbitrage/liquidity. Maybe this is a cryptoeconomics question.

As I think about it, it's not 100% clear that this would be bad. The premise is straightforwardly problematic on the surface in several ways, but it has interesting implications worth thinking about. My point is just that we should think about it, since how we choose sector ID has implications.

porcuquine commented 5 years ago

Regarding implementing persistent vectors: I think it would be a lot of fun and probably be a valuable primitive to have available in many places even beyond Filecoin internal needs. I'd theoretically be happy to help, though would need to prioritize against other projects.

porcuquine commented 5 years ago

Thinking about it more, I think there is a more concrete attack. When grinding for desirable replica ids, a miner doesn't even have to target only existing replicas. Any discoverable collision can then be used to create false replicas. Because of the birthday paradox, this is a much easier attack (reducing to 128 bits of security, I believe).

[EDIT: Having thought this through, I now think it's fine because an actual attack would amount to breaking our chosen hash function. Much of the rest of my thinking out loud is probably invalidated by that. It may still have some value for thinking through why we need a collision-free hash function to derive the replica id.]

A large pool of rational miners could generate and publish potential replica ids as fast as possible. If a match is found, the two parties agree to share a true replica — with one storing the data, providing proofs for both as needed, and presumably receiving a market-determined commission. This would be detectable, since it's highly improbable for identical replica ids to appear at the same time. However, detection could be mitigated by withholding the duplicate for as long as the protocol allows (before it would become invalid because of chain randomness). It's not entirely clear what one could do about detection, though. If they follow the protocol and provide genuine proofs, then it would be indistinguishable from honest mining.

porcuquine commented 5 years ago

Given the resources expended for POW, it seems entirely plausible that such a network could regularly find collisions (if my analysis is correct). The true value of each collision might actually be very high — since, in the case of as-yet-unreplicated sectors, discovered collisions could be used to replicate the largest available sectors. Therefore, the value of a collision would be limited only by that size — and the network otherwise benefits (because of efficiency) from having that size be as large as possible.

porcuquine commented 5 years ago

As a matter of practical efficiency and minimization of ongoing trust between colluders, replica id 'miners' could use fresh identities. When a collision is found, one party could transfer the identity to the other for a one-time price. The receiver would then possess a duplicate sector and could produce his own PoSts with no overhead.

porcuquine commented 5 years ago

So my tentative conclusion is [incorrect] ~~that free choice of sector id would give rise to a secondary sybil market, in which each replica-sybil's value is approximately equal to that of the largest mineable sector~~.

porcuquine commented 5 years ago

All that said, it seems that the same technique could be used even without free choice of sector id — by creation of new miners with distinct identity for the sole purpose of cranking out new replica ids. This is a ~~good~~ argument for somehow limiting creation of miners (either time limiting or imposing a cost — I can think of ways to do that).

porcuquine commented 5 years ago

Alternately, we could make procurement of replica ids an expensive operation — either in time/resources, or just by actually charging for them and require they be acquired from the chain. This is not necessarily so unreasonable, since there's no honest reason to stockpile them. A small up-front investment prior to sealing is not more burdensome than collateral.

porcuquine commented 5 years ago

We could even treat it as a deposit, refunded when you provide the corresponding proof. That (the up-front payment) would entirely prevent this whole category of attacks, as long as the price was set appropriately.

porcuquine commented 5 years ago

See EDIT above. Although my train of thought here was wrong, it is still true that choice of sector id allows grinding on the PoRep as a whole — so any (unknown) attacks depending on how replica id propagates through the replication might still be facilitated. If we are at all concerned about this, the last strategy of allocating IDs from the chain for a deposit might still be worth considering.

nicola commented 5 years ago

I did not read all of @porcuquine's comments. The replicaID must contain randomness from the blockchain; the miner already puts some randomness in (currently the public key, but it could also be the minerId). Ideally they shouldn't choose it, or they could grind storage (though it would be expensive!).

Replica ID should just be the miner identity concatenated with the block they are mining on.

whyrusleeping commented 5 years ago

@nicola minerID || blockhash doesn't work, otherwise they can only make one seal per block

On Thu, Feb 14, 2019, 6:22 AM Nicola notifications@github.com wrote:

I did not read all the @porcuquine https://github.com/porcuquine comments. The replicaID must contain randomness from the blockchain, the miner already puts some randomness in (currently public key, but could also be minerId), ideally they shouldn't choose or they could be able to grind storage (it will be expensive!).

Replica ID should just be the miner identity concatenated with the block they are mining on.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/filecoin-project/specs/issues/116#issuecomment-463691331, or mute the thread https://github.com/notifications/unsubscribe-auth/ABL4HNjj6X-jBnQIqKWMAynLiCDtesGpks5vNY0vgaJpZM4aS8dp .

porcuquine commented 5 years ago

FWIW, replicaID currently includes the sectorID.

whyrusleeping commented 5 years ago

@porcuquine I just read your comments. You're basically saying "if miners can find hash function collisions, they can pull off an attack". To which I respond: if you can generate collisions, the hash function is broken.

If we are assuming that the hash function will not be broken anyways, then i see no problem in allowing full miner selection over the ID.

In addition, I believe that it is a hard requirement that miners have full selection over the sector IDs (I could bring this down to full selection within some range) in order for the protocol to work scalably.

porcuquine commented 5 years ago

@whyrusleeping I actually agree with you and concluded the same in my edit.

This doesn't apply to the risks of grinding to generate a fake Seal proof. Although this doesn't add to the risk during a generation attack, allowing an infinite sampling of replica IDs does (I believe) reduce attempts to generate a single fake PoRep for a fixed commD to being as easy (or hard) as for any other generation attack.

While this doesn't affect security of the power table, it does reduce the security of storage market redundancy to 'only' what is also provided against generation attacks. Of course, we assume this is sufficient, but I do think this is a reduction — so we should at least be clear about it.

whyrusleeping commented 5 years ago

Additional interesting design constraint to consider: Do we ever allow sector ID reuse? If we do not, then we have to track which IDs have been used, in order to disallow their use in the future.

If we do allow for ID reuse, then I think a fairly straightforward potential solution would be to use the persistent vector, but modify it to allow for sparse regions. Such a data structure should satisfy all the design constraints. I'll look into this more soon.

whyrusleeping commented 5 years ago

I started to implement a sparse persistent vector here, and realized that many pieces are very similar to the HAMT. In particular, when making it sparse, you end up wanting the same bitfield-compression prefix-count optimizations that you use for HAMTs. So now I'm thinking: maybe we use the HAMT still, but swap out the functionality that turns a key into an index at each height.
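The bitfield-compression prefix-count trick mentioned here is the same one HAMTs use: instead of a width-W array with empty slots, store a W-bit bitmap plus a dense array, and locate a slot by counting the set bits below it. A minimal sketch (names illustrative, not the actual wire format):

```python
# Sketch of bitmap-compressed node slots: `bitmap` marks which logical
# slots are occupied, `values` stores only the occupied ones densely.

def get_slot(bitmap, values, slot):
    """Return the value at logical position `slot`, or None if unset."""
    bit = 1 << slot
    if not (bitmap & bit):
        return None
    # Number of set bits below `slot` == position in the dense array.
    pos = bin(bitmap & (bit - 1)).count("1")
    return values[pos]
```

For example, with bitmap `0b1010` and values `['a', 'b']`, logical slot 1 holds `'a'`, slot 3 holds `'b'`, and slot 0 is empty. Sparse nodes stay small at the cost of a popcount per descent.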

whyrusleeping commented 5 years ago

Some notes I sketched on this idea: [two sketches attached: "data structures -5", "data structures -6"]

whyrusleeping commented 5 years ago

@porcuquine thoughts?

anorth commented 5 years ago

For context, could you please clarify whether this is a data structure that will appear literally in the on-chain bytes (like the encoding of messages), or is it a private implementation detail on which different nodes need not necessarily agree. In the latter case, the issue might belong in an implementation repo rather than specs.

nicola commented 5 years ago

Note that the SNARK must be able to signal which sectors are failing, so either this structure is SNARK-friendly or the verifier will need to compute the inputs to verify the SNARK later on (in other words: either "compressing" the sectors is cheap, or "decompressing" them must be cheap and done outside the SNARK).

porcuquine commented 5 years ago

@whyrusleeping After a few days of kicking this around, I don't see any big problems with your AMT and haven't come up with any incremental improvements. Either might come as more time passes, but in accordance with what we discussed earlier, you can take this as a first vote of confidence toward solidifying this design.

whyrusleeping commented 5 years ago

@anorth the first sentence of the issue is: "We need a data structure to hold sectors on chain, inside the miner actor."

Should I clarify in some other way?

whyrusleeping commented 5 years ago

The question of sector ID reuse has come up again in the context of payments. Payments to miners need to be contingent on a given sector. This can be via a reference to a sector ID, but if miners can reuse sector IDs, then they could drop the data the payment is for and then add new data with the same sector ID.

This sucks, but preventing sector ID reuse while maintaining the miners' free choice over sector IDs is very difficult.

An alternative structure for the payments that could work here is to reference the sector by its commD hash. This isn't easy to look up, as it requires a scan of the sector set to confirm it's there, but we can short-circuit that by having the miner provide the sector ID that has the given commD; the actor code can then verify that it matches.

tldr; I think sector ID reuse is fine, we just have to be aware of it

whyrusleeping commented 5 years ago

Note: depends on PoSt faults requirements

whyrusleeping commented 5 years ago

Note for @nicola :

Check whether we need to implement faults in PoSt, or if we can get away with having miners who encounter faults post their proof on chain immediately.

dignifiedquire commented 5 years ago

Check whether we need to implement faults in PoSt, or if we can get away with having miners who encounter faults post their proof on chain immediately.

This would still need some form of implementation in PoSt, as an "aborted PoSt" still needs different proof verification, compared to a completed one. Or am I missing something here?

whyrusleeping commented 5 years ago

@dignifiedquire I think this would require arbitrary length PoSts

dignifiedquire commented 5 years ago

@whyrusleeping some questions that came up when trying to actually implement the AMT

  1. Should we do the CHAMP optimization of splitting things into two different arrays (one for references, one for values)
    • this depends on how important in-order traversal is, as it gets very inefficient to do in-order traversal with the regular version
    • as far as I can tell this optimization is not part of go-hamt-ipld
  2. Should we do the bitset optimizations to avoid storing very large arrays?
  3. What base encoding should we use to represent the keys/indexes when translating them?

Efficient lookup by key and index

What is the difference between key and index for you? I was assuming we use the index as key.

whyrusleeping commented 5 years ago

Should we do the CHAMP optimization of splitting things into two different arrays (one for references, one for values)

Either way really. I think if we use a CBOR map as the top-level thing, then it doesn't cost us anything extra to have one array for each (empty arrays in maps are free because you just leave the field out).

I don't see how it affects order traversal, mind elaborating?

Should we do the bitset optimizations to avoid storing very large arrays?

That seems to make sense to me, but it depends on the expected case. If we're going to have mostly full nodes, then it's just overhead. So for now, maybe we don't do that optimization?

What base encoding should we use to represent the keys/indexes when translating them?

Translating? Inside the data structure, the keys should just be bytes. And yeah, we don't really have keys, just indexes. Which means we don't even need to store them, as you can infer an index from the node's location within the structure.

dignifiedquire commented 5 years ago

I don't see how it affects order traversal, mind elaborating?

I am referring to the first few paragraphs here: https://blog.acolyer.org/2015/11/27/hamt/ under "A new champ".

dignifiedquire commented 5 years ago

and yeah, we don't really have keys, just indexes.

👌

Which means we don't even need to store them, as you can infer an index by the nodes location within the structure.

👌 that was what I thought as well

dignifiedquire commented 5 years ago

Translating? Inside the data structure, the keys should just be bytes.

The issue that I have is that your diagram above makes it look like we try to get things into the tree such that a depth-first traversal would lead to an in-order traversal of the nodes (using base10). This is not the case if we use a base2 representation, as far as I can tell.

dignifiedquire commented 5 years ago

Rereading your diagram, the ordering is definitely not there, even with base10, so ignore that part.

whyrusleeping commented 5 years ago

@dignifiedquire you should be able to have trivial in-order traversal of the indexes; that's most of the point, unless I'm missing something.

The bucketing algorithm should be roughly, at each layer, child = index / width and then when you recurse (if you recurse) your index within the child is index_in_child = index % width.
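That bucketing rule generalizes to arbitrary depth: a node of height h covers width^h slots, the child slot is index // width^(h-1), and the remainder recurses (with height 1 this is exactly "child = index / width, then index % width"). A hypothetical sketch of the path computation:

```python
# Sketch of AMT-style bucketing: compute the sequence of child slots
# visited from the root down to the leaf for a given index. Names are
# illustrative, not the go-amt-ipld API.

def path_for(index, width, height):
    """Child slots from the root (height levels) down to the leaf."""
    path = []
    for h in range(height, 0, -1):
        span = width ** (h - 1)        # slots covered by each child here
        path.append(index // span)
        index %= span
    return path
```

For example, with width 8 and height 2, index 37 takes child slot 4 at the root and slot 5 in the leaf (37 = 4*8 + 5). Because more significant digits are consumed first, visiting children in slot order yields the indexes in order, which is the in-order traversal property discussed above.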

ZenGround0 commented 5 years ago

Hey @whyrusleeping @dignifiedquire I have some basic questions. Bear with me while I get up to speed here.

Is the proposal to use the AMT datastructure from "Fast And Space Efficient Trie Searches" with the keys being the sectorID indexes? Are you proposing anything different when "extending the range" than an AMT would do? If so it's not obvious from the picture and I would love some clarification. I would also appreciate a definition of the "stacking process" if it differs from a canonical AMT.

Also, is this inspired by section 5 of the HAMT paper "SORTED ORDER AMT"? If not it seems like there are a few promising ideas closely related to our objective that might be worth thinking over.

dignifiedquire commented 5 years ago

AMT: the name comes from "HAMT" -> no hashing, which leaves us with "AMT"

ZenGround0 commented 5 years ago

"stacking process" is just referencing the phrase "to insert any larger value repeat the stacking process" in the diagram above.

@whyrusleeping after inserting both 0 and 2, why doesn't the diagram above have many intermediate tables like this? [screenshot attached: "Screen Shot 2019-06-04 at 7 17 21 PM"]

It seems like we need to traverse the trie starting with the most significant symbols of the index and working down to the least significant symbols of the index to get the correct ordering. But in that case wouldn't 0 and 2 have O(log(max sector set size)) collisions during insertion?

This is coming from my understanding of AMTs as described in HAMT/CHAMP papers, does this datastructure's insertion method differ from those papers?

--edit--

Note to readers from the future, I was imagining the datastructure described is the AMT from this paper, but it is unrelated.

whyrusleeping commented 5 years ago

@ZenGround0 that's part of the point here: the structure is only as deep as the log of its biggest element. It expands as you insert increasingly larger elements into it (and would shrink if you removed them).

ZenGround0 commented 5 years ago

Ok, I'm closing in. My remaining questions are:

  1. What node width do we want? go-ipld-hamt currently has an implied width of 256 pointers per node. Should we go with that here too?
  2. As an optimization, go-ipld-hamt uses linear probing over W=3 nodes by comparing the search key against the stored key in the KVs array. I think we cannot do this because we want to avoid storing the key. Is this right?
  3. Furthermore, I don't think we can even do the optimization in why's diagram where we only allocate a new node for 33 after 37 comes along. I think we need to allocate a new node when we first see 33. Again, this is because we are planning to not store the key. Is this right?
  4. Above, @dignifiedquire asks "Should we do the bitset optimizations to avoid storing very large arrays?" and @whyrusleeping responds "So for now, maybe we don't do that optimization?". I am taking that to mean that we allocate the entire node width (say 256 pointers) and not include a bitfield in the node at all. Does this sound good?

whyrusleeping commented 5 years ago

I implemented an AMT here: https://github.com/filecoin-project/go-amt-ipld