filecoin-project / builtin-actors

The Filecoin built-in actors
Other
81 stars 76 forks source link

Handle or migrate away from non-utf8 DealProposal Labels #69

Closed Stebalien closed 2 years ago

Stebalien commented 2 years ago

See https://github.com/filecoin-project/FIPs/issues/187 and https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0027.md.

At the moment, we reject non-utf8 labels. But we need to make a decision before we ship.

jennijuju commented 2 years ago

(whether if this is a P0 for ntwk v16 is pending analysis.

jennijuju commented 2 years ago

source

To be continued tmr.

raulk commented 2 years ago

An alternative is to just migrate labels to utf8.

I would entirely discard this option. First, from a product perspective it makes adding binary metadata tricky. Given that Go accepts non-UTF-8 strings, and it's possible that those exist in state, we don't have an easy non-UTF-8 => UTF-8 migration path. We'd have to apply a transform which would manipulate the user-provided value, and that is definitely a no-go.


I think I've read all the past discussion across all links. I have one basic question. A byte array is a looser type than a string. Any string should be representable as a byte array. I'm assuming the bulk of the migration is to re-encode this type from a CBOR string (tags 2 and 3) to a byte slice (different tag). What exactly makes this migration difficult, and what's the reason to consider stopping the market temporarily?

jennijuju commented 2 years ago

@raulk exactly like you said! and that way we will not need to stop the market temporarily

image
raulk commented 2 years ago

@jennijuju Can someone please describe "the workable solution"?

arajasek commented 2 years ago

@raulk

What exactly makes this migration difficult, and what's the reason to consider stopping the market temporarily?

The migration is trivial. The issue is that naively changing the Label field in the DealProposal from a string to bytes changes the serialization of the DealProposal. The current flow is as follows (some aspects of this flow aren't protocol-specific, but might as well be because they reflect the reality of dealmaking on Filecoin today):

1) Client creates a DealProposal, signs it, wraps the proposal and signature into a ClientDealProposal, and sends that to the SP 2) SP deserializes the ClientDealProposal, validates the signature, adds it to a batch, and eventually PublishStorageDeals (sending the serialized ClientDealProposal to the chain) 3) Validators of the Filecoin blockchain:

The issue is that all of the de/serializaton steps fail if they were done on opposite sides of the switch from string to bytes. When should clients start using bytes instead of strings? When should SPs start expecting clients to be using bytes instead of strings? What happens to all messages in the mempool that signed strings instead of bytes after the network upgrades?

Can someone please describe "the workable solution"?

The solution proposed by @Stebalien is to use a Union type, something like

struct Label {
  label_string string
  label_bytes []byte
}

We'd assert that exactly one of label_string and label_bytes is set, and de/serialize this object as though it were just one field -- "If label_string is set, serialize it as a string, if not, serialize it as bytes" (and the reverse for decoding). This is of course not the default CBOR encoding of such a struct, but that's fine.

Additionaly, we can start asserting that if label_string is set, that it be valid UTF-8. Other users can simply set label_bytes instead. The migration can be to copy all existing Labels into label_string iff they are valid UTF-8, and into label_bytes otherwise.

This solves most of the problems described above. A large refactor may still be needed for client implementations if they don't abstract over possible DealProposal, but at least there's no messy period over the upgrade where large swathes of PublishStorageDeals messages are failing on-chain and large numbers of clients are seeing SPs reject their deal proposals as invalid.

raulk commented 2 years ago

Thanks for the detailed proposal. This would work if the union type is entirely transparent, that is, if it's not tagged in itself, and instead we use the CBOR tag in the encoded tuple position to discriminate which type is present. Is that what's intended? Does our DAG-CBOR library support such a serde pattern? Does Rust's serde (which we need to cooperate with)?

arajasek commented 2 years ago

@Stebalien would be the one to confirm, but I'm guessing it's supported in both places, since we have such types already in the HAMT used by Filecoin (see, for example, here).

ZenGround0 commented 2 years ago

Closed by FIP 0027