bitcoindevkit / bdk

A modern, lightweight, descriptor-based wallet library written in Rust!
Other
808 stars 292 forks source link

Introduce TLV for bdk_file_store #1499

Open notmandatory opened 3 weeks ago

notmandatory commented 3 weeks ago

Sub-task for #1103

evanlinjin commented 3 weeks ago

Okay technically we don't need the T since we always append the same type. I propose the following:

Encoding

Encode each element as:

[Version: VarInt] [Length: VarInt] [Data: Bytes]

Where VarInt is defined here: https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer.

VersionedElement trait

pub trait VersionedElement: Sized {
    /// Decode the `data` of given `version`.
    fn decode_with_version(version: u64, data: &[u8]) -> Result<Self, VersionedDecodeError>;

    /// The version to encode the data with.
    fn encode_version() -> u64;
}

pub enum VersionedDecodeError {
    UnknownVersion(u64),
    Bincode(bincode::Error),
}
LLFourn commented 3 weeks ago

I did a lot of thinking about this.

Backwards compat

/// Persists an append-only list of changesets (`C`) to a single file.
#[derive(Debug)]
pub struct Store<C, V = C> {
    magic_len: usize,
    db_file: File,
    marker: PhantomData<(V, C)>,
}

impl<V, C> Store<V, C>
where
    C: Into<V> + Append,
    V: Into<C> + serde::Serialize + serde::de::DeserializeOwned,

As you can see there is a new type parameter V -- this is the type that is serialized to and from the actual disk.

So in practice you'd have:

pub struct MyChangeSetV0_17_0 {... }
pub struct MyChangeSet { ... }

// serde serde etc
pub enum VersionedMyChangeSet {
    V0_17_0(MyChangeSetV0_17_0)
    Current(MyChangeSet)
}

impl From<MyChangeSet> for VersionedMyChangeSet {
     fn from(value: VersionedMyChangeSet) -> MyChangeSet {
         // do the obvious
     }
}

Then you'd set V as VersionedMyChangeSet and C as MyChangeSet.

If we do have forwards compatibility we need a length prefix. This can be done by just encoding each entry into a Vec<u8> and bincode encoding the result to the file.

This makes backwards compat optional and straight forward to achieve. You just have to nudge users to create this enum. For wallet we can create VersionedChangeSet and do all this without depending on bdk_file_persist I'm sure it's likely to be useful in the future anyway.

When making new changesets for existing types (which for the record I actually think we can avoid) we keep the old one around of course and put ChangeSetV0_WHAT_EVER at the end. bdk_wallet will probably use this in turn to develop a new changeset when it goes to v2.

Forwards Compat

To do forwards compat we just write the length of each entry before we serialize it. This can be done in a couple of ways. e.g. https://docs.rs/bincode/latest/bincode/config/trait.Options.html#method.serialized_size or by just serialzing to a Vec<u8> and serializing that.

Now that we have the length, when we decode an entry to a Vec<u8>, we try and decode as V. If it fails we strip the enum varint (tell bincode to just read an int), and then just try to decode as C, if that fails we finally throw an error.

So this gives you forwards compat as long as whatever the future V is that has been written encodes first a varint and then something that C can partially decode. This would be the case if the new version has just added an extra optional field.

For what it's worth, I discussed with Evan about things we had as breaking changes coming up or speculated and none of them actually were compatible with forwards compat (none of them would be just adding an optional field).

evanlinjin commented 3 weeks ago

I agree with @LLFourn's suggestion. I think someone should take this ticket on.

LLFourn commented 2 weeks ago

I later realized the "Forwards Compat" bit is not totally sound. Before trying to do a decode on a higher versioned entry you need to actually check it is higher versioned not just assume so because it failed. It could be a corrupted lower version entry or something else has gone wrong. This means we'd need to know the current version.

It's actually kinda possible to do this automatically with bincode if you require Default for V and just encode the Default and see what int it encodes as to get the implicit version number. Since we already enforce Default for Append, then V should do it too. A little hacky but it does give us the feature. I'd be tempted to not actually implement this just yet but leave forwards compat later. I fully realize that "let's do forwards compat later" sounds like the famous last words.

nymius commented 2 weeks ago

I agree with @LLFourn's suggestion. I think someone should take this ticket on.

I would like to give it a try.

notmandatory commented 2 weeks ago

@nymius I've assigned this to you, thanks for jumping in to take this on.

notmandatory commented 2 weeks ago

Can we push this TLV work for the bdk_file_store out of the 1.0.0-alpha milestone?

This would let the team focus on #1496 and #1498 changes and we'd only deliver backward compatibility for the beta with bdk_sqlite. The examples would also have to be changed to use bdk_sqlite but that should be faster/easier to review than redesigning bdk_file_store.

notmandatory commented 2 weeks ago

I discussed this with @evanlinjin and decided this issue can be moved to the 1.0.0-beta milestone. We just need to warn beta users that if they want backward data file compatibility they'll need to use the bdk_sql until this issue is done.