Improve the CoinSelectionAlgorithm interface

afilini commented 4 years ago

The trait is currently defined as:

fn coin_select(
    &self,
    utxos: Vec<UTXO>,
    use_all_utxos: bool,
    fee_rate: FeeRate,
    amount_needed: u64,
    input_witness_weight: usize,
    fee_amount: f32
) -> Result<CoinSelectionResult, Error>

The first thing I would add is a reference to the database, so that the algorithm can lookup additional details about the utxos, such as their age (see #120), if it needs to.

It would also make sense to give the algorithm two "input witness weights" for the two "external" and "internal" descriptors. We currently use the max of the two as a worst-case scenario, but that's inefficient. The UTXO struct already contains a field to tell whether a utxo is internal or not.

The use_all_utxos seems pretty dumb, in that case we are basically using the coin selection algorithm to compute the fee cost of using all the utxos. Also it's not particularly useful by itself, because in some cases (e.g. RBF) we don't want to use all of them but we also don't want to let the coin selection pick whatever utxos it wants. We could change this to a list of "mandatory utxos" to use, which in the case of use_all_utxos would just contain all the utxos, but in case of RBF it would contain the inputs that the transaction is already spending, an then the coin selection can select more inputs for the additional fees that need to be covered.

We could also consider the option of not giving a list of utxos at all and just letting the algorithm extract them from the database: that would definitely be useful on larger wallets where the list could become huge to maintain in memory, but it has the disadvantage that we would not be able to do filtering outside (like excluding internal/external utxos, blacklisting utxos, etc). I think that would be an interesting topic to research but I would leave that for a future more "performance-focused" upgrade.

Overall, for this initial update, I would try to make it look like:

fn coin_select<D: Database>(
    &self,
    database: &D,
    utxos: Vec<UTXO>,
    mandatory_utxos: Vec<UTXO>,
    fee_rate: FeeRate,
    amount_needed: u64,
    input_witness_internal_weight: usize,
    input_witness_external_weight: usize,
    fee_amount: f32
) -> Result<CoinSelectionResult, Error>

afilini commented 4 years ago

Alternatively, we could also try to define the Database generic on the trait itself:

pub trait CoinSelectionAlgorithm<D: Database>: Debug {
    fn coin_select(
        &self,
        database: &D,
        utxos: Vec<UTXO>,
        mandatory_utxos: Vec<UTXO>,
        fee_rate: FeeRate,
        amount_needed: u64,
        input_witness_internal_weight: usize,
        input_witness_external_weight: usize,
        fee_amount: f32
    ) -> Result<CoinSelectionResult, Error>
}

This would make it possible to only implement the algorithm for a specific type of database, which could be useful for advanced users that have customized databases that support complex queries.

murchandamus commented 4 years ago

Seems like a pretty big change for a single issue, but if you think it could be done the trait you propose looks like a good improvement to me:

I'd agree on the use_all_utxos, I was wondering what that would be used for.
It's not clear to me why we'd need both input_witness_internal_weight and input_witness_external_weight. Since these are all our own addresses, I would assume they are the same size regardless; unless BDK supports multiple different address types already. Either way, I would suggest that the input script size should rather be a property of a UTXO than of the coin selection directly.
On the other hand, the output sizes can differ: they may range from 31 – 43 bytes (if we only look at P2(W)PKH and P2(W)SH). We would need to know what the size of a change output is if we create one, since we would only find out during the coin selection.

afilini commented 4 years ago

Well, yeah it is a pretty big change but I see this as a general improvement that we should do anyway, it's not done specifically for that issue.

BDK takes two completely independent descriptors, one for the "external" set of addresses and one for the "internal" one. I would expect most people to use the same structure in both descriptors (with some changes to the keys' derivation paths), but that's not guaranteed, so potentially we could have wallets with very different witness cost between external and internal addresses, and we should take both into account. The size is currently not part of the UTXO struct just because that structure is saved directly in the database, and it's easier to just extract it and directly use it, compared to having to iterate through all items and setting the right cost based on is_internal.

The place where we handle the change is currently outside of the coin selection algorithm: basically we ask the coin selection to get as close as possible to the amount we need, and then afterwards we decide whether we should add the change output or just send everything to fees. Do you think that should be part of the coin selection itself?

https://github.com/bitcoindevkit/bdk/blob/56bcbc4affd5eae4b442b10eb391a310ef58d3b8/src/wallet/mod.rs#L401-L423

LLFourn commented 4 years ago

I am currently having a go at: https://github.com/bitcoindevkit/bdk/issues/114. I just wanted to point out that:

It's not clear to me why we'd need both input_witness_internal_weight and input_witness_external_weight. Since these are all our own addresses,

Note this is not the case if the enhancement in #114 done. To do weight calculation properly in that case you would need the descriptor for each potential input (or to precompute the witness size for it and attach it to the UTXO). This also means that not every UTXO passed in could be looked up in the database. Perhaps they would have to be in a separate list.

We could change this to a list of "mandatory utxos" to use, which in the case of use_all_utxos would just contain all the utxos, but in case of RBF it would contain the inputs that the transaction is already spending, an then the coin selection can select more inputs for the additional fees that need to be covered.

I am not sure what you meant exactly here but since they're mandatory I think these should be pre-computed and the aggregates passed in so the coin selection can finish the job even in the usual case (not RBF).

afilini commented 4 years ago

Note this is not the case if the enhancement in #114 done. To do weight calculation properly in that case you would need the descriptor for each potential input (or to precompute the witness size for it and attach it to the UTXO). This also means that not every UTXO passed in could be looked up in the database. Perhaps they would have to be in a separate list.

That's true, I wasn't considering this when I suggested to just take two fixed weights for external/internal. I think at this point the best thing is probably to add the weight as a property to UTXO as suggested by @Xekyo, even though it feels a bit redundant to store it in the database for the wallet's utxos, because we also have is_internal that we can use to compute the weight.

I am not sure what you meant exactly here but since they're mandatory I think these should be pre-computed and the aggregates passed in so the coin selection can finish the job even in the usual case (not RBF).

The reason why they are not pre-computed right now is just to re-use the fee computation code that is part of the coin selection: the idea is that the coin selection must know how to compute the extra fees for a given input, so if we want to always include an input instead of doing it outside we just ask the coin selection to do that computation for us. It's basically making the coin selection a bit more generalized to avoid duplicating some code.

There's also the argument that if the coin selection needs to do something fancy it's good that it knows all the inputs that are being spent in a transaction, so that it can properly select among the available ones.

For instance, imagine a "privacy-focused" coin selection that only spends together inputs if they come from the same address, and refuses to do so from different addresses, if you haven't coin-joined first or something like that. In that case the coin selection could inspect the mandatory_utxo list and, first of all, return an error if there are inputs from different addresses in the list, and otherwise only pick extra inputs that come from the right address.

afilini commented 4 years ago

Although, now that I think about it a bit more, I think we could make the utxos and mandatory_utxos lists become of type (UTXO, usize) so that we don't actually have to store the weight multiple times for each of the wallet's utxos, but we can also supply a custom weight for extra utxos added manually, without having to add an extra list for those utxos.

In the end it would look something like:

pub trait CoinSelectionAlgorithm<D: Database>: Debug {
    fn coin_select(
        &self,
        database: &D,
        utxos: Vec<(UTXO, usize)>,
        mandatory_utxos: Vec<(UTXO, usize)>,
        fee_rate: FeeRate,
        amount_needed: u64,
        fee_amount: f32
    ) -> Result<CoinSelectionResult, Error>
}

murchandamus commented 4 years ago

The place where we handle the change is currently outside of the coin selection algorithm: basically we ask the coin selection to get as close as possible to the amount we need, and then afterwards we decide whether we should add the change output or just send everything to fees. Do you think that should be part of the coin selection itself?

It doesn't have to be. Getting as close as possible could cause you to drop the excess to fees more often than necessary. If you have the choice between two inputs one would create dust (that then gets folded into the fee) and one creates a decent change output, the latter may be preferable.

murchandamus commented 4 years ago

because we also have is_internal that we can use to compute the weight.

Imagine Taproot getting activated and the wallet switching to P2TR outputs, but still having some P2WPKH outputs floating around. It doesn't have to be stored in the UTXO, if is_internal can resolve the input size of different types correctly. I just expect that a wallet may eventually have UTXOs of more than one output format. :)

afilini commented 4 years ago

It doesn't have to be. Getting as close as possible could cause you to drop the excess to fees more often than necessary. If you have the choice between two inputs one would create dust (that then gets folded into the fee) and one creates a decent change output, the latter may be preferable.

Yes, that's a good point. This didn't really matter for my LargestFirstCoinSelection because it was just stopping as soon as the required amount was covered no matter what, but more advanced coin selections could take this into account and optimize for it.

I just expect that a wallet may eventually have UTXOs of more than one output format.

There's been quite a bit of discussion regarding this, both here on GitHub and on Discord. I think it's definitely something we should support, but I'm still trying to figure out the best way to do it. Just adding more descriptors to a wallet doesn't seem like the best option to me, for a few reasons:

Currently when the wallet starts we validate the checksum of the external and, if present, internal descriptor with the ones stored in the database, to avoid accidentally using a database populated with a different descriptor. If you can add more descriptors at runtime this would potentially become a bit more complicated: do we always require to specify all the descriptors that have touched the database, even if the user doesn't care about them anymore? Can you only specify a subset of them? If so, then we would have to filter the database when we do things like "list_utxo" to hide the utxos that belong to other descriptors. Or we skip the filtering, but then it means that you could create a transaction that spends something from a descriptor that you specified in the past, and now you can't sign that transaction.
Also, if you only want to import a descriptor temporarily (say, for sweeping a private key), you don't want to pollute your local database permanently with stuff that you don't really care about.
We would have to add a way to apply filters to calls like get_balance, list_unspent, etc because the user might want to know the balance of a specific descriptor instead of the whole wallet.
Same thing for get_new_address, you would have to choose from which descriptor you want to generate an address.
In general this would make the library a bit harder to use for users that just want to make a simple wallet.

For all those reasons I was proposing a different route, which would basically be to add a "container of wallets" structure that has that complexity inside, but that keeps the Wallet very simple as it is now. For instance, the logic to coordinate a migration to a new descriptor would be inside this, you could make a container with two wallets inside (old and new) and the container would handle everything instead of the single Wallet. The initial discussion was at #57 if you are interested in that.

bitcoindevkit / bdk

Improve the CoinSelectionAlgorithm interface #121