GaloyMoney / bria

Mozilla Public License 2.0

Use Payjoin to Improve Batching #266

Open DanGould opened 1 year ago

DanGould commented 1 year ago

Thanks @nicolasburtey for reaching out to express interest in Payjoin.

As far as I can tell, Bria is infrastructure to batch Galoy liquidity into outputs to many destinations. Bria may take advantage of both sending and receiving Payjoins to improve its fee efficiency (saving sats), the velocity of its txos in high fee environments, and its general capacity to batch.

Edit: See Interactive Payment Batching is Better for a step by step explainer of Payjoin as a tool for batching.

Bria could send a payjoin and include a batch of payouts as the Payjoin's change, combining the Payjoin receiver's sats with its own. This preserves privacy compared to typical batching: even transaction counterparties would not be able to know which outputs are Bria's and which are being forwarded as part of a batch.

Bria could receive Payjoin to improve batching efficiency further. When a Payjoin request is incoming, the payjoin server could choose to batch queued payouts to be funded by the incoming request's inbound sats, share transaction fees, take the opportunity to consolidate, and never take custody of the inbound sats nor spend Bria's utxo at rest if inbound sats cover outbound flows. The resulting payjoin ought to be steganographic, looking like any typical consolidation or batch while actually being a payjoin, preserving Bria's privacy and the privacy of the network as a whole.

What do you think of these batching upgrades @nicolasburtey?

Bria looks to be in the lead as far as Bitcoin batching is concerned, and Payjoin may enable an order of magnitude better batching thanks to the transaction cut-through it makes possible.

bodymindarts commented 1 year ago

To me there are a few things that are unclear as to how this could work:

DanGould commented 1 year ago
  • bria is intended to be internal infrastructure. I'm guessing including support for payjoin would necessitate exposing a server publicly. This increases security risks.
  • currently payouts are attached to queues that spawn batches. I'm not sure how payjoin would be integrated into this domain model. I guess there could be a queue configured to be triggered based on some kind of payjoin criteria.

Help me understand what you mean by internal. Does bria process only manual payouts, or are they automated? How does bria receive funds? A payjoin receiver would expose a public endpoint to receive PSBT proposals and in my estimation pre-empt a batch of payouts by replacing it with a Payjoin PSBT with the same transfer outcome but one fewer transaction. I'm curious what security risks you see in comparison to automated batch construction Bria does currently.

  • How this could work exactly would probably have to be determined during implementation. How would payjoin support impact reliability? If there is some kind of external facing interaction this may add unintended failure modes

The BIP 78 standard defines the Payjoin V1 protocol (Unified Bitcoin URI request, PSBT Interaction, data transport, and error handling). It has been used in production since 2018. Batch specifics would indeed be handled by Bria, and utilities for privacy-preserving coin selection are available in PDK. Recoverable errors are defined by BIP 78 and handled in HTTP responses. Total interactive server unreachability is handled by the BIP 21 standard and should fall back to default opt-in non-interactive address payment flow on the sender side.
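To illustrate the fallback behaviour described above, here is a minimal sketch of how a sender might pick between the interactive and non-interactive flows from a BIP 21 URI. The `pj` query parameter comes from BIP 78; the `SendPlan` type and `plan_from_uri` function are hypothetical names made up for this example, and the parser is deliberately simplified (no percent-decoding, case-sensitive scheme).

```rust
/// What a sender could do after parsing a BIP 21 URI (hypothetical type).
#[derive(Debug, PartialEq)]
enum SendPlan {
    /// Try the payjoin endpoint first; pay `address` directly if it is unreachable.
    Payjoin { address: String, endpoint: String },
    /// No `pj` parameter present: ordinary non-interactive address payment.
    Plain { address: String },
}

/// Simplified parser: assumes a lowercase `bitcoin:` scheme and does no percent-decoding.
fn plan_from_uri(uri: &str) -> Option<SendPlan> {
    let rest = uri.strip_prefix("bitcoin:")?;
    let (address, query) = rest.split_once('?').unwrap_or((rest, ""));
    if address.is_empty() {
        return None;
    }
    // Look for the BIP 78 `pj` parameter among the query pairs.
    let endpoint = query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, _)| *key == "pj")
        .map(|(_, value)| value.to_string());
    Some(match endpoint {
        Some(endpoint) => SendPlan::Payjoin { address: address.to_string(), endpoint },
        None => SendPlan::Plain { address: address.to_string() },
    })
}
```

In both variants the address is kept, so a sender that cannot reach the endpoint still has everything it needs for a normal payment.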

  • I also have a bunch of questions as to the benefits of the incoming payjoin but this is perhaps because I don't understand the details. Like how would we determine which queue to trigger? What about RBF? etc.

Ask away! Why are there multiple queues in the first place? That confused me when reading the code.

RBF is supported. Replacement transactions (as always) would have to pay a higher absolute fee and exclude the receiver's input, because re-coordination is not yet supported in Payjoin V1. A replacement batch may be suitable. From the evidence I've gathered, Transaction Pinning attacks, where the receiver double spends in a replacement with a different outcome, appear to be exceedingly rare. I have never heard of one being performed in all my conversations with researchers on the topic. And they can still be handled with RBF or CPFP. Remember, a successful attacker would need to outbid Bria on absolute and relative fees, too.
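As a rough sketch of the "outbid on absolute and relative fees" condition, the check below compares a candidate replacement against the transaction it conflicts with. It is a simplification of replace-by-fee policy, not a full model of mempool rules; the function name and parameters are hypothetical.

```rust
/// Simplified replace-by-fee check (hypothetical helper, not mempool-accurate).
/// Fees are in sats, sizes in virtual bytes.
fn replacement_outbids(
    orig_fee: u64,
    orig_vsize: u64,
    new_fee: u64,
    new_vsize: u64,
    min_relay_sat_per_vb: u64,
) -> bool {
    // Absolute fee: the replacement must pay the original fee plus
    // enough to cover its own relay bandwidth.
    let pays_for_bandwidth = new_fee >= orig_fee + min_relay_sat_per_vb * new_vsize;
    // Relative fee: new_fee / new_vsize > orig_fee / orig_vsize,
    // compared by cross-multiplication to avoid floating point.
    let higher_rate =
        (new_fee as u128) * (orig_vsize as u128) > (orig_fee as u128) * (new_vsize as u128);
    pays_for_bandwidth && higher_rate
}
```

For example, replacing a 200 vB transaction paying 1,000 sats with a 250 vB one requires at least 1,251 sats under these assumptions (1 sat/vB minimum relay rate): 1,250 sats covers the absolute-fee condition, but the fee rate must be strictly higher than the original's 5 sat/vB.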

bodymindarts commented 1 year ago

Help me understand what you mean by internal. Does bria process only manual payouts, or are they automated?

Internal ie. the grpc server is hosted on a private network. A (trusted) application server handling the business logic forwards payouts to bria. It's not directly exposed on the internet.

How does bria receive funds?

It is funded by end-users sending money to addresses that got exposed via the application server calling bria for a new address.

Why are there multiple queues in the first place?

To handle use cases like end-users being able to choose the transaction priority. Whether or not to use a high or a low fee is config attached to a queue. So you could have a high-priority and a low-priority queue. There are other ones but this is the simplest to explain.

The BIP 78 standard defines the Payjoin V1 protocol (...)

Okay I guess I will have to look through this to understand the nuances - but it seems like reliability has been taken into account. Some questions coming up from a brief skim

bodymindarts commented 1 year ago

Just thinking about possible ways of implementing the sending side (receiving will probably be more tricky).

I think the payjoin would get its own destination type. And the interaction with.

The sender <-> receiver interaction probably needs to be hooked into the batch_signing job. Here is where the original psbt is finalised.

DanGould commented 1 year ago

Thanks for the detailed responses. I feel like we're covering lots of ground.

Now's a good opportunity to link the Interactive Payment Batching is Better post which steps through each bit of complexity to show exactly how to think of payjoin as a coordinator for receive side batching. I'll link it up top too since it provides background that perhaps I should have started with.

Now to address your comments

high level service architecture

Help me understand what you mean by internal. Does bria process only manual payouts, or are they automated?

Internal ie. the grpc server is hosted on a private network. A (trusted) application server handling the business logic forwards payouts to bria. It's not directly exposed on the internet.

Looks like we'll have to consider how we interface payjoin to both bria and the application server. Which repository holds the application server, if that's public? If not part of bria or the application server, perhaps the payjoin http server is a third distinct thing if the complexity is worth it.

Sending Payjoin

Just thinking about possible ways of implementing the sending side (receiving will probably be more tricky).

Receiving is tricky but it's also the bottleneck on payjoin adoption. Since I reckon the savings benefits to a batching service are so great I may actually recommend making it the priority.

Sending is simple, but offers fewer benefits as far as batching goes. Senders gain more in privacy and only modestly in batching, since naive, non-payjoin senders (exchanges, businesses, etc.) are the ones who can already benefit from batching without interaction; payjoin lets receivers leverage batching beyond the existing limits. Combined you get a major improvement.

If you have an opportunity to send payjoin, a batch can only send to 1 payjoin (V1 😉) endpoint using an http call but can take a batch of payouts with it. No server required. Looks like you've got this intuition down.

I think the payjoin would get its own destination type. And the interaction with.

I'm not sure why Destination exists as an abstraction yet, especially since it only has one type now. Could you please tell me more about why it gets its own type and how you think payjoin fits as an instance of that abstraction? I think of payjoin as a way to coordinate destination preferences rather than a specific destination type but I have a limited view on that abstraction as of now.

Payjoin-triggered Batches

To handle use cases like end-users being able to choose the transaction priority. Whether or not to use a high or a low fee is config attached to a queue. So you could have a high-priority and a low-priority queue. There are other ones but this is the simplest to explain.

This makes a lot of sense to me. I see payjoin receipt as a way to preempt execution of payouts in one or many queues or based on Bria's preferences since it lowers the marginal fee required for each payout of the resulting payjoin. I.e. receiving payjoin is better batching. You achieve the same sat/vb fee rate by packing more payment intent into a transaction of the same size.
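A back-of-the-envelope sketch of that claim, comparing "deposit, then batch later" with a single payjoin that cuts through the deposit output. The per-component sizes are rough P2WPKH approximations chosen for illustration, and the transaction shapes (one sender input, one Bria input, one change output each) are assumptions rather than anything Bria does today.

```rust
// Rough virtual-size approximations for P2WPKH (assumed values).
const OVERHEAD_VB: u64 = 11;
const INPUT_VB: u64 = 68;
const OUTPUT_VB: u64 = 31;

/// Approximate virtual size of a transaction with the given input and output counts.
fn vsize(inputs: u64, outputs: u64) -> u64 {
    OVERHEAD_VB + inputs * INPUT_VB + outputs * OUTPUT_VB
}

/// Two transactions: the sender deposits (1 input, deposit + change outputs),
/// then the deposit is spent into a batch (1 input, payouts + change).
fn separate_vsize(payouts: u64) -> u64 {
    vsize(1, 2) + vsize(1, payouts + 1)
}

/// One payjoin: sender input + one receiver input, paying the payouts
/// plus one change output for each party. The deposit output disappears.
fn payjoin_vsize(payouts: u64) -> u64 {
    vsize(2, payouts + 2)
}
```

Under these assumptions, three payouts take about 344 vB as two transactions and about 302 vB as one payjoin. The difference, one transaction overhead plus one output, is the same for any number of payouts.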

bodymindarts commented 1 year ago

The application server I'm referring to is the main galoy backend https://github.com/GaloyMoney/galoy. It forwards all onchain related stuff to bria so I think this (bria) code base is the correct one to host the receive side server. I also understand and agree to begin looking at receive side as it has more unanswered questions and is the harder part to bootstrap. If we manage to get a clean integration it would be quite a UVP for bria and a benefit to the entire ecosystem.

I think to start with we probably need an additional server component and additional config on the PayoutQueue to specify whether or not it is trigger-able via an incoming payjoin. From there we'll have to see how much of the payout and utxo selection and signing code can be re-used from the existing batching implementation. It would be good to know sooner rather than later if integrating this implies a large code restructure or if there is an elegant way of handling the payjoin specifics without having to re-write the entire batch-processing.
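One way to picture the extra queue config is a per-queue flag that the receive side consults when a payjoin request arrives. The types below are hypothetical stand-ins written for this sketch; they are not Bria's actual `PayoutQueue` definitions.

```rust
/// Hypothetical per-queue config with a payjoin opt-in flag.
#[derive(Debug, Clone, PartialEq)]
struct QueueConfigSketch {
    name: String,
    /// Whether an incoming payjoin may trigger this queue early.
    payjoin_triggerable: bool,
}

/// Hypothetical queue state: its config plus how many payouts are waiting.
#[derive(Debug, Clone, PartialEq)]
struct QueueSketch {
    config: QueueConfigSketch,
    pending_payouts: usize,
}

/// Names of queues an incoming payjoin could preempt: opted in and non-empty.
fn queues_to_preempt(queues: &[QueueSketch]) -> Vec<&str> {
    queues
        .iter()
        .filter(|q| q.config.payjoin_triggerable && q.pending_payouts > 0)
        .map(|q| q.config.name.as_str())
        .collect()
}
```

Keeping the decision in config means existing queues behave exactly as before unless they opt in.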

One thing to consider is that the current batching process is split out into a series of asynchronous jobs that get processed via a job runner. This is to enable horizontal scaling and high availability - we can handle an arbitrary number of accounts / wallets / queues in a fault tolerant way simply by adding additional instances of the runtime and retrying the jobs on failure. This implies that the individual process steps currently may not all run on the same server as the one that would receive the payjoin request. I see different ways in which we can handle this - but it will require some consideration not to limit the scaling capabilities of the current design.

DanGould commented 1 year ago

Bria Payjoin Solution v0

Current job structure looks to accommodate a clean solution. Payjoin should preempt payout queues, construct a Payjoin Proposal PSBT Batch and respond over synchronous HTTP in the success case. In the failure case where the sender disappears, Bria broadcasts the sender's fallback transaction included in their request. Accounting may already be solved by the effective vs utxo distinction, but I still need to understand and document the specifics.

1. Run payjoin HTTP request handler

This payjoin recipient server could run inside or isolated from Bria. It would do sanity checks on HTTP requests and pass them to the payout processor by calling a preempt method and await a timely response in order to return an HTTP response.

2. Introduce process_payout_queue::execute alternative payjoin_preempt

Payjoin v1 is a synchronous protocol so batches would need to be created once HTTP requests are checked. Each valid request contains (checked_psbt, optional_parameters). A checked fallback "Original PSBT" may be scheduled for broadcast in the failure case using a modified Batch abstraction (Perhaps None AccountId, payjoin PayoutQueueId, None WalletId). Optional parameters specify the sender's fee contribution to incentivize Bria to contribute input. In the happy case, construct a Payjoin Proposal PSBT from Bria's UTXOs and available payouts.

Batch signing must be able to sign a request which includes inputs to be signed by other parties.

One open problem is handling accounting for both Original PSBT and Payjoin PSBT double spending the same incoming sats. Perhaps effective / utxo distinction is already enough between signed and broadcast to mempool. What is the exact distinction between those two accounting categories @bodymindarts?

3. Respond with Payjoin Proposal PSBT and await Payjoin broadcast

The Payjoin Proposal is included in the 200 HTTP response, either signed and broadcast by the sender, or after some timeout bria broadcasts the fallback transaction from the Original PSBT,

OR an error 4xx HTTP response is returned (potentially also broadcasting Original PSBT depending on error).

Pending accounting is handled and broadcast timeout cancelled for the unused path after sufficient confirmation.
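The three steps above can be summarised as a small decision table. This is a sketch only: the `Broadcast` enum and `resolve` function are hypothetical, the 4xx case is modelled as a plain 400, and the choice to broadcast the fallback when a valid request cannot be turned into a proposal is an assumption (the proposal above only says this depends on the error).

```rust
/// Which transaction ends up on the network (hypothetical type).
#[derive(Debug, PartialEq)]
enum Broadcast {
    Payjoin,
    Fallback,
    Nothing,
}

/// Returns the HTTP status sent to the sender and the eventual broadcast.
fn resolve(
    request_valid: bool,
    proposal_built: bool,
    sender_broadcast_in_time: bool,
) -> (u16, Broadcast) {
    if !request_valid {
        // Failed sanity checks: reject, and there is no trusted fallback to broadcast.
        return (400, Broadcast::Nothing);
    }
    if !proposal_built {
        // Assumed behaviour: error response, but still broadcast the checked Original PSBT.
        return (400, Broadcast::Fallback);
    }
    if sender_broadcast_in_time {
        (200, Broadcast::Payjoin)
    } else {
        // Sender disappeared after receiving the proposal: timeout fires.
        (200, Broadcast::Fallback)
    }
}
```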

DanGould commented 1 year ago

Re-posting some notes regarding accounting and customer service ux

Can end users account for their balance based on the chain alone?

No matter what, the end user can calculate how much they own and how much they've paid by looking at the chain alone. Already, deposit amounts and fee spend can be calculated by the depositor input(s) minus output(s) that do not belong to them. With or without output substitution, this does not change.
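As a small worked example of that calculation, the depositor's total outflow (deposit plus their share of the fee) is the sum of their inputs minus the sum of the outputs that return to them. The function name is hypothetical.

```rust
/// Total sats that left the depositor's wallet in a transaction:
/// their inputs minus the outputs they still control (e.g. change).
/// Returns None if the outputs exceed the inputs, which means the
/// depositor was not a net sender in this transaction.
fn depositor_outflow(own_inputs: &[u64], own_outputs: &[u64]) -> Option<u64> {
    let spent: u64 = own_inputs.iter().sum();
    let returned: u64 = own_outputs.iter().sum();
    spent.checked_sub(returned)
}
```

For instance, a depositor spending a 100,000 sat input and getting 59,000 sats back as change has an outflow of 41,000 sats, whatever the other outputs of the transaction look like.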

Fee savings without output script substitution

Assuming the address from the BIP 21 URI is not replaced and funds end up there, a sender still relies on that address being authenticated out of band. They know they got an address from you that they can save and reference. Receivers augment the amount received on that address by adding the amount of their input, having the opportunity to consolidate incoming sats with any number of receiver UTXOs. The sender can authenticate the response by checking that it contains the same address that it included in its request. Sender and receiver enjoy privacy from third-party observers as a result of their contribution.

Fee savings with output script substitution

Since payjoin communicates with a receiver at an authenticated (https) endpoint, any response Payjoin PSBT is also signed with keys derived from their TLS certificate, just like a deposit address. This allows the receiver to substitute the fallback Original PSBT output paying them with any number of outgoing batched outputs, deposits to cold wallets, etc., replacing their original address. Because the response is authenticated by HTTPS, a sender can check that the Payjoin PSBT sends no more than their original request parameters, sign and broadcast knowing the response was authorized by the receiver regardless of substitute outputs. The receiver has opportunity to spend the inbound sats to any number of outbound destinations before ever taking custody of them in their own UTXO, exceeding the previous limit on possible savings from batching, since the sender is paying for the transaction overhead, their input, and typically one receiver input.
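A minimal sketch of the sender-side limit check, assuming the sender's inputs are unchanged by the receiver and that the only way the proposal can cost the sender more is by shrinking their change output. BIP 78 lets the sender cap this with a maximum additional fee contribution; the function and parameter names here are hypothetical.

```rust
/// True if the proposal takes no more from the sender's change output
/// than the sender agreed to contribute towards fees (amounts in sats).
fn proposal_within_limits(
    original_change: u64,
    proposed_change: u64,
    max_additional_fee: u64,
) -> bool {
    // saturating_sub yields 0 if the change did not shrink at all.
    original_change.saturating_sub(proposed_change) <= max_additional_fee
}
```

With this check passing, the sender's outflow is bounded regardless of which outputs the receiver substituted.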

Making Receipts

In any case depositors will have the TXID associated with the payjoin and could use that as reference. I see a couple of ways to solve the customer support problem where funds don't go to that original address.

  1. Save the HTTPS response from Bria. This is signed with keys derived from the certificate, and could be exported to customer support or included in the UI as part of the transaction.
  2. Include a reference number in the response Payjoin PSBT in a proprietary field. This could be the original address or some other reference associated with the batch and would never be posted to chain, but it could be saved in the end user app transaction details.

KEY POINT 🔐: With or without output substitution, users rely on the same trust model when sending a deposit: end users trust that some HTTPS response from Galoy has not been tampered with. Either that's a single deposit address they requested from a backend, or many addresses supplied by a Payjoin PSBT.