althea-net / rita

Rita is a routing and billing protocol that allows devices to buy and sell bandwidth
https://docs.althea.net/
Apache License 2.0

Lock free payment validator refactor #865

Closed: jkilpatr closed this 5 months ago

jkilpatr commented 5 months ago

This patch makes significant changes to payment validator. In fact, it completely redesigns how payment validator moves information around, although the core logic itself remains the same.

Instead of storing payment validator state in a lazy static and accessing it through a lock from various places, we now keep it in a single scope: the Rita fast loop.

This dramatically simplifies the way information flows through the program. Previously, when payment controller made a payment, it took a lock on the entire payment validator state and updated it; now that data is simply passed as a function argument.
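A minimal Rust sketch of the before and after. The names (`ToValidate`, `PaymentValidator`, `store_payment`, `fast_loop_tick`, `GLOBAL_VALIDATOR`) are hypothetical and not taken from the Rita codebase, and a `static Mutex` stands in for the old lazy static. In the new shape the fast loop owns the state and lends it to callers as `&mut`.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for a payment awaiting validation.
#[derive(Debug, Clone, PartialEq)]
pub struct ToValidate {
    pub tx_hash: String,
    pub amount: u64,
}

// BEFORE (sketch): state lives in a global and every caller takes the lock.
// This static Mutex plays the role the lazy static used to play.
pub static GLOBAL_VALIDATOR: Mutex<Vec<ToValidate>> = Mutex::new(Vec::new());

pub fn store_payment_global(p: ToValidate) {
    // Any thread can reach in here, so every access contends on one lock.
    GLOBAL_VALIDATOR.lock().unwrap().push(p);
}

// AFTER (sketch): the state is an ordinary value owned by the fast loop.
#[derive(Debug, Default)]
pub struct PaymentValidator {
    pub unvalidated: Vec<ToValidate>,
}

/// Payment controller receives the validator state as an argument
/// instead of locking a global.
pub fn store_payment(validator: &mut PaymentValidator, p: ToValidate) {
    validator.unvalidated.push(p);
}

/// One iteration of a hypothetical fast loop: the loop owns the state
/// and lends it to each module in turn.
pub fn fast_loop_tick(validator: &mut PaymentValidator, made: Vec<ToValidate>) {
    for p in made {
        store_payment(validator, p);
    }
}
```

Because the borrow checker allows only one `&mut` borrow at a time, the compiler enforces the exclusive access that the lock used to provide at runtime, and there is no lock left to deadlock on.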

The one case where cross-thread interaction is required is passing information from the Actix endpoints into the Rita client thread. Instead of a lock over all of the payment validator state, this path now uses a lock-free queue that payment validator dequeues from during validation.
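A sketch of that hand-off under stated assumptions: `std::sync::mpsc` is used here as a stand-in queue, since the PR does not name the lock-free queue Rita uses (a crate such as crossbeam's `SegQueue` would be one option), and the handler and type names are hypothetical. The endpoint side only enqueues; the fast loop drains whatever has arrived with non-blocking `try_recv` calls.

```rust
use std::sync::mpsc::{Receiver, Sender};

/// Hypothetical stand-in for a payment awaiting validation.
#[derive(Debug, Clone, PartialEq)]
pub struct ToValidate {
    pub tx_hash: String,
    pub amount: u64,
}

/// Validator state, owned by the fast loop (hypothetical shape).
#[derive(Debug, Default)]
pub struct PaymentValidator {
    pub unvalidated: Vec<ToValidate>,
}

/// Endpoint side (hypothetical handler): push onto the queue and return
/// immediately. The validator's state is never touched here.
pub fn handle_make_payment(queue: &Sender<ToValidate>, p: ToValidate) -> Result<(), String> {
    queue.send(p).map_err(|e| e.to_string())
}

/// Fast loop side: take everything queued so far without blocking.
pub fn drain_incoming(validator: &mut PaymentValidator, incoming: &Receiver<ToValidate>) {
    // try_recv returns Err once the queue is empty (or all senders are gone).
    while let Ok(p) = incoming.try_recv() {
        validator.unvalidated.push(p);
    }
}
```

Each endpoint thread holds its own cloned `Sender`, so the only thing shared across threads is the queue itself.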

This did require two changes to how Rita behaves.

First, payments that have not yet been validated are no longer included in the Rita exit debts endpoint. We originally included them to prevent clients from paying the exit twice when validation was running slowly. The lock-free changes make that slow operation much less likely, and repeated payments were also a symptom of the low timeout bug we recently encountered and fixed.
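To make the first change concrete, here is a hypothetical, simplified debt model; the struct, fields, and functions are illustrative and not Rita's. Debt is what the exit has billed minus what the client has paid, and after this change only validated payments count.

```rust
/// Hypothetical per-client debt record kept by the exit.
pub struct ClientDebt {
    pub owed: u128,             // total billed for bandwidth
    pub validated_paid: u128,   // payments already validated
    pub unvalidated_paid: u128, // payments received but not yet validated
}

/// Behavior after this change: only validated payments reduce the
/// debt that the debts endpoint reports.
pub fn reported_debt(d: &ClientDebt) -> u128 {
    d.owed.saturating_sub(d.validated_paid)
}

/// Previous behavior, for comparison: pending payments were also
/// subtracted so a slow validator would not prompt a second payment.
pub fn reported_debt_old(d: &ClientDebt) -> u128 {
    d.owed
        .saturating_sub(d.validated_paid.saturating_add(d.unvalidated_paid))
}
```

With 100 billed, 40 validated and 30 pending, the endpoint now reports 60 rather than 30 until the pending payment validates. That window is why fast validation matters: a client polling during it sees a higher debt than it has actually paid against.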

Second, when a client submitted a duplicate transaction, we used to return an error; we no longer do so. Instead we return OK and drop the tx as a duplicate later.
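A sketch of the second change, assuming a hypothetical `seen` set of transaction hashes kept inside the validator state. The endpoint accepts every submission, and duplicates are discarded when the fast loop moves dequeued payments into its validation list. `HashSet::insert` returns `false` for a hash that is already present, which is what drives the drop.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a payment awaiting validation.
#[derive(Debug, Clone, PartialEq)]
pub struct ToValidate {
    pub tx_hash: String,
    pub amount: u64,
}

#[derive(Debug, Default)]
pub struct PaymentValidator {
    pub unvalidated: Vec<ToValidate>,
    /// Hashes already accepted for validation (hypothetical field).
    pub seen: HashSet<String>,
}

impl PaymentValidator {
    /// Move freshly dequeued payments into the validation list, silently
    /// dropping any tx hash already seen. Returns how many were dropped.
    pub fn accept_dequeued(&mut self, batch: Vec<ToValidate>) -> usize {
        let mut dropped = 0;
        for p in batch {
            // insert() returns false if the hash was already in the set
            if self.seen.insert(p.tx_hash.clone()) {
                self.unvalidated.push(p);
            } else {
                dropped += 1;
            }
        }
        dropped
    }
}
```

In this sketch `seen` grows without bound; a real implementation would need to prune it or check against stored payment history instead.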

Both of the old behaviors required access to the entire program state from arbitrary other threads, and they contributed to slower operation and deadlocks.

Ideally I'd like this to be the model for refactoring other Rita modules going forward. There's a lot of cruft still left over from the original Actix actor structure, which we replicated as closely as possible when moving to individual threads with async/await.

Now we have to do the second part of that refinement.

As a final note, Althea payments have been modified to expect a MsgMicroTX, and payment_controller needs to be updated to send one.