fedimint / fedimint

Federated E-Cash Mint
https://fedimint.org/
MIT License
536 stars 209 forks source link

Remove polling #1197

Closed maan2003 closed 1 year ago

maan2003 commented 1 year ago

Remain tasks:

Currently, we use polling to check for updates(say "fetch_transaction").

What we need is to subscribe for change in web server task, and push notification for these changes in the main task. But we can't immediately notify the web server task because the transaction is not yet committed. And the database won't contain the value.

So mainly I see two solution:

Another Attempt #273

dpc commented 1 year ago

As things are RN, I don't understand what is this change doing.

justinmoon commented 1 year ago

Manmeet is trying to get it so clients can subscribe to events as websocket subscriptions, versus polling over websockets which we do currently. @Maan2003 could you add a little more to the description of the PR?

@m1sterc001guy: could you take a look at the database changes?

dpc commented 1 year ago

Manmeet is trying to get it so clients can subscribe to events as websocket subscriptions, versus polling over websockets which we do currently.

I'm confused then why all the changes are about database transactions. :D

maan2003 commented 1 year ago

Sorry, I forgot to mark this as draft. This wasn't complete then. Now mostly is. I have updated the PR description.

maan2003 commented 1 year ago

Thank you all for the reviews. I have simplified the design.

maan2003 commented 1 year ago

Also wasm is problem. tokio::Notify isn't available on wasm, and I don't know any suitable replacement.

maan2003 commented 1 year ago

Update: tokio is better now, we can use tokio with features sync on wasm!

elsirion commented 1 year ago

Do we ever need to wait for a key prefix? That's when the hash based approach might be bad at.

We could introduce an associated constant for DB keys of the form

const PREFIXES: &'static [usize] = &[8, 40];

and hash every prefix and add it as a separate event.

let serialized_key: &[u8] = …;
for prefix in PREFIXES {
    events.add_key_event(serialized_key[0..prefix]);
}

That way we could subscribe to any DatabaseKeyPrefix and not just DatabaseKeys.

maan2003 commented 1 year ago

Notification is general mechanism that can notify for any hashable key. (Should move Notification outside of database, and pass it during begin transaction)

Database and Database transaction offer helper over for DBkeys. If we need notification for prefixes, we can add helpers (notify_prefix and wait_prefix) which should be straightforward to implement.

maan2003 commented 1 year ago

const PREFIXES: &'static [usize] = &[8, 40]; ......

Wow! that is clever. Took some time to understand.

elsirion commented 1 year ago

Wow! that is clever. Took some time to understand.

In that case it will be good to document well ^^

maan2003 commented 1 year ago

Moving ModuleDecoderRegister into Database. It simplifies lot of interfaces.

maan2003 commented 1 year ago

Separated out some database changes into a new PR which should easier to review in isolation.

maan2003 commented 1 year ago

You won't like the Databaase to Api Endpoint commit. But, because api endpoints have to wait for a key, it can't just work with single transaction :/

elsirion commented 1 year ago

But, because api endpoints have to wait for a key, it can't just work with single transaction :/

I think not using transactions and instead awaiting changes is ok. In cases we really need the consistency we can first await all the notifications and then do a read in a separate tx. But I think in most cases we want one value we directly wait for anyway.

dpc commented 1 year ago

It's important that api here enforces correct race-condition-free usage, otherwise we are going to find ourselves debugging random hangs and delays all over the place.

We should not have a standalone wait_key, because the user must do things in the following order:

If one checks the key first, then register for notification later - it's a bug.

Therefore we should have an API like:

dbtx.get_and_wait(key, |value| {
   value.is_some() // or whatever else check the user desires
});

or something similar.

maan2003 commented 1 year ago

We should not have a standalone wait_key, because the user must do things in the following order:

  • register for notifications
  • check the key
  • if not how they want, sleep

I don't follow. This is not the case for wait_key RN. It does everything in single call. It is case of Notifications::wait but that is a low level api.

@dpc

elsirion commented 1 year ago

@Maan2003 What @dpc is getting at is that currently we can only safely await the presence of a key, not a certain value. For that we need to:

  1. Register a notification for key updates
  2. Read the value of the key
  3. Check if the value already is what we want, if so return it and de-register the notification
  4. Wait for notifications in a loop, for each check if the value is already what we want and return it, otherwise continue the loop

You can't impl this safely (right order of 1+2) with the current API afaik.

maan2003 commented 1 year ago

Got it :+1:

dpc commented 1 year ago

What @dpc is getting at is that currently we can only safely await the presence of a key, not a certain value.

@elsirion @Maan2003

Current the user can do:

let value = loop {
  let value = dbtx.get(key);
  if is_key_good(&value) {
      break value;
  }
  dbtx.wait_key();
}

This has a race condition and can lead to (possibly indefinite) hang. We should have API that can't be misused here, because I suck at spotting issues like this :D

maan2003 commented 1 year ago

update: notifications are now automatic for keys that NOTIFY flag set.

maan2003 commented 1 year ago

Therefore we should have an API like: dbtx.get_and_wait(key, |value| { value.is_some() // or whatever else check the user desires })

implemented, with name wait_key_check

maan2003 commented 1 year ago

I am not sure about client side changes. need help with strategies. @jkitman

The Retry404 strategy is not needed now.

main change from client's perspective : example /fetch_transaction

earlier server responded with 404 error if transaction wasn't processed

now server waits until the transaction is processed and then responds. or times out with 408 error after 60 sec.

jkitman commented 1 year ago

I am not sure about client side changes. need help with strategies.

In general, I think the strategies can be modified to work with the non-polling API. They will subscribe, wait for enough responses that match, then return the result.

The 404 can be eliminated probably, to be replaced with awaiting a transaction id (either accepted or rejected).

maan2003 commented 1 year ago

UPDATE: first two commits need heavy rebasing (#1331)

maan2003 commented 1 year ago

Doesn't work at all. Biggest problem: no way to merge notify_queue from module_tx to parent_tx. One way is to Arc<Mutex<NotifyQueue>>. But that is too hacky for my liking.

I am inclined to just store a module_instance: Option<ModuleInstanceId> in DatabaseTransaction itself instead of #1331 cc @m1sterc001guy

dpc commented 1 year ago

Biggest problem: no way to merge notify_queue from module_tx to parent_tx.

Oh, interesting.

Can you elaborate what's wrong?

dpc commented 1 year ago

@Maan2003 Oh, I think I see it better after a bit of investigation.

Architecturally the problem here is that current functionality tries to implement sending notifications at the top layer (dyn-wrapper DatabaseTransaction) while IsolateDatabaseTransaction works on dyn IDatabaseTransaction wrapping layer.

I think one approach is to move notification sending to be a under the IDatabaseTransaction: just like IsolatedDatabaseTransaction currently is.

struct NotifyOnKeyUpdateDatabaseTransaction {
    inner_tx: Box<dyn DatabaseTransactio>,
    notify: /* some mechanism to send out notifications that is shareable and clonable, see below */,
}

impl NotifyOnKeyUpdateDatabaseTransaction {
fn new(inner: Box<dyn DatabaseTransaction< notify: /* buckets of something */) -> Self { ... }
}

impl IDatabaseTransaction for NotifyOnKeyUpdateDatabaseTransaction {
   // impl all methods by calling inner, then sending a notification

}

I also don't know if Notify is the right primitive to use. It requires registriation, because it's a mpsc channel. Instead we want spmc channel (like tokio::sync::broadcast which is actualy mpmc). This way the part that is notifying can just do a notification once, unconditionally, and anyone that has the receiving end will be woken up, This avoids any registration.

The NotifyOnKeyUpdateDatabaseTransaction holds N buckets of brodcast::Sender and just self.buckets[hash].send().

The receiving end grabs the self.buckets[hash].clone() as a "registration" and then .recv() on it when wants to wait for any updates on this bucket.

So:

I'm unclear on the wait operation on the IsolatedDatabaseTransaction. If someone does a wait on that, they probably want the key prefix added to what they are waiting on. But that also means that the wait operations need to become part of IDatabaseTransaction trait.

Sorry if I got something wrong, as I have limited time to spend on this interesting problem, RN. Hopefully you can make it work with some changes. If not I'll be happy to discuss more and help one I find more time.

maan2003 commented 1 year ago

implementing at transaction level is an interesting idea, but we don't have access DatabaseKey::NOTIFY there.

I am thinking of a hybrid approach. Storing Box<NotifyTx<dyn IDatabaseTransaction>> inside DatabaseTransaction.

EDIT: static typing won't work, because of need to make nested isolatedtransaction.

This way the part that is notifying can just do a notification once, unconditionally, and anyone that has the receiving end will be woken up, This avoids any registration.

But clone the receiver end is the registration step here. :shrug: Not sure how that is different from "getting a Notified future"

But that also means that the wait operations need to become part of IDatabaseTransaction trait.

not possible, you have to start a transaction every time to check if notification was fake or not.

But, we will need a IsolatedDatabase anyways.

maan2003 commented 1 year ago

issue with wrapping NotifyingTx inside IsolatedTx: NotifyingTx needs to work with DatabaseKeys and IsolatedTx works with bytes.

maan2003 commented 1 year ago

we can add a notify: bool wherever we pass a key in IDatabaseTransaction but that also feels weird.

IMO best way is to implement module isolation and notification at DatabaseTransaction level.

maan2003 commented 1 year ago

ping @dpc

dpc commented 1 year ago

issue with wrapping NotifyingTx inside IsolatedTx: NotifyingTx needs to work with DatabaseKeys and IsolatedTx works with bytes.

Everything should work with bytes only, no? Serialization/Deserialization is just a dyn-wrapper extra utilities for better usability.

we can add a notify: bool wherever we pass a key in IDatabaseTransaction but that also feels weird.

True, this is a fundamental mismatch. Fist, I don't know if it's worth optimizing. Just send notifications for everything. We are about to do big IO, trying to save some relativey cheap operation seems premature. Make it work, make it right, make it fast - in that order. If we ever figure out that this is a problem, we'll get back and optimize.

Passing an extra bool (or just some kind of enum KeyType) doesn't feel too bad to me. It would only happen in IDatabaseTranasaction which is invisible to the user (for most part). I even kind of like it.

maan2003 commented 1 year ago

Finally done. I am satisfied with this solution. and no changes to IDatabaseTransaction.

Rest of commits still need rebase, will do soon.

codecov-commenter commented 1 year ago

Codecov Report

Patch coverage: 78.08% and project coverage change: +0.05 :tada:

Comparison is base (f70af44) 63.57% compared to head (ade6761) 63.63%.

:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #1197 +/- ## ========================================== + Coverage 63.57% 63.63% +0.05% ========================================== Files 136 136 Lines 25896 25975 +79 ========================================== + Hits 16464 16528 +64 - Misses 9432 9447 +15 ``` | [Impacted Files](https://codecov.io/gh/fedimint/fedimint/pull/1197?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fedimint) | Coverage Δ | | |---|---|---| | [fedimint-server/src/db.rs](https://codecov.io/gh/fedimint/fedimint/pull/1197?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fedimint#diff-ZmVkaW1pbnQtc2VydmVyL3NyYy9kYi5ycw==) | `55.55% <ø> (ø)` | | | [client/client-lib/src/ln/mod.rs](https://codecov.io/gh/fedimint/fedimint/pull/1197?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fedimint#diff-Y2xpZW50L2NsaWVudC1saWIvc3JjL2xuL21vZC5ycw==) | `76.87% <31.25%> (-2.51%)` | :arrow_down: | | [client/client-lib/src/wallet/mod.rs](https://codecov.io/gh/fedimint/fedimint/pull/1197?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fedimint#diff-Y2xpZW50L2NsaWVudC1saWIvc3JjL3dhbGxldC9tb2QucnM=) | `84.68% <44.00%> (-2.40%)` | :arrow_down: | | [client/client-lib/src/mint/mod.rs](https://codecov.io/gh/fedimint/fedimint/pull/1197?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fedimint#diff-Y2xpZW50L2NsaWVudC1saWIvc3JjL21pbnQvbW9kLnJz) | `86.00% <64.70%> (-1.50%)` | :arrow_down: | | [fedimint-server/src/consensus/mod.rs](https://codecov.io/gh/fedimint/fedimint/pull/1197?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fedimint#diff-ZmVkaW1pbnQtc2VydmVyL3NyYy9jb25zZW5zdXMvbW9kLnJz) | `81.20% <95.12%> (+0.41%)` | :arrow_up: | | [client/client-lib/src/lib.rs](https://codecov.io/gh/fedimint/fedimint/pull/1197?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fedimint#diff-Y2xpZW50L2NsaWVudC1saWIvc3JjL2xpYi5ycw==) | `77.65% <100.00%> (ø)` | | | [fedimint-core/src/api.rs](https://codecov.io/gh/fedimint/fedimint/pull/1197?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fedimint#diff-ZmVkaW1pbnQtY29yZS9zcmMvYXBpLnJz) | `82.47% <100.00%> (+2.29%)` | :arrow_up: | | [fedimint-server/src/net/api.rs](https://codecov.io/gh/fedimint/fedimint/pull/1197?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fedimint#diff-ZmVkaW1pbnQtc2VydmVyL3NyYy9uZXQvYXBpLnJz) | `83.51% <100.00%> (+1.22%)` | :arrow_up: | ... and [9 files with indirect coverage changes](https://codecov.io/gh/fedimint/fedimint/pull/1197/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fedimint) Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fedimint). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=fedimint)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

maan2003 commented 1 year ago

This is ready for review. I don't understand the error message. It is really weird.

**BROKEN** plugin-ln_gateway: error=Federation client operation error: MintApiError(FederationError({PeerId(0): Rpc(Call(Custom(ErrorObject { code: ServerError(400), message: \"High level transaction error: The transaction's signature is invalid\", data: None }))), PeerId(1): Rpc(Call(Custom(ErrorObject { code: ServerError(400), message: \"High level transaction error: The transaction's signature is invalid\", data: None })))}))

This is consistently reproducible locally (./scripts/latency-tests.sh)

dpc commented 1 year ago

LGTM, but when image

? :D

justinmoon commented 1 year ago

Blocked on https://github.com/fedimint/fedimint/issues/1708

maan2003 commented 1 year ago

I have reduced the scope of this PR from removing all polling to just remove polling for /fetch_transaction. This is much smaller change and also passes CI.

Removing polling from module endpoints requires Contexts types to elegantly give access to wait_key, without just passing a database directly.

maan2003 commented 1 year ago

Don't merge yet, timeout and task groups integration for api endpoints is still pending.

maan2003 commented 1 year ago

task groups integration for api endpoints

this is not needed because the api server shuts down correctly already.

dpc commented 1 year ago

arewelandityet.com

elsirion commented 1 year ago

Really happy we have this now!