Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0
326 stars 206 forks source link

missing oracles cause promise growth #9923

Open warner opened 4 weeks ago

warner commented 4 weeks ago

In https://github.com/Agoric/agoric-private/issues/169 we observed constant growth in the number of unresolved promises, in a testnet that was mainfork-cloned from our mainnet, but which lacked its own oracles, and thus was not supplied with price quotes. My analysis notes from that ticket:

My hunch here is that a price-feed oracle has not been re-attached after the upgrade yet, and each time the lockOraclePrices wakeup fires and sees this state, it repeats a bunch of work that should have only been done once. Like, it's attempting to create new QuoteNotifiers instead of re-using the ones created during the previous wakeup. I wonder if all these QuoteNotifiers are waiting for the same thing, and v67 and v69 will resolve a huge backlog of getUpdateSince promises the moment an oracle connects and sends in the first price.

I'd like to know how v67/v69 are reacting to the makeQuoteNotifier() call: are they looking up the supplied brand pair in a table? I'd think the calls should be idempotent, but the high kref number suggests that we're making up brand new ones for each request. And I see over 50 calls to makeQuoteNotifier() with the same arguments in this slogfile, spaced almost exactly 3600 seconds apart.

I think this is where we summon @Chris-Hibbert.

We don't expect to be in this situation on a real network, at least not for very long. Oracles will do their job, prices will flow, auctions will happen. But:

The task is to understand exactly where these promises are coming from, what action whose lack is causing them to be generated, and then fix the code to prevent the long-term growth.

warner commented 2 weeks ago

Ok so vaultManager.js has this function named observeQuoteNotifier:

        observeQuoteNotifier() {
          const { state } = this;

          const { collateralBrand, collateralUnit, debtBrand, storageNode } =
            state;
          const ephemera = collateralEphemera(collateralBrand);

          const quoteNotifier = E(priceAuthority).makeQuoteNotifier(
            collateralUnit,
            debtBrand,
          );
          // @ts-expect-error XXX quotes
          ephemera.storedQuotesNotifier = makeStoredNotifier(
            // @ts-expect-error XXX quotes
            quoteNotifier,
            E(storageNode).makeChildNode('quotes'),
            marshaller,
          );
          trace(
            'helper.start() awaiting observe storedQuotesNotifier',
            collateralBrand,
          );
          // NB: upon restart, there may not be a price for a while. If manager
          // operations are permitted, ones that depend on price information
          // will throw. See https://github.com/Agoric/agoric-sdk/issues/4317
          const quoteWatcher = harden({
            onFulfilled(value) {
              ephemera.storedCollateralQuote = value;
            },
            onRejected() {
              // NOTE: drastic action, if the quoteNotifier fails, we don't know
              // the value of the asset, nor do we know how long we'll be in
              // ignorance. Best choice is to disable actions that require
              // prices and restart when we have a new price. If we restart the
              // notifier immediately, we'll trigger an infinite loop, so try
              // to restart each time we get a request.

              ephemera.storedCollateralQuote = null;
            },
          });
          void watchQuoteNotifier(quoteNotifier, quoteWatcher);
        },

The goal is to populate ephemera.storedCollateralQuote. Every time this function is called, it will always send a new makeQuoteNotifier() to the priceAuthority, and will always build a new makeStoredNotifier() around it, and will alway uses watchQuoteNotifier() to glue an ephemeral quoteWatcher to that, and that watcher is the one place where storedCollateralQuote can be filled.

This function is called from several places in vaultManager that need a stored quote, with a clause like:

          if (!storedCollateralQuote) {
            facets.helper.observeQuoteNotifier();
            throw Fail`maxDebtFor called before a collateral quote was available for ${collateralBrand}`;
          }

and at least one of those places can be triggered by an hourly timer wakeup.

Which means every timer wakeup will provoke this new set of objects and promises, until some QuoteNotifier finally reports either a price or an error. And something deeper in the process is unable to do either without a PushPrice coming in from off-chain, which isn't happening on the testnet.

shape of a fix

I think we need that const quoteNotifier = E(priceAuthority).makeQuoteNotifier() to have a wider scope.

We want observeQuoteNotifier() to not make a new request if there's already a request pending. And we need it to be reliable in the face of upgrade, and errors.

Maybe something that starts with:

if (!ephemera.quoteNotifier) {
  ephemera.quoteNotifier = E(priceAuthority).makeQuoteNotifier();
  ephemera.quoteNotifier.catch(_err => ephemera.quoteNotifier = undefined);
}

It should probably defer building the rest of the wrapping objects until we get a quote notifier back. The goal is to not ever build redundant objects, or make redundant requests.

relation to promise growth

Note that I still don't know exactly which promise is the one being accumulated. I know it's one of the getUpdateSince that gets sent to aQuoteNotifier, and I know that we're making new QuoteNotifiers each cycle, but I don't know if the getUpdateSinces are all being sent to some pre-existing QuoteNotifier, or if each one is going to a new QuoteNotifier.