PhakornKiong / ALS-Context

Continuation-local storage using Node.js AsyncLocalStorage with fallback to cls-hooked
MIT License

Improving Performance of ALS #1

Open PhakornKiong opened 3 years ago

PhakornKiong commented 3 years ago

Continuing from Jeff-Lewis/cls-hooked#63

@SimonX200

Could you share the code that you're mentioning?

I'm actually rewriting cls-hooked (to support Node from v8 up until the introduction of ALS). From what I understand, ALS actually keeps the store inside the AsyncResource, and it can be retrieved easily inside an async operation with async_hooks.executionAsyncResource(). Is that what you meant?

Sadly, async_hooks.executionAsyncResource() was introduced at about the same time as ALS, so I've got to rewrite this logic without help from the internal bindings from V8.

// This is the only async_hook activated
  init(asyncId, type, triggerAsyncId, resource) {
    const currentResource = executionAsyncResource();
    // Value of currentResource is always a non null object
    for (let i = 0; i < storageList.length; ++i) {
      storageList[i]._propagate(resource, currentResource);
    }
  }

//  I'm not entirely sure why this is needed
// Propagate the context from a parent resource to a child one
  _propagate(resource, triggerResource) {
    const store = triggerResource[this.kResourceStore];
    if (this.enabled) {
      resource[this.kResourceStore] = store;
    }
  }
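
For reference, without executionAsyncResource() the same propagation can be done cls-hooked-style: keep a map from asyncId to store and copy the parent's entry in the init hook. A minimal sketch of that shape (illustrative only, not this library's actual code):

const { createHook, executionAsyncId } = require('async_hooks');

// Map from asyncId to the store that was active when that resource was created.
const stores = new Map();

createHook({
  init(asyncId, type, triggerAsyncId) {
    // Inherit the store of the resource that triggered this one,
    // mirroring what _propagate does above.
    if (stores.has(triggerAsyncId)) {
      stores.set(asyncId, stores.get(triggerAsyncId));
    }
  },
  destroy(asyncId) {
    // Drop the entry to avoid leaking stores.
    stores.delete(asyncId);
  },
}).enable();

function getStore() {
  return stores.get(executionAsyncId());
}

function run(store, callback) {
  // Simplified: associate the store with the current execution id so that
  // resources created inside callback() inherit it via the init hook.
  stores.set(executionAsyncId(), store);
  return callback();
}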
SimonX200 commented 3 years ago

@Darkripper214

Sure.

let cnt = 0;
let xcnt = 0;

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

const CONTEXT_PREFIX = '-@CID@';

let globalLastContextId = 0;
const globalActiveContexts = {};

// Walk the captured stack for a frame whose function name starts with
// CONTEXT_PREFIX and return the context registered under that marker.
function getContext() {
  const curPrepareStackTrace = Error.prepareStackTrace;
  const curLimit = Error.stackTraceLimit;

  try {
    Error.stackTraceLimit = Infinity;
    Error.prepareStackTrace = function (_error, callSite) {
      return callSite;
    };

    const stackObj = {};
    Error.captureStackTrace(stackObj);

    if (stackObj.stack) {
      for (const c of stackObj.stack) {
        const fn = c.getFunctionName();
        if (fn?.startsWith(CONTEXT_PREFIX)) {
          const contextId = fn.slice(CONTEXT_PREFIX.length);
          return globalActiveContexts[contextId];
        }
      }
    }
  } finally {
    Error.prepareStackTrace = curPrepareStackTrace;
    Error.stackTraceLimit = curLimit;
  }
}

function t4(info = 't4') {
  console.log(info + '+1', getContext().a);
  getContext().a = 'zzz';
  xcnt++;
  console.log(info + '+2', getContext().a);
}

async function t3(info = 't3') {
  console.log(info + '+1', getContext().a);
  getContext().a = 'bbb';
  console.log(info + '+2', getContext().a);
  t4(info + ' sync t4+1');
  console.log(info + '+3', getContext().a);
}

async function t2(info = 't2') {
  console.log(info + '+1', getContext().a);
  getContext().a = 'aaa';
  console.log(info + '+2', getContext().a);
  await t3('async t2-t3');
  console.log(info + '+3', getContext().a);
  await runPromise(async () => {
    await t3('run t2-t3');
  });
  console.log(info + '+4', getContext().a);
  setTimeout(async () => {
    await runPromise(async () => {
      await t3('timer t3');
    });
  }, 0);
  console.log(info + '+5', getContext().a);
}

// Run func inside a dynamically named async wrapper so the context id is
// embedded in every stack trace captured while func (and anything it awaits) runs.
async function runPromise(func) {
  const prevContext = getContext();
  const contextIdValue = ++globalLastContextId;
  const contextId = contextIdValue.toString(36);
  const contextManagerName = '-@CID@' + contextId;
  const contextManager = {
    async [contextManagerName]() {
      await func();
    }
  }[contextManagerName];
  try {
    globalActiveContexts[contextId] = { ...prevContext, _id: contextId };
    await contextManager();
  } finally {
    delete globalActiveContexts[contextId];
  }
}

async function t1() {
  let start = Date.now();
  let i;
  for (i = 1; i <= 100000; i++) {
    await runPromise(t2);
    await sleep(10);
  }
  console.log((Date.now() - start) / i, i);
}

t1();
SimonX200 commented 3 years ago

Using the test from the Node issue, I see that ALS has the following impact:

Performed 971472 iterations to warmup
Performed 929714 iterations (with ALS enabled)
Performed 9191036 iterations (with ALS disabled)
ALS penalty: 89.88%

A modified version for cls-hooked shows:

Performed 944473 iterations to warmup
Performed 10236526 iterations (with CLS not used)
Performed 562377 iterations (with CLS in use)
CLS penalty: 94.51%

With another modification for the stack approach within a run() and one get every 100 async operations, I get a 30% penalty (7M/s). With a get every 10 async operations, an 81% penalty (1.93M/s). With a get every async operation, it's at a 97% penalty (253k/s).

In real life I estimate no more than 1 get per 100 async operations.

A few percent were saved by eliminating the string operations:

      for (const c of stackObj.stack) {
        const fn = c.getFunctionName();
        const result = globalActiveContexts[fn];
        if (result) {
          return result;
        }
      }
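
For the lookup above to find anything, runPromise presumably registers the context under the full marker name rather than the bare id; a sketch of that assumed counterpart:

  // Assumed change in runPromise: key the registry by the full marker name
  // (contextManagerName) instead of contextId, so no startsWith()/slice() is needed.
  try {
    globalActiveContexts[contextManagerName] = { ...prevContext, _id: contextId };
    await contextManager();
  } finally {
    delete globalActiveContexts[contextManagerName];
  }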
SimonX200 commented 3 years ago

The stack approach is not better than ALS when getContext is used twice (that's the minimum number of usages: one set, one get).

SimonX200 commented 3 years ago

@Darkripper214

The problem is that the stack is flooded with 'fn' frames even when there are awaits everywhere. On average there are 46.

async function fn(depth) {
  if (depth > 1) {
    if (depth === 2) {
      getContext();
    }
    await fn(depth - 1);
  }
  return /test/.test('test');
}

When capturing the stack trace at the second nesting level, the average stack depth is 52. When capturing it at the first nesting level, the average stack depth is 3.

Also, the stack is not linear, so the wrong context gets picked.

So forget that approach. It's not working.

PhakornKiong commented 3 years ago

@SimonX200

I think this library is about there. I still have some small ideas for tweaking things, but I'm not sure if they will work. Some micro-benchmarks show it is slightly faster than cls-hooked.

https://github.com/Darkripper214/ALS-CLS-Benchmark

You can run npm run bench and see the performance of run() with "one set and one get" in that repo as well.
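
For context, the "one set and one get" workload is roughly of this shape (illustrated here with Node's built-in AsyncLocalStorage, not the actual benchmark code from that repo):

const { AsyncLocalStorage } = require('async_hooks');

const als = new AsyncLocalStorage();

// One iteration: enter a context, then do one set and one get across an await.
async function iteration() {
  return als.run(new Map(), async () => {
    als.getStore().set('key', 'value'); // one set
    await Promise.resolve();
    return als.getStore().get('key');   // one get
  });
}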

chetan commented 2 years ago

FWIW, in my use case this package (ALS-Context) was only marginally faster than cls-hooked on Node 16, but there were some corner cases that needed working around, while cls-hooked worked out of the box. I believe it's a result of a similar bind/bindEmitter-type issue related to AsyncLocalStorage.

In particular, the one case I identified was with callbacks from the redis client: inside the async/callback context after doing a redis fetch, the ALS context was lost. Rather than look for other such cases, I decided to stick with cls-hooked for now.
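
The usual workaround for that class of problem is to rebind the callback to the async context that was active when the call was made; for illustration only (not the actual fix used here), with Node's AsyncResource.bind and a callback-style redis v3 client:

const { AsyncResource } = require('async_hooks');
const redis = require('redis'); // assuming node_redis v3 (callback API)

const client = redis.createClient();

// Bind the callback so the ALS/CLS context active at call time is restored
// when the redis client invokes it.
client.get('some-key', AsyncResource.bind((err, value) => {
  // context-dependent work here
}));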