MatrixAI / Polykey-CLI

Polykey CLI - Open Source Decentralized Secret Sharing System for Zero Trust Workflows
https://polykey.com
GNU General Public License v3.0
6 stars 3 forks source link

Coalesce concurrent locking when writing the session token during GRPC requests #18

Open emmacasolin opened 2 years ago

emmacasolin commented 2 years ago

Specification

Currently requests to refresh the session token will not be dropped if another process is holding a write lock on the session file. It will wait for the other process to finish and then refresh the session token as well. This is unnecessary, since we can simply drop the request if we know the other process will refresh the session token when it completes (which we know it will).

In order to do this we need to make changes to the fd-lock library we're currently using (likely using C++) to allow us to see which processes currently hold a lock on the session file, as well as what type of lock it is.

Additional context

test('Parallel processes should not refresh the session token', async () => {
      let tokenP1 = 'token1' as SessionToken;
      let tokenP2 = 'token2' as SessionToken;

      tokenBuffer = await fs.promises.readFile(sessionFile);
      const prevTokenParallel = tokenBuffer.toString() as SessionToken;

      async function runListCommand(): Promise<SessionToken> {
        // At least 1 second of delay
        // Expiry time resolution is in seconds
        await sleep(1000);
        await testUtils.pkStdio(command, {}, dataDir);
        const buffer = await fs.promises.readFile(sessionFile);
        return buffer.toString() as SessionToken;
      }

      [tokenP1, tokenP2] = await Promise.all([
        runListCommand(),
        runListCommand(),
      ]);

      // Verify both tokens
      expect(
        async () => await polykeyAgent.sessionManager.verifyToken(tokenP1),
      ).not.toThrow();
      expect(
        async () => await polykeyAgent.sessionManager.verifyToken(tokenP2),
      ).not.toThrow();

      // Check that both commands were completed
      expect(tokenP1).not.toEqual('token1');
      expect(tokenP2).not.toEqual('token2');

      // Check that the session token was refreshed exactly one time
      if (tokenP1 === prevTokenParallel) {
        expect(tokenP2).not.toEqual(prevTokenParallel);
      } else if (tokenP2 === prevTokenParallel) {
        expect(tokenP1).not.toEqual(prevTokenParallel);
      } else {
        expect(tokenP1).toEqual(tokenP2);
      }
    });
  });

Tasks

  1. Make necessary changes to fd-lock library
  2. Write tests to check behaviour
CMCDragonkai commented 2 years ago

MatrixAI/Polykey#290 is the issue for making changes to fd-lock

CMCDragonkai commented 2 years ago

Originally the idea was that on concurrent requests, the lock on the session token would be checked, and if it is already locked, it would drop the lock. This ensures that concurrent requests don't block each other only because they are queuing up to write the session token.

Because the onReceiveMetadata used by GRPC is called synchronously, this means that writing the token is done asynchronously and therefore doesn't block the GRPC call anyway.

This turned out to be a problem later in our tests when using pkStdio because it would finish the main CLI function without writing the token. Details here: https://github.com/MatrixAI/js-polykey/pull/296#issuecomment-989517265 and https://github.com/MatrixAI/js-polykey/pull/296#issuecomment-989740159

Upstream is now potentially fixing this so that onReceiveMetadata will be asynchronously called and therefore async operations there will complete before the GRPC call finishes.

When that happens our concurrent GRPC calls will be blocking each other. And we will need to address this by changing the way our locking works.

  1. We will need to ensure that if the lock is already held, we can drop the locking and just complete, thus coalescing the session token write between concurrent GRPC calls.
  2. We should also bring in MatrixAI/Polykey#290 to ensure that our overall locking design can be improved anyway, right now reading also blocks other reads.
CMCDragonkai commented 2 years ago

Not a bug atm, removing the label.

CMCDragonkai commented 1 year ago

The transition to JSON RPC means we need to consider how to implement middleware for the streams.

Right now we have client side GRPC interceptors and also utility functions on the server side handlers.

We could use transform streams for the JSON streams, and use them as middleware for the RPC client and RPC server.

These middleware may need to be able to cancel the subsequent handling if for example due to authentication failure.

We could make this simpler with just providing utility functions to be used.

CMCDragonkai commented 1 year ago

The RPC server has to always consume the first frame (but it can yield this to the next consumer). This is because it has to know what the method is, to know which RPC handler will handle it. At this point it can also perform middleware duties and know if the call is authenticated or now. It could then return a failure frame to the peer and close the call, or provide metadata to the end consumer.

CMCDragonkai commented 1 year ago

So the only thing we need here is to change fd-lock so that it supports a 0 timeout. That makes a "nonblocking locking library". Then if something else is locking it, then we just skip if our timeout is 0.

CMCDragonkai commented 1 year ago

Moving to Polykey-CLI.

CMCDragonkai commented 1 month ago

GRPC and its middleware design isn't relevant anymore, but the general problem of coaslescing the locking is still an issue. Because it is necessary to allow concurrent operations of the PK CLI when interacting with the PK agent due to its client to server authentication.

PK-CLI still currently uses fd-lock, and I'm pretty sure this stub repo https://github.com/MatrixAI/js-file-locks was created to eventually replace it.

This issue should be re-specced in line with the current RPC structure, and the middleware that handles the authentication.

CMCDragonkai commented 1 month ago

@tegefaulkes can you verify this (delegate this to someone too).

tegefaulkes commented 1 month ago

I'll have a look into this.

CMCDragonkai commented 1 month ago

The parent issue of this is actually https://github.com/MatrixAI/Polykey/issues/290.

While the issue title talks about GRPC, I think this is still relevant to the current js-rpc. So the spec should be updated.