Note that all fields in proto3 are optional and this is all facilitated by a default value. https://stackoverflow.com/questions/31801257/why-required-and-optional-is-removed-in-protocol-buffers-3
The default values are all based on the types of the fields. So for primitives like numbers it's 0, but for message-typed fields it is null in JS.
Optional fields returned in proto3 (since release 3.15). What this means is that the compiler generates a has_* method to check whether a field was set at all; that way you can differentiate between a value not being set by the client versus the client setting the value to the default value. This can be important when you want to change behaviour if the client really didn't set the value.
With optional you get the ability to explicitly check if a field is set. Let's say you have a field int32 confidence. Currently when receiving a message with such a type you cannot know the difference between confidence = 0 or confidence not set, because default values are optimized away in the serialization. If you mark the field as optional then presumably some extra bits are set in the serialization and a has_confidence() method will be generated so that you on the receiving end can disambiguate the two.
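A sketch of what this looks like on the generated JS side (assuming a proto3 field optional int32 confidence = 1; and the standard google-protobuf accessors; the message name here is illustrative):
const msg = new Scan();
msg.getConfidence(); // 0, the default, indistinguishable from an explicit 0 on its own
msg.hasConfidence(); // false: the field was never set
msg.setConfidence(0);
msg.hasConfidence(); // true: explicitly set, even though it equals the default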
Regarding the number types in proto3, these are the scalar numeric types (int64, uint64 and sint64 being the 64 bit integers):
int32
int64
uint32
uint64
sint32
sint64
double
float
Due to this issue https://github.com/protocolbuffers/protobuf/issues/3666, there's no support for BigInt yet, and the default mapping of 64 bit types to JS numbers loses precision.
The workaround is:
message dummy {
  uint64 bigInt = 1 [jstype = JS_STRING];
}
This turns it into a string.
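On the generated side this means the accessors deal in decimal strings instead of JS numbers; a sketch assuming the dummy message above (the generated accessor names are illustrative):
const msg = new Dummy();
msg.setBigint('18446744073709551615'); // values beyond Number.MAX_SAFE_INTEGER survive as strings
const value: string = msg.getBigint(); // returned as a decimal string, not a number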
This shows the mapping from the proto3 code to the generated JS code: https://developers.google.com/protocol-buffers/docs/reference/javascript-generated; of interest are map, bytes, oneof, and enums.
The guide says that if we want to stop using a certain field in a message, it's important to reserve the field number and also the field name so that they cannot be used again. This means that older clients won't get confused.
Basically the field numbers don't actually have to be in sequence; they represent unique positions in the message.
message Foo {
  reserved 2, 15, 9 to 11;
  reserved "foo", "bar";
}
enum Foo {
  reserved 2, 15, 9 to 11, 40 to max;
  reserved "FOO", "BAR";
}
Regarding pagination, I like this usage of repeated:
message SearchResponse {
  repeated Result results = 1;
}
message Result {
  string url = 1;
  string title = 2;
  repeated string snippets = 3;
}
It seems that this makes it easier to define a single message type and then lists of messages. I had done this previously in OpenAPI when I had situations like GET /resource/1 vs GET /resources, where the former acquires a single resource and the latter fetches a page of resources.
I imagine that responses may have further metadata beyond just the core data. So I could imagine types being represented like:
// the actual domain structure
message XDomainStructure {
  // ...
}
// a single resource /resources/1
message YResponse {
  XDomainStructure x_domain = 1;
}
// multiple resources /resources
message ZResponse {
  repeated XDomainStructure x_domains = 1;
}
That way "response" messages are differentiated from domain structures that we want to work with.
So in our proto definitions we can identify the relevant domain structures we want to work with, these should really be derived manually by the programmer by investigating each domain. For example in our notifications domain, we have a domain structure representing a notification message. But this is not the same type as the type of request & response messages being sent back and forth on the gRPC API about notifications.
It's possible to also import proto files. This can make it easier to manage our proto definitions if we separate our type definitions from our message definitions from our service definitions. It could allow us to form subdirectories in src/proto/schemas. We would have to check if our CLI command scripts/proto-generate.sh can actually do this though.
Importing is just a matter of:
import "myproject/other_protos.proto";
To re-export, just use import public "other.proto";
I believe the name should start from where the --proto_path is set to. This should be src/proto/schemas.
I forgot if we were ever able to use any of the google protos, like import "google/protobuf/any.proto";. The Any type allows us to nest any kind of message type. It will be serialized as bytes. We are likely not going to use this particular type google.protobuf.Any though.
However there are a number of other types that are quite useful like google/protobuf/timestamp.proto. They're all located here: https://developers.google.com/protocol-buffers/docs/reference/google.protobuf
The oneof fields can use different field numbers, but no matter what, if you set any one of those fields, it clears the other fields in the oneof.
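For example, a sketch assuming a message with oneof resource { Vault vault = 1; Secret secret = 2; } and the usual generated accessors (all names here are hypothetical):
const res = new ResourceResponse();
res.setVault(vault);
res.setSecret(secret); // setting this member clears the vault field
res.hasVault(); // false
res.getResourceCase(); // ResourceResponse.ResourceCase.SECRET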
The map type only works with fixed types for keys and values, so for arbitrary values we should use JSON encoding and return it as a string.
A map also can't be nested: the value of a map can't be another map. Although you should be able to put another map inside a message type, and that would be sufficient to nest maps.
Need to point out that re-reading https://medium.com/expedia-group-tech/the-weird-world-of-grpc-tooling-for-node-js-part-3-d994de02bedc seems to suggest that our current usage of static tooling means that importing packages may not work. Not sure... at this point.
This issue seems to indicate all we need to do is copy the proto source code verbatim into our src/proto/schemas and then it should work: https://github.com/agreatfool/grpc_tools_node_protoc_ts/issues/72
Note that a gRPC channel tends to have a limit on the maximum number of concurrent calls (from that point onwards RPC requests are queued). I believe we have separate channels to each different node. We aren't likely to hit this max number in production right now, but the chattiness may increase when we integrate gestalt synchronisation #190. This issue has more details: https://github.com/grpc/grpc/issues/21386. The current workaround is to create new channels and load balance between channels, however this is considered to be a temporary solution.
Use a pool of gRPC channels to distribute RPCs over multiple connections (channels must have different channel args to prevent re-use so define a use-specific channel arg such as channel number).
This will impact any benchmarking we do that involves launching multiple concurrent gRPC requests.
This limit is mentioned here: https://github.com/grpc/grpc-node/blob/master/PACKAGE-COMPARISON.md which also compares the @grpc/grpc-js vs grpc library (the latter is now deprecated).
Regarding API versioning. gRPC is naturally backwards compatible. So as long as there are no breaking changes to the API, it's possible to keep using the same version, and clients can continue to work as normal. However when there is a breaking change, the MS guide recommends using package names.
So right now we have agentInterface, but we can instead call it agentInterface.v1, and then increment this version specifier whenever we have backwards incompatible changes.
This basically means one can create different service interfaces, and it is possible to run multiple versions of the same API. So right now we may have an agent service, but we may also run 2 agent services on the same port. This would be similar to having api/v1 and api/v2 in a RESTful URL where both still work. The only challenge here is to maintain handlers for the old API as well.
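A sketch of what running both versions side by side could look like with @grpc/grpc-js (the generated service definitions, import paths and handler objects here are assumed, not actual names in the codebase):
import * as grpc from '@grpc/grpc-js';
// Hypothetical generated service definitions for two package versions
import { AgentServiceService as AgentServiceV1 } from '../proto/js/agentInterface/v1/agent_service_grpc_pb';
import { AgentServiceService as AgentServiceV2 } from '../proto/js/agentInterface/v2/agent_service_grpc_pb';

// Stub handler objects for illustration only
declare const agentServiceV1Handlers: grpc.UntypedServiceImplementation;
declare const agentServiceV2Handlers: grpc.UntypedServiceImplementation;

const server = new grpc.Server();
// Both versions are registered on the same server and port; old clients keep resolving v1
server.addService(AgentServiceV1, agentServiceV1Handlers);
server.addService(AgentServiceV2, agentServiceV2Handlers);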
However this doesn't telegraph to the client side what version it is using. It's just a matter of the client selecting which version they want to use, and hoping that the agent side still has that version. Maybe the lack of that versioned service existing is enough to signal to the client that their source code version is too old and needs to be updated. This will need to be prototyped in the test GRPC setup to see what happens when the relevant versioned package is no longer being offered as a service, and what exceptions/errors occur.
This would benefit from being able to break up the proto files into subdirectories so that way common messages and type declarations can be shared.
It seems that there was an idea to use server reflection for the client to query about the protobuf descriptions. https://stackoverflow.com/a/41646864/582917
Server reflection is not available directly on grpc-js https://github.com/grpc/grpc-node/issues/79. However there are libraries that have been built on top of it to include reflection support.
The examples given are that grpcurl can then be used directly without having access to the proto files, since it is possible to request the proto files from the service.
It has some relationship to the descriptor proto: https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/descriptor.proto
Also there's option declaration. Not sure how that works yet.
Found an interesting library that builds on top of @grpc/grpc-js: https://github.com/deeplay-io/nice-grpc. They seem to have done a lot of work on gRPC js and making it more user friendly. It's too complicated right now to integrate, but would be worth checking out later.
Relevant errors that we should have more tests for: https://www.grpc.io/docs/guides/error/
Recommend reading over this @tegefaulkes when you're finished with your checklist in vaults refactoring.
Playing around with getting the message definitions to exist within their own .proto files and packages.
It seems doable, I can define for example the vault message inside domains/Vaults.proto and import it into Client.proto with import public "domains/Vaults.proto";. From there I can use the messages inside a service by doing
//OLD
rpc VaultsRename(VaultRenameMessage) returns (VaultMessage) {};
//NEW
rpc VaultsRename(Vault.Rename) returns (Vault.Vault);
When constructing the messages now we can do VaultMessage = new clientPB.Vaults.Rename(). I'm likely going to rename clientPB to messages at some point.
Are package/file names meant to be lowercase? It seems that way from other examples. Have a look at how one is supposed to import the google proto library.
I didn't check what it should be but I can change it if needed.
I think we should have in our src/config.ts:
stateVersion - for the node state
serviceVersion - for the GRPC service (or HTTP service)
sourceVersion - same as the package.json version (ideally we can import this)
The sourceVersion is ideally fetched from the package.json. However we don't want to use import packageJson from '../package.json'; because that screws up the dist build. Instead you can make config.ts read up one directory. When you distribute src/config.ts, it becomes dist/config.js. This then can access the ../package.json that will be in the NPM distribution.
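A minimal sketch of that approach (assuming src/config.ts compiles to dist/config.js, so ../package.json resolves to the package root in the published package; the version field values are illustrative):
import * as fs from 'fs';
import * as path from 'path';

// Read package.json one directory up from the compiled file (dist/config.js -> package.json)
const packageJson = JSON.parse(
  fs.readFileSync(path.join(__dirname, '..', 'package.json'), 'utf-8'),
);

const config = {
  stateVersion: 1, // illustrative values
  serviceVersion: 1,
  sourceVersion: packageJson.version as string,
};

export default config;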
However I'm not sure if this will work under vercel/pkg. @tegefaulkes can you check this out as well.
@tegefaulkes remember to tick off things as you're making progress. And if you have a new PR/branch, create a PR for it and link it here too. The task list should be copied there.
I'm still working on creating common domain types and common message types. Almost done with that.
I've created a new branch off of master for this under API_Review
Consider how to shut down the grpc server and terminate all client connections as well.
Looks like GRPCServer.ts already has a method for doing this:
public closeServerForce(): void {
  this.server.forceShutdown();
}
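grpc-js also provides tryShutdown, which waits for in-flight calls to finish; a sketch of a graceful variant that falls back to forceShutdown after a timeout (the method name and timeout value are made up):
public async closeServerGraceful(timeout: number = 5000): Promise<void> {
  await new Promise<void>((resolve) => {
    const timer = setTimeout(() => {
      // Calls still hanging after the timeout get dropped
      this.server.forceShutdown();
      resolve();
    }, timeout);
    // Stop accepting new calls and wait for in-flight ones to finish
    this.server.tryShutdown(() => {
      clearTimeout(timer);
      resolve();
    });
  });
}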
When it comes to the sessions and authentication I feel there is a lot of duplicated code that could be made a utility function. On the client side we have code that updates the session token. This is in every CLI command.
const pCall = grpcClient.nodesClaim(nodeClaimMessage);
const { p, resolveP } = utils.promise();
pCall.call.on('metadata', async (meta) => {
  await clientUtils.refreshSession(meta, client.session);
  resolveP(null);
});
const response = await pCall;
await p;
Likewise on the server side we have code that checks the session token and sends an updated token to the client. This is at the beginning of each RPC method.
await sessionManager.verifyToken(utils.getToken(call.metadata));
const responseMeta = utils.createMetaTokenResponse(
  await sessionManager.generateToken(),
);
call.sendMetadata(responseMeta);
If we had to update either of these it would be annoying.
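A sketch of factoring the client side into a helper, reusing the same utils and clientUtils as the snippet above (withSessionRefresh is a hypothetical utility, not an existing function):
// Hypothetical helper: wraps a client call so the session token is refreshed
// from the leading metadata before the response is returned
async function withSessionRefresh<T>(pCall: any, session: any): Promise<T> {
  const { p, resolveP } = utils.promise();
  pCall.call.on('metadata', async (meta) => {
    await clientUtils.refreshSession(meta, session);
    resolveP(null);
  });
  const response = await pCall;
  await p;
  return response;
}
// Usage: const response = await withSessionRefresh(grpcClient.nodesClaim(nodeClaimMessage), client.session);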
Prototype by abstracting into subdirectories under src/proto/schemas, and integrate google's protobuf library by first referencing it and then just copying verbatim
This can be split off and merged in earlier as we focus on getting the core work done in #194.
New proto schemas structure:
[nix-shell:~/Projects/js-polykey/src/proto/schemas]$ tree .
.
├── google
│ └── protobuf
│ ├── any.proto
│ ├── descriptor.proto
│ ├── duration.proto
│ ├── empty.proto
│ ├── field_mask.proto
│ ├── struct.proto
│ ├── timestamp.proto
│ └── wrappers.proto
└── polykey
└── v1
├── agent_service.proto
├── client_service.proto
├── gestalts
│ └── gestalts.proto
├── identities
│ └── identities.proto
├── keys
│ └── keys.proto
├── nodes
│ └── nodes.proto
├── notifications
│ └── notifications.proto
├── permissions
│ └── permissions.proto
├── secrets
│ └── secrets.proto
├── sessions
│ └── sessions.proto
├── test_service.proto
├── utils
│ └── utils.proto
└── vaults
└── vaults.proto
14 directories, 21 files
New versions should go into polykey/v2, but only that which is changed, except for the services which must always be copied.
Next iteration of API review should specify the usage of google/protobuf.
The above should go into the developer documentation.
The current separated types are still quite messy. The shared types being used in each domain should match the actual domain types that are used inside the TypeScript source code. At least the types that we want to transfer over the wire.
After this, we need specific types for XRequest and YResponse message wrappers. These are different and unique to the domain protobuf message structs, following this standard: https://docs.buf.build/lint/rules#rpc_request_standard_name-rpc_response_standard_name-rpc_request_response_unique
Other struct types should not have the Request and Response suffixes and are used as properties of XRequest and YResponse types.
If dealing with streams, they would just be streams of XRequest and YResponse types.
The name of each request/response message should follow the RPC it is being used for. This means these types will need to be put into src/proto/schemas/client_service.proto and src/proto/schemas/agent_service.proto. Unfortunately this will impact imports across the codebase, so some find and replace will need to be used. I used the Ctrl + Shift + H trick in vscode to do it.
RFC @tegefaulkes @emmacasolin @scottmmorris @joshuakarp
Doing this will reduce the number of imports like this:
import * as utilsPB from '../../proto/js/polykey/v1/utils/utils_pb';
import * as nodesPB from '../../proto/js/polykey/v1/nodes/nodes_pb';
import * as gestaltsPB from '../../proto/js/polykey/v1/gestalts/gestalts_pb';
import * as permissionsPB from '../../proto/js/polykey/v1/permissions/permissions_pb';
And recover back what we used to do:
import * as clientServicePB from '../../proto/js/polykey/v1/client_service_pb';
// clientServicePB is basically clientPB
So we abstract the lower-level message types to higher-level? i.e. instead of:
// agent_service.proto
rpc NodesChainDataGet (polykey.v1.utils.EmptyMessage) returns (polykey.v1.nodes.ChainData);
we'd have
// agent_service.proto
rpc NodesChainDataGet (polykey.v1.agent_service.NodesChainDataGetRequest) returns (polykey.v1.nodes.NodesChainDataGetResponse);
The only issue I see with this is you'd lose the ability to clearly compare different service functions and see if their request/response type is the same. It also makes it a bit more convoluted to actually construct (need to create an extra message type before being able to do anything).
Having said all of this, interestingly my opinions on this go against the standard from what you linked:
One of the single most important rules to enforce in modern Protobuf development is to have a unique request and response message for every RPC. Separate RPCs should not have their request and response parameters controlled by the same Protobuf message, and if you share a Protobuf message between multiple RPCs, this results in multiple RPCs being affected when fields on this Protobuf message change. Even in simple cases, best practice is to always have a wrapper message for your RPC request and response types.
If this is the case, then probably best to do it.
In terms of your other comment:
The shared types being used in each domain should match the actual domain types that are used inside the TypeScript source code. At least the types that we want to transfer over the wire.
I completely agree with this though.
I think most wrappers may be "thin" wrappers around common types. But the existence of unique wrappers allows extensibility for the future.
Currently our error handling over GRPC should be documented and investigated to see how it would work if mapped to an HTTP API.
The src/errors.ts currently has an ErrorUndefinedBehaviour that should be moved somewhere else. It is currently used when the exception returned from the GRPC server is unknown. It doesn't make sense to have this at the top level src/errors.ts.
While reviewing the CLI Authorization Retry Loop MR, we discovered the need to redo our integration of session management into the grpc domain. Discussion is here: https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/213#note_728972375
This led us to explore the usage of client side and server side interceptors. Atm, grpc-js doesn't support server side interceptors, but does support client side interceptors.
This library https://github.com/deeplay-io/nice-grpc came up again as they have already built on top of grpc-js and implemented client side and server side interceptor/middleware.
It also however does all the automation of grpc into promises or async generators that we did ourselves in grpc/utils.ts.
While a temporary solution will be used for session management for now, this can be a valid direction for moving ahead with GRPC and thus replacing our own hacks on top of GRPC.
The only issue is our hack on grpc-js which monkey patches the http2 server in GRPCAgent in order to replace its TLS logic with our certificate verification that uses our root keys. This only affects the client service as the agent service relies on the networking domain to do the TLS logic. We would have to investigate how easy it is to monkey patch the HTTP2 server if using nice-grpc. Otherwise we can lift some code from them and port it over to our system.
Drafting out the ideal session management architecture:
I noticed the usage of:
/**
* Generic Message error
*/
class ErrorGRPCInvalidMessage extends ErrorGRPC {
  exitCode: number = 70;
}
Which is used in client/rpcGestalts:
Running ag 'ErrorGRPCInvalidMessage' in ~/Projects/js-polykey/src:
grpc/errors.ts
32:class ErrorGRPCInvalidMessage extends ErrorGRPC {
47: ErrorGRPCInvalidMessage,
client/rpcGestalts.ts
259: throw new errors.ErrorGRPCInvalidMessage(
297: throw new errors.ErrorGRPCInvalidMessage(
340: throw new errors.ErrorGRPCInvalidMessage(
378: throw new errors.ErrorGRPCInvalidMessage(
This is a domain specific error and should come from src/gestalts/errors.ts.
GRPC errors should only be for GRPC specific errors, and these errors are not meant to be thrown by RPC handlers.
I've resolved ErrorUndefinedBehaviour; it is actually legitimate. It is now:
/**
* This is a special error that is only used for absurd situations
* Intended to placate typescript so that unreachable code type checks
* If this is thrown, this means there is a bug in the code
*/
class ErrorPolykeyUndefinedBehaviour extends ErrorPolykey {
  description = 'You should never see this error';
  exitCode = 70;
}
Will be in the CARL MR https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/213
@tegefaulkes @joshuakarp very important!
Further investigation into gRPC internals has been done in the realm of metadata, error handling, our use of serialising exceptions via trailing metadata, and the StatusObject. Progress log here: https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/213#note_733965832
The situation is that there is leading and trailing metadata.
Client side can only send leading metadata, we are using this to send the session token.
Server side receives the leading metadata and makes use of it to check the session token.
Server side can then respond with leading metadata or trailing metadata. In this case, we use leading metadata to respond with new session token.
Trailing metadata is currently being used to serialise our exceptions. However trailing metadata may need to be used for other things in the future. To solve this, the error key-values that are currently in the trailing metadata should be placed in an encoded JSON under an error key in the metadata. Additionally we should be making use of the status code and details properties in StatusObject.
For example, the current code is:
/**
* Serializes ErrorPolykey instances into GRPC errors
* Use this on the sending side to send exceptions
* Do not send exceptions to clients you do not trust
*/
function fromError(error: errors.ErrorPolykey): ServerStatusResponse {
  const metadata = new grpc.Metadata();
  metadata.set('name', error.name);
  metadata.set('message', error.message);
  metadata.set('data', JSON.stringify(error.data));
  return {
    metadata,
  };
}
This should be changed to something like (pseudo code):
metadata.set('error', error.toJSON());
Note that ErrorPolykey instances all have a toJSON() method.
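A sketch of what the revised fromError might look like, also filling in code and details (assuming toJSON() returns a JSON string and that ErrorPolykey carries a description, as in the ErrorPolykeyUndefinedBehaviour example above):
function fromError(error: errors.ErrorPolykey): ServerStatusResponse {
  const metadata = new grpc.Metadata();
  // All error details now live under a single namespaced key
  metadata.set('error', error.toJSON());
  return {
    code: grpc.status.UNKNOWN, // or a more specific Status mapped from the error type
    details: error.description,
    metadata,
  };
}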
This however can only be sent on the client service. In our agent service, we must not send exceptions over because they may contain sensitive data. This means we should also be making use of the code and details.
Note:
export declare type ServerStatusResponse = Partial<StatusObject>;
export interface StatusObject {
  code: Status;
  details: string;
  metadata: Metadata;
}
export declare enum Status {
  OK = 0,
  CANCELLED = 1,
  UNKNOWN = 2,
  INVALID_ARGUMENT = 3,
  DEADLINE_EXCEEDED = 4,
  NOT_FOUND = 5,
  ALREADY_EXISTS = 6,
  PERMISSION_DENIED = 7,
  RESOURCE_EXHAUSTED = 8,
  FAILED_PRECONDITION = 9,
  ABORTED = 10,
  OUT_OF_RANGE = 11,
  UNIMPLEMENTED = 12,
  INTERNAL = 13,
  UNAVAILABLE = 14,
  DATA_LOSS = 15,
  UNAUTHENTICATED = 16
}
The default is UNKNOWN if it is not passed in. That's why toError checks for the UNKNOWN code.
In the case of the agent service, it should be sufficient to only set status and details as we are returning an error to an unknown party. We would need to use one of the existing status codes above and put the details there. Further details could be placed in the error property of Metadata. However this cannot just be error.toJSON() as it contains sensitive information like data and stack.
What we can do instead is have a "simplified" JSON output which only contains the name, description, message and exitCode.
toJSON(): string {
  return JSON.stringify({
    name: this.name,
    description: this.description,
    message: this.message,
    exitCode: this.exitCode,
    data: this.data,
    stack: this.stack,
  });
}
This would allow the caller on the agent service to get the name, description, message and exit code but nothing else. Would this be sufficient to return validation errors? It seems not, since we would at least require structure. In such a case, data is necessary, but these exceptions would have to be carefully constructed and verified not to output any sensitive information. Even the message must not. To ensure this is the case, we would need to carefully document the usage of message and data so that they are not allowed to contain sensitive data for exceptions.
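A sketch of a separate, sanitised serialisation for the agent service (toSanitizedJSON is a hypothetical method, not something that exists yet):
// Hypothetical: only fields that are safe to show an untrusted caller
toSanitizedJSON(): string {
  return JSON.stringify({
    name: this.name,
    description: this.description,
    message: this.message, // must be documented to never contain sensitive data
    exitCode: this.exitCode,
  });
}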
Need to think about how to structure this so that no mistakes can be made and it is foolproof.
In order to achieve such a thing, some changes need to be made to grpc/utils.ts. Refer to these 2 quotes from the progress log.
However the usage of the generators that convert the streams right now has no provision for the metadata. You would have to directly use call.end afterwards. And they are calling stream.end, say for example generatorWritable(). This means we should expose the usage of metadata. This might mean that if you do gen.next(null), which currently ends the generator and stream, it would have to be changed to gen.next(meta). Thus if a metadata is sent, that means the stream has to end with the metadata. We may therefore enable both types to mean this: gen.next(null | Metadata). So this is one change that needs to be done. The location of changes is generatorWritable in src/grpc/utils.ts where we use stream.end() when vW === null, and generatorDuplex in src/grpc/utils.ts where we use gW.next(null) when vW === null. An ideal way of receiving metadata on the client side can be:
const pCall = client.unary(m);
const leadingMeta = await pCall.metadata();
const respMsg = await pCall;
or
const genDuplex = duplexStream();
const leadingMeta = await genDuplex.metadata();
This avoids having to go into the internal property of call or stream and then attaching an event handler. However if we implement this where it only resolves when metadata is received, this can be a problem, because then the resolution will never occur. Instead, a rejection must occur for such a promise if the response is finished but there's no initial metadata.
It may also be that metadata is not a function, but a promise property. This is because the event handlers might need to be added immediately. But I wonder what would happen if the stream is started and the event handler is added afterwards? It's safer to use pCall.metadata or genDuplex.metadata as a property than using it as a function that has to attach the event handlers afterwards. This only solves the problem for leading metadata but not trailing metadata.
If the previous post is correct in that trailing metadata is being used for our exceptions, then we may just have to avoid relying on trailing metadata. But a better option would be to namespace our trailing metadata under an error prefix; note that nested data is not allowed in metadata, but you can provide an encoded string or buffer. So if we were to use error keyed to a POJO JSON and then decode that for our toError and fromError, that might free up the trailing metadata for other uses later in the future.
All calls must have deadlines, otherwise we can hang forever.
All GRPC calls support CallOptions.
Which has:
export interface CallOptions {
  deadline?: Deadline;
  host?: string;
  parent?: ServerUnaryCall<any, any> | ServerReadableStream<any, any> | ServerWritableStream<any, any> | ServerDuplexStream<any, any>;
  propagate_flags?: number;
  credentials?: CallCredentials;
  interceptors?: Interceptor[];
  interceptor_providers?: InterceptorProvider[];
}
Where Deadline is:
export declare type Deadline = Date | number;
This is the same kind of deadline as we have with client timeouts when waiting for ready.
These deadlines are on the client side and ensure we have a deadline for when the call is expected to finish.
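Setting one is just a matter of passing it in the call options; a sketch assuming the promisified client method forwards CallOptions to the underlying call:
// Fail the call with DEADLINE_EXCEEDED if it hasn't completed within 10 seconds
const deadline = Date.now() + 10000; // Deadline can be a Date or epoch milliseconds
const pCall = grpcClient.nodesClaim(nodeClaimMessage, { deadline });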
This requires some testing as to how it works when applied to streams. Does the deadline apply to the entire stream lifetime, or per message (like a keep-alive timeout) on the stream?
Deadlines are important to prevent dangling calls taking up resources when the other side drops.
This only solves deadlines on the client side, but not deadlines on the server side. Especially with streaming we need to have deadlines on server side to prevent long-running useless call streams.
Read this as well: https://grpc.io/blog/deadlines/ It appears that it's possible from the server side to see if the client's deadline has been exceeded and then decide to cancel the call.
The server side can use call.getDeadline() to see what the deadline is that the client specified. However this doesn't "set" a deadline for the server side call.
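A sketch of using that on the server side to skip work the client has already given up on (the handler name and check are illustrative):
import * as grpc from '@grpc/grpc-js';

function someHandler(
  call: grpc.ServerUnaryCall<any, any>,
  callback: grpc.sendUnaryData<any>,
): void {
  const deadline = call.getDeadline(); // Date | number
  const deadlineMs = deadline instanceof Date ? deadline.getTime() : deadline;
  if (deadlineMs < Date.now()) {
    // The client's deadline has already passed; don't bother doing the work
    return callback({ code: grpc.status.DEADLINE_EXCEEDED, details: 'Deadline exceeded' });
  }
  // ... proceed with the actual handler logic
}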
Consider the problem of the client stream hanging around and not ending the stream; one must then eventually time out and close the connection. This may be something we should do in a server side interceptor (which grpc-js currently doesn't support).
Work on deadlines will also relate to #243, notably with a bidirectional stream (note that for node claims, it's strictly between agents).
Regarding deadlines, it's possible to do it globally with an interceptor. You just add the deadline to the call options on every call. Of course this may impact long-running streams. It is also possible to filter by call when using the interceptor. The docs on client interceptors go into more detail about this https://github.com/grpc/proposal/blob/master/L5-node-client-interceptors.md and also the docs in https://grpc.github.io/grpc/node/module-src_client_interceptors.html. I used these resources to code up the session interceptor.
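A sketch of a global deadline interceptor (the default deadline value is illustrative):
import * as grpc from '@grpc/grpc-js';

// Adds a default 10 second deadline to every outgoing call that doesn't already have one
const deadlineInterceptor: grpc.Interceptor = (options, nextCall) => {
  if (options.deadline == null) {
    options.deadline = Date.now() + 10000;
  }
  return new grpc.InterceptingCall(nextCall(options));
};
// Passed in when constructing the client, alongside the session interceptor:
// new SomeGeneratedClient(address, credentials, { interceptors: [deadlineInterceptor] });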
For long-running streams, we may want to do something where we don't want to put a deadline on it unless the stream is not active. A sort of keep-alive timeout instead of an overall call deadline.
But this is going to require a case by case analysis. It's something we need to spec out in detail.
Deadlines regarding node connection timeouts had some discussion here: https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/213#note_739641342. @joshuakarp please incorporate that into your nodes work too.
Are deadlines necessary for NodeConnection timeouts here? I was under the impression that this was what the GRPCClientAgent.timeout was for. Doesn't a deadline just add another "timeout" on top of this?
Timeouts from NodeConnection should propagate to the GRPCClientAgent. So any timeout specified in NodeConnection should directly be passed into the construction of GRPCClientAgent.
In the future when IPv6 is available, usages of 0.0.0.0 defaults should be changed to the dual stack specifier :: as the default, as it binds to all hosts for IPv4 and IPv6. This is primarily useful for public bindings.
For localhost, ::1 is used for IPv6 while 127.0.0.1 is still IPv4. I believe we should stick to 127.0.0.1 for localhost binding instead of ::1, mainly because if we're using localhost it's more common to use IPv4. Later this may become dual stack anyway.
An issue came up in https://github.com/MatrixAI/js-polykey/pull/278#issuecomment-979730714 where we were throwing errors when the provider id was not one of the providers in our list.
There are many ways to do this:
undefined return value to indicate that something doesn't exist - we don't always want to throw an exception
Because RPC is meant to be network calls, and in our normal functions we are returning undefined when something doesn't exist, it makes sense to have our RPC calls also return a type which could represent something that doesn't exist. The way that GRPC does this is through the oneOf specification: https://developers.google.com/protocol-buffers/docs/proto3#oneof
This would be similar to how in TS we are using x | undefined as the return type.
There are a number of "sentinel values" that we can use to indicate that something doesn't exist:
undefined for singular items - this gets translated to oneOf, which would require us to provide an empty message of some sort (see the sketch after this list)
[] for empty arrays - this is translated to a stream that just ends with nothing; make sure that our async iteration also ends with nothing
Of course in other cases we should throw an exception when something that is expected to exist doesn't exist. The design of this requires a case by case analysis until we can form a general guideline.
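For the oneOf case, the receiving side works out to checking the generated case enum; a sketch with a hypothetical message that has oneof result { Provider provider = 1; }:
const resultCase = response.getResultCase();
if (resultCase === GetProviderResponse.ResultCase.RESULT_NOT_SET) {
  // maps to undefined on the TypeScript side
  return undefined;
}
return response.getProvider();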
Due to our asynchronous GRPC client interceptor, adding deadlines can cause problems in our GRPC calls. Upstream GRPC is fixing this to make sure that the asynchronous ops are actually awaited by awaiting the starting outbound interceptor. Details are here: https://github.com/MatrixAI/js-polykey/pull/296#issuecomment-989580750
This means right now our deadlines have to be at least 1 second long... but even this is not a robust solution. The best case scenario is to get the upstream fix and upgrade our GRPC version. Lots of changes though, so it will require some robust testing to ensure that our existing GRPC usage is still working.
Based on my testing in #296, I found out that when the deadline is exceeded, we end up with a local status error called GRPC_STATUS_DEADLINE_EXCEEDED. This is received by onReceiveStatus in the client interceptor.
This can occur before the server side even has a chance to respond with leading metadata. This results in an exception being thrown in the GRPC call, which is of course caught by our root exception handler. Note that this has not been encapsulated into our GRPCClient and therefore isn't part of our grpcErrors hierarchy yet. This is something we would want to incorporate into our application properly as deadline failures may be caught by our nodes domain in order to initiate retries. Should be part of our design overhaul in #225.
Until the upstream fixes asynchronous interceptors, it's possible for concurrent GRPC calls to run without refreshing the token.
await pkClient.grpcClient.doX();
await pkClient.grpcClient.doY();
// at this point only doX and doY interceptors will finish
// unless there is more work that is scheduled ahead in the event loop
Not a huge issue right now due to the use of FlowCountInterceptor, but can be dealt with later.
Details here: https://github.com/MatrixAI/js-polykey/pull/296#issuecomment-989740159
Coming from fixing the pk identities authenticate call: https://github.com/MatrixAI/js-polykey/pull/278#issuecomment-996403113
There are 3 possible "timeout" events that our GRPC protocol needs to handle:
let genReadable: ReturnType<typeof pkClient.grpcClient.identitiesAuthenticate>;
this.exitHandlers.handlers.push(async () => {
  if (genReadable != null) genReadable.stream.cancel();
  if (pkClient != null) await pkClient.stop();
});
Right now none of these 3 situations are being handled. This means our GRPC server can be deadlocked or experience resource starvation. Really without a way to abort asynchronous operations, we also have resource leaks.
Example of this is: the identitiesAuthenticate call will wait forever on sending the auth process message back to the client even though the client has cancelled the server stream. Issues that are relevant to this are:
I believe events 1 and 2 look the same to the server side; the server side sees both as cancellation events. This is due to: https://grpc.io/blog/deadlines/#checking-deadlines
Which should mean that when the event is cancelled, onReceiveStatus should also be called.
As a side note about the Protobuf Map: https://github.com/MatrixAI/js-polykey/pull/278#issuecomment-996395664
The map used in protobuf is not an ES6 Map. It's the google-protobuf library's own Map. The documentation is here:
To convert it to a POJO:
// this is the closest way to do it, the X is whatever the field name is
Object.fromEntries(message.getXMap().entries());
We have reviewed the gRPC API, and worked with it extensively. It's time to work out a better RPC layer for PK. There are several problems to consider here:
237
240
src/client
authenticator
218
166
235
200
1.0.0 -> 1.1.0 -> 2.0.0 -> 2.0.1
1 -> 2 -> 3
1 -> 2 -> 3
We have 2 main proto files:
proto/schemas/Client.proto
proto/schemas/Agent.proto
And a Test.proto as well; this will need to be used to generate the marshaling code.
Additional context
155 - compatibility with Mobile operating systems
166 - having a transport agnostic RPC will make it easier for third party integration as it's possible to extend the different transport options to communicate with the PK agent - such as TCP, and UDP... etc
235 - would not be a problem anymore if a web-based transport is enabled, either that or if electron directly bridges into the nodejs runtime (rather than having the FE call into a BE proxy) - this is also relevant to any browser-extensions
248 - some RPC functionality is intended to support asynchronous API, should consider these design requirements for notifications
234 - our RPC should be compatible on the transport layer for P2P communication, P2P hole punching occurs underneath the RPC layer, so the P2P side has to bootstrap from a lower level communication protocol which is purely message oriented
400 - the RPC should be agnostic to IPv4 or IPv6
297 - RPC mechanism must be cancellable ideally using cancellable promises, which includes cancelling the underlying side effect
243 - it's a good idea for the RPC to have timeouts on its calls and to handle such timeouts, in particular we would want to be able to create "custom" protocols on top of any streams, or better would be to "lift" such protocols into the underlying RPC system such as the nodes claiming process
Tasks
279
Client.proto and Agent.proto with version names and test out multiple-version services, and find out what the client does when the version wanted is not available
MAX_CONCURRENT_CONNECTIONS is used
1.4.1 version has some incompatibility: TypeError: http2Server.on is not a function.