tegefaulkes closed this issue 1 week ago
This is going to be a pretty large change. Converting everything to `TimedCancellable` is pretty simple; there are just a lot of things to change and cover.
Some of the code uses `TimedCancellable` already but needs to be reviewed again. I don't think the default timeout for the `timedCancellable` decorator can be set at any time. It has to be defined at import time. This means we can't configure a default timeout when starting the `NodeConnectionManager`. We also don't want to have hard-coded timeouts everywhere, so we may have to define the defaults in the config file.
There are several places where `timedCancellable` and variants of it should be used: basically anywhere there are non-deterministic side effects. Pretty much all of them concern network usage.
`Promise.race` workarounds until https://github.com/MatrixAI/js-async-locks/issues/21. For long-running computations, we're talking about generators and related iterators. In these cases, the cancellation of generators hasn't been entirely worked out yet; instead it is expected that you would stop consuming the generator (if you don't ask for the next item, it would continue computing). The only case where cancellation applies is when you do ask for the next item but must await it, in which case you'd like the ability to do `p.cancel()`. However, at the moment we don't have this ability yet, so instead you may need to pass an overall signal to the generator function and have it signal cancellation the traditional way.
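A minimal sketch of the traditional approach, passing an `AbortSignal` into the generator function (the generator here is illustrative, not real code from the repo):

```ts
// Illustrative sketch: a generator that observes an external AbortSignal.
// Cancellation only takes effect between yields; a consumer that simply
// stops asking for items leaves the generator suspended instead.
async function* counter(signal: AbortSignal): AsyncGenerator<number> {
  for (let i = 0; ; i++) {
    if (signal.aborted) return; // signal cancellation the traditional way
    yield i;
  }
}
```

The consumer drives cancellation by aborting the controller before requesting the next item; the generator then finishes on the next `next()` call.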
Remember that, similar to `async` functions, I believe `async function*` forces them to be regular promises and not our extended promises. But I don't remember if I confirmed this fact.
As for the network, pretty much all network operations should be cancellable, or timed cancellable if there's a timeout. This requires a big refactor of the `Proxy` code to support this. There are multiple "default" timeouts currently configured during class construction. If we use the decorators, the defaults have to be set statically and cannot take in instance parameters.
Alternatively you can also not use the decorators, and instead directly use `PromiseCancellable` while taking in the signal. That would allow you to take the instance/constructor parameters as the defaults for the `Proxy`.
So we have to decide whether to use static parameters and the decorators, or whether to swap to just using `PromiseCancellable` and taking in the signal directly.
It's important to understand that the decorators are for convenience, but they are not infinitely flexible; they trade off some flexibility for convenience.
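To make the non-decorator option concrete, here is a minimal sketch (the class, method, and error names are made up for illustration, not the real `Proxy` API) where the constructor parameter supplies the default deadline and the method takes a signal directly:

```ts
// Sketch: instance-level default timeout, which a statically parameterised
// decorator cannot express, plus a directly passed-in AbortSignal.
class ProxyLike {
  protected connTimeoutTime: number;
  public constructor(connTimeoutTime: number = 20000) {
    this.connTimeoutTime = connTimeoutTime;
  }
  // The default comes from the instance at call time
  public async openConnection(
    timeout: number = this.connTimeoutTime,
    signal?: AbortSignal,
  ): Promise<string> {
    return await new Promise<string>((resolve, reject) => {
      const timer = setTimeout(
        () => reject(new Error('ErrorConnectionTimeout')),
        timeout,
      );
      signal?.addEventListener('abort', () => {
        clearTimeout(timer);
        reject(new Error('ErrorConnectionAborted'));
      });
      // Simulated dial: resolves after 10 ms
      setTimeout(() => {
        clearTimeout(timer);
        resolve('composed');
      }, 10);
    });
  }
}
```

This keeps full flexibility over defaults at the cost of writing the timeout and signal plumbing by hand in every method.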
> I don't think the default timeout for the `timedCancellable` decorator can be set at any time. It has to be defined at import time. This means we can't configure a default time when starting the `NodeConnectionManager`. We also don't want to have hard coded timeouts everywhere so we may have to define the defaults in the config file.
Yes, because if decorators are static, then the parameters they take are static too. This is of course only necessary for "default" deadlines. Whatever is passed into the method/function will override it. If we do use static deadlines, then it does make sense to propagate these static parameters directly from the config object.
I discussed that you can also use signals and promise cancellable directly in the method to avoid static parameterisation...
I suggest we do a priority here. Starting with the `network` domain, we should refactor it to support timers and cancellability (this may mean we can remove the existing timer utility). Additionally there's the `identities` domain, which also has network side effects from contacting identity providers; these should start using timed cancellable too.
I'm thinking there are sort of like 3 levels of defaults here, starting from `src/config.ts`.
I'm not sure if 2 is possible if we use the decorators, unless we change them to look up parameters on the instance when the method parameter isn't passed in (that is, it is set to `undefined`). But we would need to pass in a string to indicate the name of the parameter to look up.
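Assuming the three levels are the config default, an instance-level override, and the per-call parameter (my reading of the discussion, not confirmed), the resolution order could be sketched like this (the config shape here is assumed, not the real `src/config.ts`):

```ts
// Level 1: static defaults from a config module (assumed shape)
const config = { defaultsSystem: { connectTimeoutTime: 20000 } };

class NodeConnectionManagerLike {
  // Level 2: instance-level override, falling back to the config default
  protected connectTimeoutTime: number;
  public constructor(connectTimeoutTime?: number) {
    this.connectTimeoutTime =
      connectTimeoutTime ?? config.defaultsSystem.connectTimeoutTime;
  }
  // Level 3: the per-call parameter wins if provided
  public resolveTimeout(timeout?: number): number {
    return timeout ?? this.connectTimeoutTime;
  }
}
```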
I think it is possible to do all 3.
This is the function head:

```ts
function timedCancellable(
  lazy: boolean = false,
  delay: number = Infinity,
  errorTimeoutConstructor: new () => Error = contextsErrors.ErrorContextsTimedTimeOut,
)
```
It's possible to do something like:

```ts
function timedCancellable(
  lazy: boolean | string | symbol = false,
  delay: number | string | symbol = Infinity,
  errorTimeoutConstructor: new () => Error = contextsErrors.ErrorContextsTimedTimeOut,
)
```
And then do `timedCancellable('lazyProp', delaySym)`, and later use `this[lazy]` and `this[delay]` to acquire the parameters.
```ts
// If it was `undefined`, it would be set to `false`
if (typeof lazy !== 'boolean') {
  lazy = this[lazy];
}
// If it was `undefined`, it would be set to `Infinity`
if (typeof delay !== 'number') {
  delay = this[delay];
}
```
However this pattern may not be scalable to other decorators which have actual string/symbol parameters...
So it may be better to actually have a separate set of parameters to refer to the instance.
```ts
function timedCancellable(
  lazy?: boolean,
  delay?: number,
  errorTimeoutConstructor: new () => Error = contextsErrors.ErrorContextsTimedTimeOut,
  defaults?: {
    lazy: 'lazyProp';
    delay: 'delayProp';
  },
)
```
But I find this syntax way too verbose...
We could also use a different type, like arrow functions.
```ts
function timedCancellable(
  lazy: boolean | ((this: any) => boolean) = false,
  delay: number | ((this: any) => number) = Infinity,
  errorTimeoutConstructor: new () => Error = contextsErrors.ErrorContextsTimedTimeOut,
)
```
It'd be unlikely for a decorator parameter to be an arrow function, but if they needed to differentiate, they could also check the number of parameters... etc.
Notice the `this` is literally `any` here.
Using it would be like:

```ts
class X {
  lazyProp = true;
  @timedCancellable(function (this: X) {
    return this.lazyProp;
  })
  method() {}
}
```
The last style is likely the more flexible.
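A small sketch of how the decorator body could resolve either form at call time (the helper name is made up; the edge case where the parameter type is itself a function would remain ambiguous, as noted above for strings/symbols):

```ts
// Hypothetical helper: a decorator parameter is either a literal value or a
// function evaluated against the instance when the method is called.
type LazyParam<T> = T | ((this: any) => T);

function resolveParam<T>(self: any, param: LazyParam<T>): T {
  return typeof param === 'function'
    ? (param as (this: any) => T).call(self)
    : param;
}
```

The decorator wrapper would then call something like `resolveParam(this, delay)` before constructing the timer.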
I've added related issues #243 and #242. They should be addressed together here.
If there are going to be multiple PRs here, this should be made an epic and with subissues and subPRs.
Full work on this might take 1 sprint and a bit.
The todos relating to cancellability should also be done in this issue. This issue is an epic so there are multiple subissues. @tegefaulkes you can add more issues into this epic as you see fit.
I think we can split up the work in order of:
It might be better to address the locking alongside the other issues when I encounter any locks.
@CMCDragonkai pointed out that we want to reduce the punch packet interval. We can set it to something like 100 or 50 ms.
We can also send a whole bunch of packets at once when trying to connect. In my experience, however, if you communicate too much with a closed port, some firewalls will start to ignore you. With a good connection you only need to send 2 packets each way within a second to establish a connection through a NAT. For connections with packet drop you'll need to send more to make up for the drop, but you don't need a crazy amount.
I thought I attached more issues to this epic earlier, I need to fix that.
Need a new sub issue for Identity manager connections. This is also needed for discovery.
As discussed before, the creation of a NodeConnection
with createNodeConnection
doesn't convey the timeout to the proxy connection. The proxy connection is started automatically when it receives a connection from the NodeConnection
. This means we can end up with an un-composed proxy connection if the NodeConnection
creation times out.
We can fix this by having the NodeConnection
start the ForwardConnection
directly with the timeout. Then it can connect to the proxy and compose like normal.
This is not a critical change; currently, if an uncomposed forward connection exists it will be cleaned up after a TTL period of no activity.
Let's keep it the way it is... the proxy's internal connection logic is somewhat separate from the NodeConnection and GRPC connection. This is due to how we are using the HTTP CONNECT protocol to connect these 2 disparate parts.
When we get around to working on QUIC or WebRTC #234 and replacing GRPC with jsonrpc in #249, then we won't need any workarounds, everything should just be tightly integrated.
So if it can still work with the TTLs, let's just continue doing this.
It does make sense that it's possible to create proxy connections without there being a corresponding node connection. That's the main issue here. And I think we can continue with this design.
Created issues #477 and #476 for discovery and identities.
I'd want to introduce the ability to set { timer: 1000 }
or some number parameter for convenience purposes.
So this means when we move to JSON RPC, we can add another parameter to our params
object, which is the timeout
.
If you don't set `timeout`, then the handler may default to something else, or may default to infinite time.
We can have these 3 meanings:
```ts
timeout: undefined; // default
timeout: null; // infinite
timeout: 1000; // 1000ms
```
It might be a good idea to have `timeout: null` available. Maybe only a select few calls can accept this, because it is ripe for abuse.
Given that, I think we should only allow the default timeout or a specified timeout that is less than the default. Any values larger than the default can be clipped to the default value. Or, if we want to avoid the implicit behaviour, we can throw an error for a bad value.
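The clipping rule could look like the following sketch (leaving out the error-throwing variant; the function name is illustrative):

```ts
// Sketch: resolve a requested timeout against the server-side maximum.
// `null` means infinite and is only honoured where explicitly allowed.
function resolveTimeout(
  requested: number | null | undefined,
  max: number,
  allowInfinite: boolean = false,
): number {
  if (requested === undefined) return max; // default to the maximum
  if (requested === null) {
    return allowInfinite ? Infinity : max; // clip disallowed infinite requests
  }
  return Math.min(requested, max); // clip values larger than the maximum
}
```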
A slight problem with this is that we currently don't enforce any timeouts in the RPC system. The only way to convey timeouts is through the metadata, which the RPC system doesn't care about or even perceive, as it's part of the middleware or user implementation.
Unless we add a timeout parameter to the message structure, it's up to the middleware to handle the timeout logic. But currently the middleware can't affect any timeouts or handler cancellation; it's meant to short-circuit and reject the current stream immediately.
If we want to allow the middleware to affect cancellation, then we need to pass it an abort controller and have it handle its own abort logic. That way we can create timeout or cancellation logic inside the middleware. While we're at it, we may as well provide the middleware with other contextual information such as the `ConnectionInfo`.
I think it depends on a few things.
The timeout is then a proxy for the amount of work that should be accepted by the server, and the expected deadline by the agent to complete.
The timeout should be negotiated: clients can ask for a shorter timeout, or for a longer timeout but only up to the maximum timeout.
How do timeouts work for streams? Is it the timeout for the entire stream or only for a single message of the stream? How does grpc implement deadlines for their streams? I would prefer timeouts per message because it measures liveness and can be used for infinite streaming if that's what we are trying to implement.
Therefore each rpc method should have a max timeout on the server side.
The caller if not specifying a timeout can default to max. But can specify a lower timeout.
I don't think the middleware should be dealing with timeouts.
The timeouts should be part of the message params just like other metadata.
The timeout info must be passed into the subsequent methods. There are some methods that will cost time but won't take a timeout. These are synchronous methods...
For all other async methods a timeout can be used to cancel any operation.
So technically this means the timeout isn't 100 percent accurate since it won't cancel sync operations on the server side.
On the client side though, the timeout would just cancel the call and ignore any data. The only way to deal with this is an exception on the server side which could cancel sync operations.
The deadline mechanism works in the JSON RPC system now. It's just a matter of doing agent migration. But also ensuring that the subissues here are completed.
Applying it to sigchain depends on agent migration.
Applying it to identity management only requires bridging it to the underlying IO, which can be just the `fetch` function provided by cross-fetch.
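That bridging could be a thin wrapper that converts the deadline into an `AbortSignal`, since `fetch` accepts one (a sketch; the wrapper name is made up, and any signal-aware async IO can be bridged the same way):

```ts
// Sketch: abort the signal after `timeout` ms, so the deadline propagates
// into whatever signal-aware IO operation is passed in.
async function withDeadline<T>(
  timeout: number,
  f: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeout);
  try {
    return await f(controller.signal);
  } finally {
    clearTimeout(timer);
  }
}

// Usage with fetch (cross-fetch's fetch accepts `signal` too):
// await withDeadline(1000, (signal) => fetch(url, { signal }));
```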
This issue is very old and predates the network stack overhaul. I'm going to close it as not planned.
This is still required. It's independent of networking. It's basically any kind of operation that is meant to be timed and cancellable. It's a general pattern. Of course new issues should be more specific.
Specification
With the application of `timedCancellable` to functions within the nodes domain, we have encountered new connection errors randomly during tests. These errors started after applying a deadline of 1500 ms to `pingNode` usage. We suspect the problem is coming from `openConnectionForward` after setting such a short deadline. These may be bugs that existed previously but were never revealed by the current testing. Currently there is 1 `NodeConnectionManager` test where we override the deadline for `openConnectionForward`. The errors occur randomly and are as follows; they can be re-created by running all of the nodes domain tests concurrently.
Additional context
Tasks