Closed colin-grant-work closed 2 years ago
Likely a symptom of similar issues:
DebugSessionManager
set currentSession
:
console.log('SENTINEL FOR SETTING THE CURRENT SESSION', current);
@tortmayr, any plans to tackle this one? It seems that logging almost any non-primitive object produces the Cannot read properties of undefined (reading 'length')
, and it isn't hard to get the maximum stack frame error for frequent logs with complex arguments (e.g. an isVisible
check for a command where a widget will be the argument.)
@colin-grant-work I'm looking into this right now. At first glance it looks like we have to tackle two issues here:
About the circular structures: we should be able to just keep a stack of objects being serialized (IMO). Max depth would be the depth of the object tree. Am I correct to assume that circular structures were rejected as an error even before?
Am I correct to assume that circular structures were rejected as an error even before?
Partly. I did some testing on 53a272e113890174c8b3049b8d1a74999dee0cd4. When trying to sent a non-serializable object over the wire (e.g. ciruclar structure) JSON.stringify inside the WebSocketMessageWriter throws an error which is catched immediately. This means the log RPC call is never sent to the "other" side (i.e. the backend) but the promise is not rejected either. It just remains unresolved. Since the console.log API is synchronous anyways nobody is awaiting promise resolution and it doesn't matter that these promises remain unresolved.
So to sum up, frontend console.log calls with non-serializable arguments had the following behavior with the old protocol (53a272e113890174c8b3049b8d1a74999dee0cd4):
Invoke proxy call to backend LoggerServer
. https://github.com/eclipse-theia/theia/blob/53a272e113890174c8b3049b8d1a74999dee0cd4/packages/core/src/browser/logger-frontend-module.ts#L42
Whereas with the new protocol
ConsoleLogger
LoggerServer
.
IMO the if the log RPC call fails the promise should definitely. We could implemented some rejection handling to achieve the same behavior as before.
However, this also raises the question why we sent log calls as requests in the first place. Would it make more sense to send log RPC calls as plain notifications? This would also reduce the overall load on the websocket connection because we don't have to send back unnecessary void request responses.
About the circular structures: we should be able to just keep a stack of objects being serialized (IMO). Max depth would be the depth of the object tree.
I'm not sure if I follow. Could you elaborate on this? How would one be able to identify the depth of the object tree in a circular structure?
Might we be able to borrow from Node's object serialization routine? The Node REPL and console
are able to handle circular objects gracefully:
> const a = {};
undefined
> const b = {};
undefined
> a.b = b;
{}
> b.a = a;
<ref *1> { b: { a: [Circular *1] } }
> b
<ref *1> { a: { b: [Circular *1] } }
I think the ref identification is relatively new; older versions just had [Circular]
, which would be fine with me. Re: tracking circularity, a Set
would do the job, since Set
s can be keyed on objects and could be used to identify any object we've previously seen, though it could be a decent bit of memory overhead during the stringification process. Might even be able to be passed as an argument to JSON.stringify
const a = {};
// --> undefined
const b = {};
// --> undefined
b.a = a;
// --> {}
a.b = b;
// --> {a : {…}}
const c = new Set();
// --> undefined
const replacer = (key, value) => {
if (c.has(value)) { return '[Circular]'; }
if (typeof value === 'object' && value) {
c.add(value);
}
return value;
}
// --> undefined
JSON.stringify(a, replacer);
// --> '{"b":{"a":"[Circular]"}}'
About the circular structures: we should be able to just keep a stack of objects being serialized (IMO). Max depth would be the depth of the object tree.
I'm not sure if I follow. Could you elaborate on this? How would one be able to identify the depth of the object tree in a circular structure?
There are really only two cases: either the graph of object references we serialize has a loop or it does not. If it does no, everything is "fine": even if we use the same object twice in the graph being serialized, we can successfully compute serialized version. We will just end up with multiple copies of the object. If there is a circle in the object graph, we can detect it in the following way. When we start serializing an object, we check whether the object is already in the stack. If it is, we know that there is a loop in the graph and we can throw a "circular structure" error. Otherwise we push the object onto the stack. Once we're done serializing the object, we pop the object from the stack.
@colin-grant-work it would be nice to be able to serialize circular structures. I would suggest, however, that we do this as a follow-up improvement. My thinking is that we're trying to restore the behavior as it was before moving to the new serialization as soon as possible. That said, the memory overhead can be calculated in a general fashion: first we can observe that the upper bound of objects we need to keep is the number of objects in the graph we're serializing times the number of bytes needed for remembering the object. I don't know how much that is, exactly, but I would upper-bound the number to a small multiple of what an object itself with a single reference would use. And that is the worst case, since most object graphs will contain primitive values. So I don't think memory consumption will be a problem. The memory consumption also depends on whether we want to conserve object identity or simply be able to serialize circular structures: if we just want to serialize circles, remembering the objects currently being serialized in a stack is enough: we will end up with copies in the case of object reuse that does not lead to circles. Note that this will not improve worst case memory use: the worst case would be a linked list of objects: the depth of any path in that graph is the number of nodes 🤷 .
About the circular structures: we should be able to just keep a stack of objects being serialized (IMO). Max depth would be the depth of the object tree.
I'm not sure if I follow. Could you elaborate on this? How would one be able to identify the depth of the object tree in a circular structure?
There are really only two cases: either the graph of object references we serialize has a loop or it does not. If it does no, everything is "fine": even if we use the same object twice in the graph being serialized, we can successfully compute serialized version. We will just end up with multiple copies of the object. If there is a circle in the object graph, we can detect it in the following way. When we start serializing an object, we check whether the object is already in the stack. If it is, we know that there is a loop in the graph and we can throw a "circular structure" error. Otherwise we push the object onto the stack. Once we're done serializing the object, we pop the object from the stack.
I see. Thanks for the explanation.
@colin-grant-work it would be nice to be able to serialize circular structures. I would suggest, however, that we do this as a follow-up improvement. My thinking is that we're trying to restore the behavior as it was before moving to the new serialization as soon as possible. That said, the memory overhead can be calculated in a general fashion: first we can observe that the upper bound of objects we need to keep is the number of objects in the graph we're serializing times the number of bytes needed for remembering the object. I don't know how much that is, exactly, but I would upper-bound the number to a small multiple of what an object itself with a single reference would use. And that is the worst case, since most object graphs will contain primitive values. So I don't think memory consumption will be a problem. The memory consumption also depends on whether we want to conserve object identity or simply be able to serialize circular structures: if we just want to serialize circles, remembering the objects currently being serialized in a stack is enough: we will end up with copies in the case of object reuse that does not lead to circles. Note that this will not improve worst case memory use: the worst case would be a linked list of objects: the depth of any path in that graph is the number of nodes shrug .
I agree, we should aim at restoring the original behavior as soon as possible and handle potential serialization of circular structures in a follow up PR. Maybe we can do this as part of https://github.com/eclipse-theia/theia/issues/11159. Part of this task is also to evaluate whether we can/should replace the custom binary codec with a third party library (https://msgpack.org/index.html). Some msgpack implementations are also able to fully serialize ciruclar structures. I have recently played around with msgpackr. Here the author claims that enabling the support for circular structrures will decrease the overall performance by 25-30 %. So I'd exepct that we will encounter similar performance degrations if we implement circular reference support in our custom binary codec.
Part of this task is also to evaluate whether we can/should replace the custom binary codec with a third party library (https://msgpack.org/index.html)
@tortmayr I would be greatly in favor of this. I'm not too sure whether msgpackr
supports it as well, but the original msgpack
library allows to send binary buffers over the wire. This would remove the need to compress request-service buffers into strings before sending them to the frontend, heavily reducing complexity of this feature:
@msujew not quite sure how the issues are related: whether we need to string-encode buffers depends on how we interpret request data (binary or text), not on the particulars of the encoding engine. What am I missing?
@tsmaeder Well, yes and no. Even when encoding the buffer in binary, we usually lose the type information, which leaves the buffer
object as a simple number array. It's better than using a text encoding, as it will not bloat the payload, but some methods only accept buffers, not number arrays, making it hard to use without additional post-processing. msgpack
should do this post-processing automatically.
Also, why reinvent the wheel if there already is a perfectly usable lib for our usecase?
I still don't get it: by "typing" you mean whether it's a uint8array
or a buffer
, for example? That can be easily taken care of by registering a custom encoder. I agree that using a library can be attractive if it addresses our needs and we have confidence in the creators.
@tsmaeder You're probably right, using a custom encoder would be the right approach for this.
I have opened a PR that should fix the issue and restores the behavior of the "old" protocol in regards to logging circular structures. As discussed any other improvements i.e. the ability to fully serialized circular structures will be tackled in a follow-up as part of https://github.com/eclipse-theia/theia/issues/11159.
Bug Description:
Starting the application in the browser, I got stuck at the loading screen for 2-3 minutes with numerous errors logged in various parts of the new IPC system. (@tortmayr, @JonasHelming)
Steps to Reproduce:
console.log
that produces numerous rapid logs (ca. 100 or so in a tick), possibly involving non-stringifiable types (Map
, e.g.).Additional Information