Auto-recover connection upon network connectivity issue

dudicp commented 2 years ago

Hi,

I would like to make my CDP client resilient to network connectivity issues and I'm wondering what's the best way to achieve it. It seems the disconnect callback is invoked when the target is closed or when the websocket disconnected, however the disconnection reason is not available and there is no way to know when should the auto-recovery need to retry.

Trying to reconnect to non-exists target-id causes the following error:

TypeError: Cannot read properties of undefined (reading 'webSocketDebuggerUrl')
        at Chrome._fetchDebuggerURL (/Users/dudip/source/rocket-engine/server/node_modules/chrome-remote-interface/lib/chrome.js:183:31)
        at processTicksAndRejections (node:internal/process/task_queues:96:5)
        at async Chrome._start (/Users/dudip/source/rocket-engine/server/node_modules/chrome-remote-interface/lib/chrome.js:140:25)

Is it possible to add the disconnect reason as argument to the disconnect callback?
Alternatively, throw specific target not found error upon connecting to non-exists target-id.

cyrus-and commented 2 years ago

and there is no way to know when should the auto-recovery need to retry.

What do you mean by that? As soon as you're being disconnected you can try again, no?

Alternatively, throw specific target not found error upon connecting to non-exists target-id.

Why is it important the error being thrown? Also, why are you using non-existing target ids?

dudicp commented 2 years ago

What do you mean by that? As soon as you're being disconnected you can try again, no?

I can try reconnect, but if the disconnect reason was because the target closed, the reconnect attempts will always fail because there isn't such target-id anymore. I would like not to try reconnect if the target not exists anymore and the disconnect callback missing this information.

Why is it important the error being thrown? Also, why are you using non-existing target ids?

I'm not using non-existing target ids, one of the situation the disconnect callback is being triggered is when the target got closed and when I try reconnect to a target that got closed, I got the error above. So the alternative approach to understand when to stop reconnecting in the scenario of target that got closed, is to get the specific target not found and I would be able to stop reconnecting.

cyrus-and commented 2 years ago

IMHO after any disconnection (I can only think of Chrome or a tab being closed) the target id is no more valid. You shouldn't rely on that, rather every time rerun the code you're using to find/create the target you're going to interact with.

dudicp commented 2 years ago

The idea was to recover a connection to a specific target-id in case the websocket experienced network issue, is there a way I can query if the target-id still valid? If so, I can check it before attempting to reconnect.

cyrus-and commented 2 years ago

You can obtain the list of all the available targets with CDP.List().

dudicp commented 2 years ago

Yes, this is what I've implemented today but it is not optimal, because in my use-case I have a lot of opened targets and on each disconnection I need to iterate on the entire list.

cyrus-and commented 2 years ago

I'm not sure I 100% understand your issue after all... I'm not fully aware of your setup but let's say that there are two kinds of problems that can happen:

the connectivity between the node that run the script and the one that hosts the Chrome instance breaks;
Chrome just crashes (or target closed) for whatever reason or its node gets torn down.

In the former case you can just reconnect and use the same target IDs, in the latter you need to set up everything again, from scratch.

I can try reconnect, but if the disconnect reason was because the target closed, the reconnect attempts will always fail because there isn't such target-id anymore.

Why not just attempt the connection? See the following (simplified example):

const CDP = require('chrome-remote-interface');

(async () => {
    try {
        const target = 'whatever ID from before';
        const client = await CDP({target});
        console.log('TODO resume your work here...');
    } catch (err) {
        console.log('cannot connect to target, starting anew');

        try {
            const {webSocketDebuggerUrl: browserTarget} = await CDP.Version();
            const browserClient = await CDP({
                target: browserTarget
            });
            const {targetId} = await browserClient.Target.createTarget({
                url: 'https://example.com'
            });
            console.log(`TODO do stuff with the new target ${targetId}`);
        } catch (err) {
            console.log('Chrome is definitely gone');
        }
    }
})();

dudicp commented 2 years ago

I'm trying to deal with #1 - the connectivity between the node that run the script and the one that hosts the Chrome instance breaks;

When the connection breaks, the 'disconnect' event is being triggered: cdpClient.on("disconnect", disconnectCallback);

Now, once the callback is called, I would like to perform auto-reconnect logic (loop until reconnect successfully or timeout). But if the reason to the disconnection is because the tab was closed, the reconnect will fail and because I don't have a way to differentiated between the root cause of the disconnection, It will retry reconnecting until reaching timeout.

To solve it, I've used CDP.List() and check if the target is still available before attempt to reconnect, but as I've mentioned above it's not optimal.

If there was a specific error (like TargetNotFound) was thrown when CDP({target}) invoked, I can use it to break the loop instead of querying all target list.

cyrus-and commented 2 years ago

To solve it, I've used CDP.List() and check if the target is still available before attempt to reconnect, but as I've mentioned above it's not optimal.

If there was a specific error (like TargetNotFound) was thrown when CDP({target}) invoked, I can use it to break the loop instead of querying all target list.

Note though, that said loop is performed anyway under the hood by the library when you pass a target id instead of a WebSocket URL:

https://github.com/cyrus-and/chrome-remote-interface/blob/cbd690f7a905f8789e00fa500546355f4c9cd77c/lib/chrome.js#L181-L183

This is because obtaining the WebSocket URL from a target id is something not documented AFAIK, and that most importantly, changes across implementations. If you know your implementation (and you do), you can directly attempt a connection to:

ws://${host}:${port}/devtools/page/${targetId}

cyrus-and / chrome-remote-interface

Auto-recover connection upon network connectivity issue #506