apify / browser-pool

A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.

feat: session-based fingerprint cache #77

Closed (barjin closed this 2 years ago)

barjin commented 2 years ago

backport of apify/apify-ts#138

For compatibility reasons, the options are not renamed, only the cache logic is updated.

closes #64 #67
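For readers unfamiliar with the idea, a session-based cache can be sketched roughly as follows. This is only an illustration of the general technique, not the code from this PR: the class name `SessionFingerprintCache`, the `generateFingerprint` callback, and the `maxSize` limit are all hypothetical, and the assumption is that each session carries a stable string ID.

```javascript
// Hypothetical sketch; none of these names are taken from browser-pool.
class SessionFingerprintCache {
    constructor(generateFingerprint, maxSize = 1000) {
        this.generateFingerprint = generateFingerprint;
        this.maxSize = maxSize;
        // A Map iterates in insertion order, which gives simple LRU eviction.
        this.cache = new Map();
    }

    get(sessionId) {
        if (this.cache.has(sessionId)) {
            // Re-insert the entry so it counts as most recently used.
            const cached = this.cache.get(sessionId);
            this.cache.delete(sessionId);
            this.cache.set(sessionId, cached);
            return cached;
        }
        // First time this session is seen: generate and remember a fingerprint.
        const fingerprint = this.generateFingerprint();
        this.cache.set(sessionId, fingerprint);
        if (this.cache.size > this.maxSize) {
            // Evict the least recently used session.
            const oldestSessionId = this.cache.keys().next().value;
            this.cache.delete(oldestSessionId);
        }
        return fingerprint;
    }
}

// Usage with a stand-in generator (a real one would build a full fingerprint).
let counter = 0;
const cache = new SessionFingerprintCache(() => ({ id: ++counter }), 2);
const first = cache.get('session-a');
const again = cache.get('session-a'); // same object as `first`
const other = cache.get('session-b'); // a newly generated fingerprint
```

Keying the cache by session rather than by proxy or browser means a session keeps presenting the same fingerprint for as long as its entry stays in the cache.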

B4nan commented 2 years ago

Those Windows tests, so flaky all the time 😭

szmarczak commented 2 years ago

@barjin Doesn't your approach assume a single session per browser? If I read the code right, the hook is executed only once [per browser launch]. That means for all incognito pages it's using the same fingerprint. What am I missing?

barjin commented 2 years ago

You aren't missing anything, that's how it works indeed :)

AFAIK, unfortunately, the current implementation of the incognito pages/fingerprinting doesn't allow us to change the fingerprint on a previously injected BrowserContext (hence, this line).

szmarczak commented 2 years ago

FYI there's undocumented BrowserContext._resetForReuse.

Edit 2: However it won't work with the Tab as a Container extension I'm working on, since it uses persistent contexts (required for caching). So we still cannot do anything without target interception :( I might figure something out but no promises.

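// `browser` has to be a BrowserContext here: exposeFunction, addInitScript
// and the undocumented _resetForReuse are context methods.
await browser.exposeFunction('lol', () => {
    return 123;
});

await browser.addInitScript(async () => {
    console.log(await lol());
});

// Undocumented, private API: may change or disappear between Playwright versions.
await browser._resetForReuse();

// After the reset, the binding and the init script are registered again.
await browser.exposeFunction('lol', () => {
    return 456;
});

await browser.addInitScript(async () => {
    console.log(await lol());
});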

If that's too unstable, I think we can go with page.addInitScript instead? That should enable us to do this at the page level.
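A rough sketch of what the page-level variant could look like, using Playwright's documented `page.addInitScript(script, arg)`. The helper name `injectFingerprintIntoPage` and the shape of the `fingerprint` object (plain `navigator` property overrides) are assumptions made for illustration; a real fingerprint injector would cover far more than this.

```javascript
// Hypothetical helper; `page` is expected to be a Playwright Page.
async function injectFingerprintIntoPage(page, fingerprint) {
    // Playwright serializes `fingerprint` and passes it to the function below,
    // which runs in the page before any of the page's own scripts.
    await page.addInitScript((fp) => {
        for (const [key, value] of Object.entries(fp)) {
            // Toy override: shadow the navigator property with a fixed value.
            Object.defineProperty(navigator, key, { get: () => value });
        }
    }, fingerprint);
}

// Possible usage (requires a running Playwright context, so left commented out):
// const page = await context.newPage();
// await injectFingerprintIntoPage(page, { hardwareConcurrency: 8, deviceMemory: 8 });
```

Because the script is registered on a single page, each page opened in the same context could in principle receive a different fingerprint.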

szmarczak commented 2 years ago

We can set the viewport as well: page.setViewportSize(). Not sure about the user agent, though. It might be possible with setExtraHTTPHeaders, but that needs to be checked; I'd say it's 50/50.

Edit: However page is not the entire context (nor a tab), so it probably won't work right. I'll open a Playwright issue about stabilizing _resetForReuse when I wake up.
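The page-level calls mentioned above could be combined along these lines. `page.setViewportSize` and `page.setExtraHTTPHeaders` are documented Playwright methods; the helper name `applyPageOverrides` and its options object are hypothetical. Whether a `User-Agent` entry in the extra headers actually replaces the browser's own header is exactly the unverified part of the discussion, and the value reported by `navigator.userAgent` in page scripts would presumably stay unchanged either way.

```javascript
// Hypothetical helper; `page` is expected to be a Playwright Page.
async function applyPageOverrides(page, { viewport, userAgent } = {}) {
    if (viewport) {
        // Resizes the viewport of this page only, not of the whole context.
        await page.setViewportSize({ width: viewport.width, height: viewport.height });
    }
    if (userAgent) {
        // Unverified assumption from the thread: this may or may not override
        // the User-Agent request header sent by the page.
        await page.setExtraHTTPHeaders({ 'User-Agent': userAgent });
    }
}

// Possible usage (requires a running Playwright context, so left commented out):
// await applyPageOverrides(page, {
//     viewport: { width: 1366, height: 768 },
//     userAgent: 'Mozilla/5.0 (example)',
// });
```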