apify / browser-pool

A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.

feat: tab as a container #80

szmarczak closed this 2 years ago

szmarczak commented 2 years ago

Not sure why CI fails; I guess it's an unrelated breaking Playwright change...

'use strict';
const playwright = require('playwright');
const { PlaywrightPlugin } = require('../browser-pool/dist/index');

const plugin = new PlaywrightPlugin(playwright.chromium);

(async () => {
    const launchContext = plugin.createLaunchContext({
        experimentalContainers: true,
        launchOptions: {
            headless: false,
        },
    });

    const browser = await plugin.launch(launchContext);

    const controller = plugin.createController();

    controller.assignBrowser(browser, launchContext);
    controller.activate();

    const pages = await Promise.all([
        controller.newPage(),
        controller.newPage(),
    ]);

    await controller.setCookies(pages[0], [
        {
            name: 'foo',
            value: 'bar',
            domain: 'httpbin.org',
            expires: Math.floor(Date.now() / 1000) + 3600,
            secure: true,
            httpOnly: false,
            path: '/',
            sameSite: 'Strict',
        },
    ]);

    // console.log(
    //  await controller.getCookies(pages[0]),
    //  await controller.getCookies(pages[1]),
    // );

    const responses = await Promise.all([
        pages[0].goto('https://httpbin.org/anything'),
        pages[1].goto('https://httpbin.org/anything'),
    ]);

    const jsons = await Promise.all(responses.map(response => response.json()));

    console.log(jsons.map(response => response.headers.Cookie));

    await controller.close();
})();

Result:

[ 'foo=bar', undefined ]

It works, in headless mode as well! This is still a draft because I need to turn the above into a Jest test.

szmarczak commented 2 years ago

This currently uses Manifest V2, which is scheduled to be turned off in January 2023 (June 2023 for enterprise).

Here's why Manifest V3 doesn't work [yet]

I tried using Manifest V3, which means the cookie headers need to be modified via CDP (not via Playwright, because it turns off caching). That part works, BUT unfortunately, due to https://crbug.com/1292450 and https://crbug.com/1200844, Chromium doesn't accept modified cookie headers while it has cookies stored for that website. Instead, I removed the cookies and set them manually, so at the point of sending the request there were no cookies for that particular website in the browser. This way Chromium accepted the modified cookie headers. Unfortunately, for some reason Gmail (it's a good test) doesn't like that.

I tried shifting cookies instead (delete other cookies -> rename cookies for the current session -> send request -> restore old cookies), but that is buggy as hell. I could write more details about this, but it has so many bugs that I'd say those cookies were living their own life :P
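For illustration only, here is a minimal sketch of the cookie-shifting sequence described above (delete other cookies -> rename -> send request -> restore). It is not the code from this PR: it assumes a plain in-memory array in place of the browser cookie store, and it assumes each cookie name carries a session prefix such as "s1.". The names shiftCookies, toCookieHeader and withShiftedCookies are hypothetical.

```javascript
'use strict';

// Hypothetical stand-in for the browser cookie store: a plain array.
// Assumption: every cookie name is prefixed with its session ID, e.g. "s1.foo".
function shiftCookies(jar, sessionId) {
    const prefix = `${sessionId}.`;

    // Keep a copy so the original cookies can be restored after the request.
    const backup = jar.map((cookie) => ({ ...cookie }));

    // Drop other sessions' cookies and strip the prefix from the current session's.
    const shifted = jar
        .filter((cookie) => cookie.name.startsWith(prefix))
        .map((cookie) => ({ ...cookie, name: cookie.name.slice(prefix.length) }));

    return { shifted, backup };
}

// Serialize cookies into a Cookie request header value.
function toCookieHeader(cookies) {
    return cookies.map(({ name, value }) => `${name}=${value}`).join('; ');
}

// Shift, send the request, then restore the old cookies even if the request fails.
async function withShiftedCookies(store, sessionId, sendRequest) {
    const { shifted, backup } = shiftCookies(store.cookies, sessionId);
    store.cookies = shifted;
    try {
        return await sendRequest(toCookieHeader(store.cookies));
    } finally {
        store.cookies = backup;
    }
}

// Usage: the "request" here just echoes the header it would send.
const store = {
    cookies: [
        { name: 's1.foo', value: 'bar' },
        { name: 's2.foo', value: 'baz' },
    ],
};

withShiftedCookies(store, 's1', async (cookieHeader) => cookieHeader)
    .then((header) => console.log(header)); // prints "foo=bar"
```

In a real browser the store is shared and asynchronous, so concurrent requests from different sessions would interleave between the shift and the restore, which may be one source of the flakiness mentioned above.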

tl;dr: with Manifest V2 it works like a charm. Let's hope those two issues get fixed in 6 months.

B4nan commented 2 years ago

Btw, I don't think we want to merge this here; it should go to the crawlee monorepo instead.

(we can, but generally this repo will be deprecated, and @crawlee/browser-pool will be the place where future development will happen)

szmarczak commented 2 years ago

Is SDK v2 gonna use the crawlee one as well?

B4nan commented 2 years ago

Nope, we won't be maintaining v2 after crawlee is out. Maybe some critical fixes, but no new features.

(it would also be a BC break, as crawlee packages require Node 16, while SDK v2 supports Node 15 too)

szmarczak commented 2 years ago

Makes sense. Thanks for clarifying. I'll open a PR in the crawlee monorepo then.