YusukeIwaki / puppeteer-ruby

A Ruby port of Puppeteer
Apache License 2.0
290 stars, 41 forks

`puppeteer-ruby` freezes with multiple threads. #340

Open ronakjain90 opened 3 weeks ago

ronakjain90 commented 3 weeks ago

Step To Reproduce / Observed behavior

Hi @YusukeIwaki, thanks for this open source project. We are currently using it in production, and I have noticed the following issue.

When run in parallel threads, puppeteer-ruby (with Chrome or Firefox) times out very frequently:

  1. above a certain thread count
  2. after processing a certain number of requests

puppeteer-node doesn't show the same issue. I have written a small script with the same functionality for both puppeteer-ruby and puppeteer-node.

require 'base64' # needed for Base64.strict_encode64 below
require 'puppeteer'
require 'benchmark-memory'

browser_instance = Puppeteer.launch(
  product: 'firefox',
  channel: 'firefox',
  headless: true,
  executable_path: 'Firefox Nightly.app/Contents/MacOS/firefox',
)

puts browser_instance.ws_endpoint

THREAD_COUNT = 20

def create_thread(browser_instance, i)
  browser = Puppeteer.connect(browser_ws_endpoint: browser_instance.ws_endpoint)
  context = browser.create_incognito_browser_context
  page = context.new_page
  page.set_content("<html> <body> <h1> ok</h1></body></html>")
  Base64.strict_encode64(page.screenshot)
  puts "processed from thread #{i}"
  # page.close
  # context.close
  browser.disconnect
  true
rescue Puppeteer::TimeoutError => e
  puts "NAVIGATION_TIMEOUT: #{e}"
end

def make_threads(browser_instance)
  # Exclusive range: spawn exactly THREAD_COUNT threads (0..THREAD_COUNT would spawn one extra)
  threads = (0...THREAD_COUNT).map do |i|
    Thread.new do
      create_thread(browser_instance, i)
    end
  end

  threads.map(&:join)
end

Benchmark.memory do |x|
  x.report("try 1") do
    puts "1st Run"
    make_threads(browser_instance)
  end

  x.report("try 2") do
    puts "2nd Run"
    make_threads(browser_instance)
  end

  x.report("try 3") do
    puts "3rd Run"
    make_threads(browser_instance)
    nil
  end

  x.report("try 4") do
    puts "4th Run"
    make_threads(browser_instance)
    nil
  end

  x.report("try 5") do
    puts "5th Run"
    make_threads(browser_instance)
    nil
  end

  x.report("try 6") do
    puts "6th Run"
    make_threads(browser_instance)
    nil
  end

  x.report("try 7") do
    puts "7th Run"
    make_threads(browser_instance)
    nil
  end

  x.compare!
end

The puppeteer-node script, with the same steps as the Ruby one:

const puppeteer = require('puppeteer-core');

(async () => {
  const THREAD_SIZE = 100

  const browser = await puppeteer.launch({
    browser: 'firefox',
    product: 'firefox',
    headless: true,
    executablePath: 'Firefox Nightly.app/Contents/MacOS/firefox',
    protocol: 'cdp', //'webDriverBiDi',
  });

  console.log(browser.wsEndpoint());

  const createScreenshot = async (i) => {
    const browserInstance = await puppeteer.connect({ browserWSEndpoint: browser.wsEndpoint() })
    const context = await browserInstance.createBrowserContext();
    // Open the page in the new context, mirroring context.new_page in the Ruby script
    const page = await context.newPage();
    await page.setContent("<html> <body> <h1> ok</h1></body></html>");
    await page.screenshot({ path: `screenshot_${i}.jpg` });
    // await context.close();
    await browserInstance.disconnect();
    return i
  }

  const times = Array.from(Array(THREAD_SIZE).keys());

  let tasks = []

  tasks = times.map((x, i) => {
    return new Promise( (resolve, reject) => {
      setTimeout(resolve, 100, createScreenshot(i));
    });
  });

  await Promise.all(tasks).then((result) => {
    console.log(result)
    console.log(`Completed ${THREAD_SIZE} Screenshots`)
  });

  tasks = times.map((x, i) => {
    return new Promise( (resolve, reject) => {
      setTimeout(resolve, 100, createScreenshot(THREAD_SIZE + i));
    });
  });

  await Promise.all(tasks).then((result) => {
    console.log(result)
    console.log(`Completed ${THREAD_SIZE} Screenshots`)
  });

  tasks = times.map((x, i) => {
    return new Promise( (resolve, reject) => {
      setTimeout(resolve, 100, createScreenshot(THREAD_SIZE * 2 + i));
    });
  });

  await Promise.all(tasks).then((result) => {
    console.log(result)
    console.log(`Completed ${THREAD_SIZE} Screenshots`)
  });

  console.log("DONE!")
})()

Comparing the Node.js and Ruby code, you'll notice that both use the same workflow, to keep the comparison fair. However, puppeteer-ruby hangs if THREAD_COUNT is set to 50 or 100, and doesn't complete the entire script if THREAD_COUNT is set to 25. I'm also noticing degraded performance beyond 100 runs.
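Since the hang appears only above a certain thread count, one possible stopgap is to cap how many threads talk to the browser at once. The following is a minimal sketch using only Ruby's core `Queue`. The `run_in_pool` helper and its `pool_size:` parameter are hypothetical names, not part of puppeteer-ruby, and this has not been verified to avoid the freeze reported here.

```ruby
# Hypothetical helper (not part of puppeteer-ruby): run each job through
# at most `pool_size` worker threads instead of one thread per job.
# Jobs must be truthy values, since nil marks the end of the queue.
def run_in_pool(jobs, pool_size:, &block)
  queue = Queue.new
  jobs.each { |job| queue << job }
  queue.close # once closed and drained, Queue#pop returns nil

  results = Queue.new
  workers = Array.new(pool_size) do
    Thread.new do
      # Each worker pulls jobs until the queue is closed and empty.
      while (job = queue.pop)
        results << block.call(job)
      end
    end
  end
  workers.each(&:join)

  # Collect results; order depends on which worker finished first.
  Array.new(results.size) { results.pop }
end

# Usage with the script above (assumes create_thread and browser_instance
# are defined as in the Ruby reproduction):
#   run_in_pool((0...100).to_a, pool_size: 5) { |i| create_thread(browser_instance, i) }
```

With this shape, the total number of screenshots can grow while the number of simultaneous connections stays at `pool_size`, which makes it easier to find the thread count at which the timeouts start.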

Expected behavior

puppeteer-ruby should not freeze

Environment

Ubuntu 22 / MacOS

Output of ruby --version: ruby-3.3.4

bufordtaylor commented 1 week ago

Just chiming in here. I found that `browser.disconnect` leaves the browser instance open, while `browser.close` closes the application entirely. Might help.

ronakjain90 commented 1 week ago

Actually, that's intended. I don't want to open and close the browser application for every screenshot.
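Keeping the browser application open does not rule out cleaning up the per-screenshot resources. The reproduction script comments out `page.close` and `context.close`, so every run leaves a page and an incognito context behind. Below is a minimal sketch of a wrapper that closes both even when the block raises. The `with_incognito_page` name is hypothetical, and whether the leftover contexts contribute to the freeze is an assumption that has not been tested here.

```ruby
# Hypothetical helper (not part of puppeteer-ruby). `browser` is expected to
# respond to create_incognito_browser_context, as the connected browser in
# the script above does.
def with_incognito_page(browser)
  context = browser.create_incognito_browser_context
  page = context.new_page
  yield page
ensure
  # Runs on success and on error (for example Puppeteer::TimeoutError).
  # Safe navigation covers the case where creation itself failed.
  page&.close
  context&.close
end

# Usage with the script above (assumes `browser` comes from Puppeteer.connect):
#   with_incognito_page(browser) do |page|
#     page.set_content("<html> <body> <h1> ok</h1></body></html>")
#     Base64.strict_encode64(page.screenshot)
#   end
#   browser.disconnect # the launched browser application stays open
```

The method returns the value of the block, so the encoded screenshot is still available to the caller after the page and context have been closed.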