elacuesta / scrapy-playwright-cloud-example

Trying scrapy-playwright on Scrapy Cloud

EACCES error when I use Playwright in a Docker image with Scrapy #2

Open thekage91 opened 3 years ago

thekage91 commented 3 years ago

Hi, I'm trying to run Playwright inside a Docker image so that I can deploy it on Zyte. The main purpose is to use Playwright to load the HTML pages I want to scrape.

I'm using Scrapy as the scraping framework, and I build my scraper in a custom Docker image. This is the part where I install Playwright and its browser:

RUN mkdir -p /app
WORKDIR /app
COPY ./requirements.txt /app/requirements.txt
# COPY ./.env /app/.env
RUN pip install --no-cache-dir -r requirements.txt
RUN playwright install chromium # I only need chromium for now

RUN mkdir /ms-playwright
RUN chmod -Rf 777 /ms-playwright
RUN mv /root/.cache/ms-playwright/chromium-* /ms-playwright/chromium # default installation is /root/.cache 

COPY . /app
RUN python setup.py install

Because the default installation directory is /root/.cache, I prefer to move Chromium to another directory and pass its path to my scraper through executable_path (see the code below):

browsers = {
    "chromium": "/ms-playwright/chromium",
}

PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {"executable_path": browsers[PLAYWRIGHT_BROWSER_TYPE]}
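One likely cause worth checking: the mv in the Dockerfile renames the whole chromium-<revision> folder to /ms-playwright/chromium, so executable_path ends up pointing at a directory rather than at the browser binary inside it. The sketch below searches that folder for the real executable and builds the launch options from it. It assumes the Linux Chromium build shipped by Playwright contains an executable file named "chrome" somewhere below the folder (for example chrome-linux/chrome); the exact layout can differ between Playwright versions, and the helper names find_chromium_binary and build_launch_options are hypothetical.

```python
import os
from pathlib import Path


def find_chromium_binary(root: str, name: str = "chrome") -> str:
    """Return the first regular, executable file called `name` below `root`.

    Hypothetical helper. Assumption: the moved Playwright folder contains a
    Chromium binary named "chrome" (e.g. chrome-linux/chrome); adjust `name`
    if your Playwright version uses a different layout.
    """
    for candidate in sorted(Path(root).rglob(name)):
        # Skip directories and non-executable files: spawning requires a
        # regular file with an execute bit set.
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return str(candidate)
    raise FileNotFoundError(f"no executable {name!r} found under {root}")


def build_launch_options(browser_root: str) -> dict:
    """Build PLAYWRIGHT_LAUNCH_OPTIONS pointing at the binary, not the folder."""
    return {"executable_path": find_chromium_binary(browser_root)}
```

In settings.py this could be used as PLAYWRIGHT_LAUNCH_OPTIONS = build_launch_options(browsers[PLAYWRIGHT_BROWSER_TYPE]). As an alternative design, Playwright reads the PLAYWRIGHT_BROWSERS_PATH environment variable, so setting it to /ms-playwright before running playwright install chromium installs the browser there directly and makes both the mv and the executable_path setting unnecessary.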

The build succeeds and the image is created, but when I run the container with a Scrapy command, the process returns this error:

Note: use DEBUG=pw:api environment variable to capture Playwright logs.
2021-07-04 18:13:03 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method ScrapyPlaywrightDownloadHandler._engine_started of <scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler object at 0x7f727c1cdf70>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 837, in adapt
    extracted = result.result()
  File "/usr/local/lib/python3.8/site-packages/scrapy_playwright/handler.py", line 104, in _launch_browser
    self.browser = await browser_launcher(**self.launch_options)
  File "/usr/local/lib/python3.8/site-packages/playwright/async_api/_generated.py", line 9485, in launch
    await self._async(
  File "/usr/local/lib/python3.8/site-packages/playwright/_impl/_browser_type.py", line 90, in launch
    raise e
  File "/usr/local/lib/python3.8/site-packages/playwright/_impl/_browser_type.py", line 86, in launch
    return from_channel(await self._channel.send("launch", params))
  File "/usr/local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 36, in send
    return await self.inner_send(method, params, False)
  File "/usr/local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 54, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: Failed to launch: Error: spawn /ms-playwright/chromium EACCES
=========================== logs ===========================
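The "spawn ... EACCES" part of this message is the operating system's permission-denied errno. On POSIX systems, asking the kernel to execute something that is not a regular file, such as a directory, fails with EACCES even when the process runs as root, so the error can appear regardless of chmod settings. The minimal sketch below reproduces this with Python's subprocess module, using a temporary directory as a stand-in for /ms-playwright/chromium; spawn_error_code is a hypothetical helper and the sketch assumes a Linux or other POSIX environment.

```python
import errno
import subprocess
import tempfile
from typing import Optional


def spawn_error_code(path: str) -> Optional[int]:
    """Try to execute `path`; return the OS errno on failure, or None if it ran."""
    try:
        subprocess.run([path], check=False, capture_output=True, timeout=5)
    except OSError as exc:
        # PermissionError (a subclass of OSError) carries errno.EACCES.
        return exc.errno
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as fake_browser_dir:
        # A directory stands in for /ms-playwright/chromium after the mv.
        code = spawn_error_code(fake_browser_dir)
        print(code == errno.EACCES, errno.errorcode.get(code))
```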

I think the problem is related to permissions on the chromium directory, but all my attempts to fix it have failed.
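Permission changes alone may not help here: in the Dockerfile, chmod -Rf 777 runs on /ms-playwright before the mv, so it only affects the then-empty directory, and a directory cannot be executed whatever its mode. A quick diagnostic is to check what kind of object the configured path actually is. The sketch below is hypothetical (describe_executable is not part of Playwright or scrapy-playwright); note that os.access(path, os.X_OK) alone is misleading for this check, because it returns True for any searchable directory.

```python
import os
import stat


def describe_executable(path: str) -> str:
    """Explain whether `path` could be used as Playwright's executable_path."""
    if not os.path.exists(path):
        return "missing"
    mode = os.stat(path).st_mode
    if stat.S_ISDIR(mode):
        # os.access(dir, os.X_OK) is True for searchable directories,
        # so the file type must be checked explicitly.
        return "directory"
    if not stat.S_ISREG(mode):
        return "not a regular file"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"
```

Running describe_executable("/ms-playwright/chromium") inside the container would distinguish a wrong path (a directory) from a genuine permission problem on the binary.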

Can anyone here help me?

Thank you very much

Bulga-xD commented 10 months ago

@thekage91 did you manage to fix it?