Thomas-Boi / heroku-playwright-python-browsers

A buildpack to install the Chromium exe and its dependencies for use with Playwright.
MIT License

Having to run `playwright install` each time I run bash #1

Closed allenjhyang closed 4 months ago

allenjhyang commented 4 months ago

First - thanks for publishing this buildpack! I've been able to get it to work, which is incredibly useful.

One thing I've noticed is that whenever I run `heroku run bash -a myappname` (for example, to kick off a custom script that uses Playwright), I get prompted to run `playwright install` each time. This proceeds to run through the install of Chromium, Firefox and WebKit, then the dyno complains about a list of missing dependencies. However, if I ignore that missing-dependencies message and try to run my custom script once more, it'll succeed.

Is there a way around this? Does this signal that I installed something incorrectly?

For the record, running `heroku buildpacks -a myappname` shows this:

=== myappname Buildpack URLs

1. heroku/python
2. thomas-bui/heroku-playwright-python-browsers
3. playwright-community/heroku-playwright-buildpack

Thomas-Boi commented 4 months ago

Hi @ajyang818,

Thank you for your kind words.

Regarding your question about `playwright install`: it shouldn't prompt you to run it, because the buildpack already does that.

As for the missing dependencies, I suspect it's related to this section from my FAQ. In summary, it's the heroku-playwright-buildpack's job to install the system dependencies:

  1. Why doesn't this buildpack install the requirements as well?

The command to install the system packages along with the browser is `playwright install --with-deps`. For some reason, this action is blocked by Heroku, apparently because it uses `sudo` underneath (which is not allowed as of 2024). Thus, only the browser installation can go through, but that alone is not enough. We still need the system packages, which are provided by heroku-playwright-buildpack.

I'm not sure why it prompted you to install again. It seems the buildpacks might run OK, but Playwright is unable to recognize that the dependencies have been installed.

I'd recommend the following steps:

  1. See if you can execute your script via the Procfile rather than using `heroku run bash`. I use a Procfile and it works fine for me.
  2. Try using the Heroku Scheduler to invoke your task, since that's another way to run the script without using `heroku run bash`.
  3. Double-check that the directory contains the appropriate cache of the files installed by Playwright. Their docs mention where the files are stored, so you should be able to find it.
  4. See if there are any breaking changes to playwright-community/heroku-playwright-buildpack.
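Step 3 can be checked from a one-off dyno with a short script. The following is a minimal sketch, not part of the buildpack: the helper names are hypothetical, and it assumes Playwright's documented Linux default cache location (`~/.cache/ms-playwright`) unless `PLAYWRIGHT_BROWSERS_PATH` is set. Whether this buildpack uses that location is not confirmed in this thread.

```python
import os
from pathlib import Path


def playwright_cache_dir(env=None, home=None):
    """Return the directory where Playwright is expected to keep its browsers.

    Hypothetical helper. Assumes the Linux default of ~/.cache/ms-playwright
    unless PLAYWRIGHT_BROWSERS_PATH points somewhere else.
    """
    env = os.environ if env is None else env
    custom = env.get("PLAYWRIGHT_BROWSERS_PATH")
    if custom == "0":
        # "0" means the browsers live inside the installed package;
        # this sketch does not try to locate that directory.
        return None
    if custom:
        return Path(custom)
    home = Path.home() if home is None else Path(home)
    return home / ".cache" / "ms-playwright"


def installed_browsers(cache_dir):
    """List the sub-folders found in cache_dir (typically one per browser build)."""
    if cache_dir is None or not cache_dir.is_dir():
        return []
    return sorted(p.name for p in cache_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    cache = playwright_cache_dir()
    print("cache dir:", cache)
    print("browsers:", installed_browsers(cache) or "none found")
```

If the list comes back empty inside `heroku run bash` but the script works from the Procfile, that would suggest the two environments resolve different cache paths.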

allenjhyang commented 4 months ago

As an update here (context: my project is a scrapy spider deployed on Heroku; the below is true whether I kick off the scraping via Heroku Scheduler or, now, using scrapyd via API call):

I thought the more ideal outcome would be for the dyno to install chromium upon its start, rather than as part of the spider. But:

Am I understanding the "ideal outcome" correctly, or is what I have set up now the expected path here?

Thomas-Boi commented 4 months ago

Hi @ajyang818,

I'm not 100% sure what's happening. It might be that both paths contain valid executables for you to use. The correct way, though, should be to use the path provided via `CHROMIUM_EXECUTABLE_PATH`.
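As an illustration of using that variable, here is a minimal sketch with Playwright's Python sync API. The function names are hypothetical, and it assumes `CHROMIUM_EXECUTABLE_PATH` holds the full path to the Chromium binary; `executable_path` is a standard argument of `chromium.launch()`.

```python
import os


def chromium_launch_options(env=None):
    """Build keyword arguments for chromium.launch() (hypothetical helper).

    Uses CHROMIUM_EXECUTABLE_PATH when it is set; otherwise Playwright
    falls back to its own default browser lookup.
    """
    env = os.environ if env is None else env
    options = {"headless": True}
    path = env.get("CHROMIUM_EXECUTABLE_PATH")
    if path:
        options["executable_path"] = path
    return options


def fetch_title(url):
    """Open url in Chromium and return the page title."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**chromium_launch_options())
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()
```

Passing the path explicitly means the script no longer depends on Playwright finding a browser in its default cache, which may be why the install prompt appears. If the spider uses scrapy-playwright, the same dictionary could presumably be assigned to its `PLAYWRIGHT_LAUNCH_OPTIONS` setting, though that is not confirmed in this thread.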

FFMPEG is interesting though. It's meant for recording things, which I don't think the browser needs to run. If you look into the heroku-playwright-buildpack's dependencies, they don't have anything related to FFMPEG.

I'm not sure why the installed exe is not detected by your script. A buildpack runs before a dyno image is finalized, so the files will always be there after a restart. The fact that your script works even though it complains that you need to install packages means that the files do exist in the dyno image.

Unfortunately, I won't look deeper into this since I made this package mostly for my own use case. If you do figure out a solution though, feel free to make a PR and I'll consider merging it in. I'll close this Issue as Unresolved for now.