ccjmne / puppeteer-html2pdf

Print your HTML to PDF via Puppeteer in a Docker container.
MIT License

Running html2pdf in AWS ECS #14

Closed chandanadesilva closed 1 day ago

chandanadesilva commented 1 week ago

Hello,

I want to use the html2pdf image to run a PDF generation service. I have found that it works very well to generate PDFs when I use it on my desktop. Thank you very much for publishing this.

I am trying to run the html2pdf container in AWS ECS.

As I mentioned earlier, the image works fine on my local desktop, but when I run it under AWS ECS/Fargate, the process seems to launch a number of Chrome processes and then hangs. Here is the state before and during the launch:

Before Launch

Mem: 2456556K used, 29872588K free, 412K shrd, 30936K buff, 1947456K cached
CPU:   0% usr   0% sys   0% nic  99% idle   0% io   0% irq   0% sirq
Load average: 0.02 0.03 0.00 4/326 314
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
  229    40 root     S    2179m   6%   2   0% /managed-agents/execute-command/ssm-session-worker ecs-execute-command-nzevbhhyqpa7kd4sjttow45jru
   40     8 root     S    2185m   6%   5   0% /managed-agents/execute-command/ssm-agent-worker
    8     0 root     S    2028m   6%   6   0% /managed-agents/execute-command/amazon-ssm-agent
    7     1 root     S     687m   2%   4   0% node -e require('./dist/server.js').use(require('puppeteer-core'))
  243   229 root     S     1696   0%   7   0% /bin/sh
  288   243 root     R     1624   0%   0   0% top
    1     0 root     S      944   0%   0   0% /dev/init -- node -e require('./dist/server.js').use(require('puppeteer-core'))

During Launch

CPU:  25% usr   0% sys   0% nic  74% idle   0% io   0% irq   0% sirq
Load average: 0.90 0.24 0.08 4/409 476
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
  404   357 root     R    1155g3468%   5  13% /usr/lib/chromium/chromium --type=renderer --crashpad-handler-pid=351 --enable-crash-reporter=,Alpine Linux --noe
  398   357 root     R    1155g3468%   6  12% /usr/lib/chromium/chromium --type=renderer --crashpad-handler-pid=351 --enable-crash-reporter=,Alpine Linux --noe
  346     7 root     S    32.4g  97%   7   0% /usr/lib/chromium/chromium --user-data-dir=/root/.config/chromium --ozone-platform-hint=auto --allow-pre-commit-i
  229    40 root     S    2179m   6%   2   0% /managed-agents/execute-command/ssm-session-worker ecs-execute-command-nzevbhhyqpa7kd4sjttow45jru
  441   357 root     S    1155g3468%   1   0% /usr/lib/chromium/chromium --type=renderer --crashpad-handler-pid=351 --enable-crash-reporter=,Alpine Linux --noe
  462   356 root     S    32.3g  97%   6   0% /usr/lib/chromium/chromium --type=gpu-process --no-sandbox --disable-dev-shm-usage --disable-breakpad --headless 
  385   357 root     S    32.3g  97%   1   0% /usr/lib/chromium/chromium --type=utility --utility-sub-type=storage.mojom.StorageService --lang=en-US --service-
  357   346 root     S    32.3g  97%   5   0% /usr/lib/chromium/chromium --type=zygote --no-sandbox --headless --crashpad-handler-pid=351 --enable-crash-report
  356   346 root     S    32.3g  97%   1   0% /usr/lib/chromium/chromium --type=zygote --no-zygote-sandbox --no-sandbox --headless --crashpad-handler-pid=351 -
  351     1 root     S    32.0g  96%   1   0% /usr/lib/chromium/chrome_crashpad_handler --monitor-self --monitor-self-annotation=ptype=crashpad-handler --datab
  353     1 root     S    32.0g  96%   7   0% /usr/lib/chromium/chrome_crashpad_handler --no-periodic-tasks --monitor-self-annotation=ptype=crashpad-handler --
   40     8 root     S    2185m   6%   1   0% /managed-agents/execute-command/ssm-agent-worker
    8     0 root     S    2028m   6%   6   0% /managed-agents/execute-command/amazon-ssm-agent
    7     1 root     S     720m   2%   1   0% node -e require('./dist/server.js').use(require('puppeteer-core'))
  243   229 root     S     1696   0%   7   0% /bin/sh
  288   243 root     R     1628   0%   0   0% top
    1     0 root     S      944   0%   7   0% /dev/init -- node -e require('./dist/server.js').use(require('puppeteer-core'))

This is the output of a curl command which I am using to test:

sh-4.2$ curl -v -o /tmp/test.pdf -X POST http://pdfgen.fegov4720env20common:3000/ -H 'Content-Type: text/html' -d '<html><body><h1>Hello World!</h1></body></html>'
Note: Unnecessary use of -X or --request, POST is already inferred.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 10.0.2.25:3000...
* Connected to pdfgen.fegov4720env20common (10.0.2.25) port 3000
> POST / HTTP/1.1
> Host: pdfgen.fegov4720env20common:3000
> User-Agent: curl/8.3.0
> Accept: */*
> Content-Type: text/html
> Content-Length: 47
> 
} [47 bytes data]
100    47    0     0    0    47      0      0 --:--:--  0:01:27 --:--:--     0

I wonder if you can help.

Thanks and regards,
Chandana

chandanadesilva commented 1 week ago

I am seeing this log, which shows that the launch is timing out (I have set the timeout to four minutes):

2024-09-10T03:31:57 Launching Browser
2024-09-10T03:35:57 /app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/CallbackRegistry.js:96
2024-09-10T03:35:57     #error = new Errors_js_1.ProtocolError();
2024-09-10T03:35:57              ^
2024-09-10T03:35:57 ProtocolError: Network.enable timed out. Increase the 'protocolTimeout' setting in launch/connect calls for a higher timeout if needed.
2024-09-10T03:35:57     at <instance_members_initializer> (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/CallbackRegistry.js:96:14)
2024-09-10T03:35:57     at new Callback (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/CallbackRegistry.js:100:16)
2024-09-10T03:35:57     at CallbackRegistry.create (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/CallbackRegistry.js:32:26)
2024-09-10T03:35:57     at Connection._rawSend (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Connection.js:91:26)
2024-09-10T03:35:57     at CdpCDPSession.send (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/CDPSession.js:78:33)
2024-09-10T03:35:57     at NetworkManager.addClient (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/NetworkManager.js:80:20)
2024-09-10T03:35:57     at FrameManager.initialize (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/FrameManager.js:189:38)
2024-09-10T03:35:57     at #initialize (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Page.js:312:36)
2024-09-10T03:35:57     at CdpPage._create (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Page.js:98:31)
2024-09-10T03:35:57     at /app/node_modules/puppeteer-core/lib/cjs/puppeteer/cdp/Target.js:199:42
2024-09-10T03:35:57 Node.js v22.8.0
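
The error message above points at the `protocolTimeout` setting of Puppeteer's launch/connect calls. As a rough sketch of where that setting lives, the snippet below builds an options object that could be passed to `puppeteer-core`'s `launch()`. The helper name `buildLaunchOptions` and its defaults are assumptions made for illustration and are not taken from the html2pdf source; the executable path and the two flags are copied from the process listing earlier in this thread.

```javascript
// Sketch only: builds an options object for puppeteer-core's launch().
// The helper name and its defaults are illustrative assumptions.
function buildLaunchOptions(overrides = {}) {
  const fourMinutes = 4 * 60 * 1000; // the timeout mentioned in this thread
  return {
    // Chromium path as it appears in the process listing.
    executablePath: '/usr/lib/chromium/chromium',
    headless: true,
    // Flags copied from the process listing; nothing new is added here.
    args: ['--no-sandbox', '--disable-dev-shm-usage'],
    // Maximum wait for the browser process to start, in milliseconds.
    timeout: fourMinutes,
    // Maximum wait for a single DevTools protocol call such as Network.enable.
    protocolTimeout: fourMinutes,
    ...overrides,
  };
}

const options = buildLaunchOptions({ protocolTimeout: 60_000 });
console.log(options);

// With puppeteer-core installed, the object would be passed like this:
// const browser = await require('puppeteer-core').launch(options);
```

Changing these values only alters how long the failure takes to surface; it would not by itself explain why Chromium stops responding on Fargate.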

ccjmne commented 1 day ago

Hi,

Thank you @chandanadesilva for your report! My apologies for only getting back to you now.

I'm quite puzzled: I cannot think of a reason why you'd experience this... I'll try to replicate it in Fargate as well soon. I'll keep you updated if I figure it out!

In case I can't reproduce it, would you be able to share the details of the specific instance you're running it under? Which AMI you're using, perhaps?

chandanadesilva commented 1 day ago

Hello @ccjmne, thanks for the reply. As I mentioned, the image works fine on my desktop (Fedora 38), but hangs when run on Fargate. Fargate is the AWS ECS option where you don't need EC2 instances, so I can't say what the underlying AMI is.

I am wondering if the issue is with the Alpine distro, as there is no Chrome distribution for Alpine.

Because I had to get my project completed, I used https://github.com/bedrockio/export-html

ccjmne commented 1 day ago

Ah, thank you, @chandanadesilva, I didn't know what Fargate was exactly.

I'm curious as to how export-html works, then, since it also delegates to puppeteer... I'm planning to investigate it in the near future, if only to figure it out for myself!

Thanks for following up, and thanks for pointing to an alternative that worked in that case!

chandanadesilva commented 1 day ago

@ccjmne, the export-html image is Debian-based, which may be one reason. Their app code also seems to be a bit more resilient. I don't know Node.js, so I can't comment too much on the app.
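
For readers wondering what "more resilient" could mean in practice, here is a minimal sketch of one common pattern: giving each attempt a deadline and retrying a few times. This is not taken from export-html or html2pdf; the helper names `withDeadline` and `retryWithDeadline` and their defaults are hypothetical.

```javascript
// Rejects if `promise` does not settle within `ms` milliseconds.
// Note: the underlying work is not cancelled, only abandoned.
function withDeadline(promise, ms) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Calls `task` up to `attempts` times, giving each call `ms` milliseconds.
// `task` receives the attempt number, starting at 1.
async function retryWithDeadline(task, { attempts = 3, ms = 30_000 } = {}) {
  let lastError;
  for (let i = 1; i <= attempts; i += 1) {
    try {
      return await withDeadline(task(i), ms);
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw lastError;
}

// Hypothetical usage, where launchBrowser is whatever starts Chromium:
// const browser = await retryWithDeadline(() => launchBrowser(), { attempts: 2 });
```

Because an abandoned attempt keeps running, a real service would probably also need to kill any leftover Chromium processes before retrying.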