bitcrowd / chromic_pdf

Convenient HTML to PDF/A rendering library for Elixir based on Chrome & Ghostscript
Apache License 2.0

[#249] Chromium on a different machine in the network #251

Closed (maltoe closed this 1 year ago)

maltoe commented 1 year ago

    $ docker run --rm -p 9222:9222 zenika/alpine-chrome:114 \
      --remote-debugging-port=9222 \
      --remote-debugging-address=0.0.0.0 \
      --headless \
      --no-sandbox

    iex(1)> ChromicPDF.start_link(chrome_address: {"localhost", 9222})
    {:ok, #PID<0.302.0>}
    iex(2)> ChromicPDF.print_to_pdf({:html, "foo"})
    {:ok, "JVBERi0x..."}

gmile commented 1 year ago

Hi, @maltoe! Apologies for the radio silence on this one! I just tried this branch, and it works great!

My test scenario:

  1. After deploying the container on a k8s cluster, forward the port to the local machine:

    kubectl port-forward service/chromium 9222:9222
  2. Run elixir test.exs, where test.exs contains this:

    Mix.install([
      {:chromic_pdf, github: "bitcrowd/chromic_pdf", branch: "feature/249-chromium-on-a-different-machine-in-the-network"},
      :websockex
    ])
    
    ChromicPDF.start_link(connection_strategy: ChromicPDF.Connection.Inet)
    
    tasks = for i <- 1..25 do
      Task.async(fn ->
        {:ok, base64string} = ChromicPDF.print_to_pdf({:html, "Hello, world #{i}, #{DateTime.utc_now()}"})
        File.write!("/tmp/testing-chromic-#{i}.pdf", Base.decode64!(base64string))
      end)
    end
    
    Task.await_many(tasks, 5000)
  3. Sanity-check the contents of the rendered PDFs:

    pdfgrep Hello /tmp/testing-chromic*
    /tmp/testing-chromic-1.pdf:Hello, world 1, 2023-06-16 12:38:21.670024Z
    /tmp/testing-chromic-2.pdf:Hello, world 2, 2023-06-16 12:38:21.670037Z
    /tmp/testing-chromic-3.pdf:Hello, world 3, 2023-06-16 12:38:21.670030Z
    /tmp/testing-chromic-4.pdf:Hello, world 4, 2023-06-16 12:38:21.670042Z
    /tmp/testing-chromic-5.pdf:Hello, world 5, 2023-06-16 12:38:21.670033Z
    /tmp/testing-chromic-6.pdf:Hello, world 6, 2023-06-16 12:38:21.670044Z
    /tmp/testing-chromic-7.pdf:Hello, world 7, 2023-06-16 12:38:21.670036Z
    /tmp/testing-chromic-8.pdf:Hello, world 8, 2023-06-16 12:38:21.670046Z
    /tmp/testing-chromic-9.pdf:Hello, world 9, 2023-06-16 12:38:21.670038Z
    /tmp/testing-chromic-10.pdf:Hello, world 10, 2023-06-16 12:38:21.670048Z
    /tmp/testing-chromic-11.pdf:Hello, world 11, 2023-06-16 12:38:21.670040Z
    /tmp/testing-chromic-12.pdf:Hello, world 12, 2023-06-16 12:38:21.670051Z
    /tmp/testing-chromic-13.pdf:Hello, world 13, 2023-06-16 12:38:21.670048Z
    /tmp/testing-chromic-14.pdf:Hello, world 14, 2023-06-16 12:38:21.670053Z
    /tmp/testing-chromic-15.pdf:Hello, world 15, 2023-06-16 12:38:21.670056Z
    /tmp/testing-chromic-16.pdf:Hello, world 16, 2023-06-16 12:38:21.670056Z
    /tmp/testing-chromic-17.pdf:Hello, world 17, 2023-06-16 12:38:21.670059Z
    /tmp/testing-chromic-18.pdf:Hello, world 18, 2023-06-16 12:38:21.670059Z
    /tmp/testing-chromic-19.pdf:Hello, world 19, 2023-06-16 12:38:21.670061Z
    /tmp/testing-chromic-20.pdf:Hello, world 20, 2023-06-16 12:38:21.670062Z
    /tmp/testing-chromic-21.pdf:Hello, world 21, 2023-06-16 12:38:21.670064Z
    /tmp/testing-chromic-22.pdf:Hello, world 22, 2023-06-16 12:38:21.670065Z
    /tmp/testing-chromic-23.pdf:Hello, world 23, 2023-06-16 12:38:21.670070Z
    /tmp/testing-chromic-24.pdf:Hello, world 24, 2023-06-16 12:38:21.670067Z
    /tmp/testing-chromic-25.pdf:Hello, world 25, 2023-06-16 12:38:21.670072Z

Looks like it's working very well 👏

gmile commented 1 year ago

A few things I've noticed:

  1. Currently, it doesn't seem possible to configure the hostname or port number where the Chromium service is available. I tested by deploying a Chromium container into our test K8s cluster, then forwarding port 9222 to my local machine. I wish there were a way to configure both hostname and port, so that I could reach the service on the network at http://chromium (on port 80 instead of port 9222).

  2. If I increase the number of tasks from 25 to 100, or even to 50, my terminal gets flooded with errors stating that the size of the connection pool needs to be adjusted (or that task processing could use something like Oban). I guess this is expected though, and with some tuning I could potentially get to 100 concurrent PDF renderings.
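
For reference, the pool tuning mentioned in the second point might look like the following. This is a minimal sketch, assuming the `session_pool` option with its `size` and `timeout` keys as described in the ChromicPDF documentation; the values 10 and 30_000 are arbitrary placeholders, and whether the option behaves identically with a remote Chrome on this branch has not been verified here.

```elixir
# Sketch only: a larger session pool and a longer per-job timeout.
# `session_pool: [size: ..., timeout: ...]` is taken from the ChromicPDF
# docs; the concrete numbers are placeholders, not recommendations.
ChromicPDF.start_link(
  chrome_address: {"localhost", 9222},
  session_pool: [size: 10, timeout: 30_000]
)
```

A larger pool means more Chrome tabs open at once, so the right size depends on the resources of the machine running Chromium.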

maltoe commented 1 year ago

Hi @gmile

Thanks for reporting back! Glad to hear that it works.

To your questions:

1) Yes, the branch is pretty much in a proof-of-concept state now. It needs quite a bit of polishing before I'd consider it ready for merging, especially in terms of documentation. Not sure yet when I'll have time to work on it, but I promise that it'll be there some day. Also, I'll try to sneak in a patch to allow configuring the port & address.

2) Yes, expected. I already saw that when you posted your example. ChromicPDF uses NimblePool under the hood, where the worker checkout mechanism functions like a queue, but it relies solely on the pool master process' message queue, which may not be what you want. Hence the suggestion to use something like Oban in the error message.
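
Short of introducing a job queue such as Oban, one way to avoid flooding the pool is to cap the number of render calls in flight on the caller's side. The sketch below uses `Task.async_stream/3` from the Elixir standard library; the `BoundedRender` module and its `run/3` function are hypothetical names made up for illustration and are not part of ChromicPDF.

```elixir
defmodule BoundedRender do
  # Hypothetical helper, not part of ChromicPDF: calls `render_fun` once per
  # input, with at most `max_concurrency` calls running at the same time.
  # Results are returned in input order.
  def run(inputs, render_fun, max_concurrency \\ 5) do
    inputs
    |> Task.async_stream(render_fun, max_concurrency: max_concurrency, timeout: 30_000)
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

# Possible usage, assuming ChromicPDF has already been started:
#
#   BoundedRender.run(1..100, fn i ->
#     {:ok, base64} = ChromicPDF.print_to_pdf({:html, "Hello, world #{i}"})
#     File.write!("/tmp/testing-chromic-#{i}.pdf", Base.decode64!(base64))
#   end)
```

Keeping `max_concurrency` at or below the session pool size should mean that callers rarely wait on a checkout; a task that exceeds the 30-second timeout or crashes will take the calling process down with it, which is acceptable in a test script but not in production code.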

gmile commented 1 year ago

Not sure yet when I'll have time to work on it, but I promise that it'll be there some day

No worries, no pressure from my side for sure. We will need to do some more testing before running this in prod (our Puppeteer setup has grown somewhat, and I need to see whether all of it can be ported seamlessly to talking to a bare Chromium instance).

If you could squeeze in the ability to configure hostname/port, that would be a most welcome change 👍

maltoe commented 1 year ago

If you could squeeze in the ability to configure hostname/port, that would be a most welcome change 👍

I've just discovered that this feature may help me in one of our own projects, so you're lucky. Please take another look at this PR (feedback welcome!). I've dropped the connection_strategy option and replaced it with chrome_address: {host, port}. Not sure whether this is the final naming yet.

Sorry for the force push; I realized too late that this may break your ongoing tests.
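
As an illustration of the new option, starting ChromicPDF under a supervisor against a Chrome instance elsewhere on the network could look roughly like this. It is a sketch under assumptions: the host name "chromium" and port 9222 are placeholders for whatever service name and port the deployment exposes, and the option name is the one proposed in this PR, which may still change.

```elixir
# Sketch only: "chromium" and 9222 are placeholder values for a Chrome
# service reachable on the network; `chrome_address` is the option name
# proposed in this PR and may not be final.
children = [
  {ChromicPDF, chrome_address: {"chromium", 9222}}
]

Supervisor.start_link(children, strategy: :one_for_one)
```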

maltoe commented 1 year ago

[solved]