bitcrowd / chromic_pdf

Convenient HTML to PDF/A rendering library for Elixir based on Chrome & Ghostscript
Apache License 2.0

on_demand implementation sends exit signal to caller #233

Closed · dvic closed this issue 1 year ago

dvic commented 1 year ago

With the current on_demand implementation, the caller will receive an {:EXIT, _supervisor_pid, :normal} signal, because the supervision tree is started on demand with Supervisor.start_link and stopped right after the request with Supervisor.stop.
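
For illustration, a minimal sketch of that pattern (placeholder child, not the actual ChromicPDF source):

```elixir
# Simplified sketch of what on_demand does per request: the supervision tree
# is started linked to the calling process and stopped again once the request
# is done. The Agent child is just a placeholder here.
{:ok, sup} = Supervisor.start_link([{Agent, fn -> :config end}], strategy: :one_for_one)
# ... print the PDF using the processes under `sup` ...
:ok = Supervisor.stop(sup)
# A caller that traps exits now finds {:EXIT, sup_pid, :normal} in its mailbox.
```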

Is this something that's intended?

Background context:

I ran into issues with commanded after this PR https://github.com/commanded/commanded/pull/512. An event handler was calling ChromicPDF and was being shut down because of the above.

maltoe commented 1 year ago

Hi @dvic,

Could you give me a minimal example to reproduce this?

iex(1)> ChromicPDF.start_link(on_demand: true)
{:ok, #PID<0.314.0>}
iex(2)> {:ok, _} = ChromicPDF.print_to_pdf({:html, "test"}); nil
nil
iex(3)> flush
:ok

I would have expected to receive the :EXIT message here. Am I missing anything?

dvic commented 1 year ago

The trick is to trap exits:

iex(1)> Mix.install([{:chromic_pdf, "~> 1.7.0"}])
:ok
iex(2)> Process.flag(:trap_exit, true)
false
iex(3)> ChromicPDF.start_link(on_demand: true)
{:ok, #PID<0.207.0>}
iex(4)> {:ok, _} = ChromicPDF.print_to_pdf({:html, "test"}); nil
nil
iex(5)> flush
{:EXIT, #PID<0.209.0>, :normal}
:ok
iex(6)>
dvic commented 1 year ago

Maybe I'm mistaken here and it's completely fine like this because you're supposed to ignore exits with reason :normal? It's just that I was not expecting this situation to be triggered by a call to ChromicPDF.print_to_pdf.

maltoe commented 1 year ago

Oh, you're trapping the exit signal. Yes, in that case this makes sense. The supervisor that is spawned is linked to your client process and is terminated again with reason :normal, so this signal is normally ignored by your process. That's not particular to ChromicPDF; this does the same:

iex(1)> Process.flag(:trap_exit, true)
false
iex(2)> spawn_link(fn -> :ok end)
#PID<0.113.0>
iex(3)> flush
{:EXIT, #PID<0.113.0>, :normal}
:ok

Compare this to the case without trap_exit, where you don't get a message but the signal is still handled (i.e. reason :normal -> ignore, otherwise crash). With trap_exit, the signal is converted to a message, so yes, I'd say you're supposed to ignore these. Other libraries may exhibit similar behaviour if they spawn processes, right? Say, something that spawns a Task.

maltoe commented 1 year ago

However, the more I think about it, the more sceptical I am about the current implementation as well. Without on_demand, any errors inside ChromicPDF (e.g. a connection crash, after which the supervisor itself will eventually give up and exit) would not terminate the client process, as the ChromicPDF supervisor and its children are not linked to it. In on_demand mode they are, which is a weird misalignment. Maybe this should be refactored; I imagine something like this would work:

ChromicPDF.OnDemandSupervisor
├─ <the agent that keeps the config>
└─ <a DynamicSupervisor to start the actual supervision tree dynamically>

This way the "on demand" tree wouldn't be linked to the client process. What do you think?
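
A rough sketch of how that layout could look (module and child names are illustrative, not an actual implementation):

```elixir
# Hypothetical sketch of the proposed layout from the diagram above.
defmodule ChromicPDF.OnDemandSupervisor do
  use Supervisor

  def start_link(config) do
    Supervisor.start_link(__MODULE__, config, name: __MODULE__)
  end

  @impl true
  def init(config) do
    children = [
      # keeps the on_demand config around between requests
      {Agent, fn -> config end},
      # starts/stops the actual browser supervision tree per request
      {DynamicSupervisor, strategy: :one_for_one, name: ChromicPDF.OnDemandDynamicSupervisor}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end
```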

dvic commented 1 year ago

> Oh, you're trapping the exit signal. Yes, in that case this makes sense. The supervisor that is spawned is linked to your client process and is terminated again with reason :normal, so this signal is normally ignored by your process. That's not particular to ChromicPDF; this does the same:
>
> iex(1)> Process.flag(:trap_exit, true)
> false
> iex(2)> spawn_link(fn -> :ok end)
> #PID<0.113.0>
> iex(3)> flush
> {:EXIT, #PID<0.113.0>, :normal}
> :ok
>
> Compare this to the case without trap_exit, where you don't get a message but the signal is still handled (i.e. reason :normal -> ignore, otherwise crash). With trap_exit, the signal is converted to a message, so yes, I'd say you're supposed to ignore these. Other libraries may exhibit similar behaviour if they spawn processes, right? Say, something that spawns a Task.

Yeah, that makes sense. I'm going to change the behaviour to ignore the :normal exit reasons.
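
For illustration, a minimal sketch (hypothetical module) of a caller that traps exits and ignores :normal EXIT messages from linked processes, such as the on-demand tree spawned by ChromicPDF.print_to_pdf:

```elixir
defmodule MyApp.PdfWorker do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts) do
    Process.flag(:trap_exit, true)
    {:ok, opts}
  end

  @impl true
  def handle_info({:EXIT, _pid, :normal}, state) do
    # A linked process terminated normally (e.g. the on-demand supervision
    # tree shutting down after a request) - nothing to do.
    {:noreply, state}
  end

  def handle_info({:EXIT, _pid, reason}, state) do
    # Propagate abnormal exits.
    {:stop, reason, state}
  end
end
```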

> However, the more I think about it, the more sceptical I am about the current implementation as well. Without on_demand, any errors inside ChromicPDF (e.g. a connection crash, after which the supervisor itself will eventually give up and exit) would not terminate the client process, as the ChromicPDF supervisor and its children are not linked to it. In on_demand mode they are, which is a weird misalignment. Maybe this should be refactored; I imagine something like this would work:
>
> ChromicPDF.OnDemandSupervisor
> ├─ <the agent that keeps the config>
> └─ <a DynamicSupervisor to start the actual supervision tree dynamically>
>
> This way the "on demand" tree wouldn't be linked to the client process. What do you think?

Indeed, I hadn't thought about this other implication. The proposed solution sounds like a good idea to me. However, it would only not be linked to the caller if max_restarts is not exceeded, right? I'm not sure how DynamicSupervisor differs from regular supervisors in terms of child restarts / error propagation.

maltoe commented 1 year ago

> However, it would only not be linked to the caller if max_restarts is not exceeded, right?

No, there would be no link between the client and the ChromicPDF process spawned by the DynamicSupervisor. DynamicSupervisor is a GenServer and the process is spawned in the start_child callback, not by the caller, see here. I'd interpret max_restarts as the value after which the DynamicSupervisor itself gives up and crashes when one of its children keeps failing (in which case you will notice, but I don't think this is what you meant).
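
A quick way to see this in plain Elixir (illustrative snippet, not ChromicPDF code):

```elixir
# The child started via DynamicSupervisor.start_child/2 is linked to the
# DynamicSupervisor process, not to the caller, so terminating it produces
# no :EXIT signal in the calling process.
Process.flag(:trap_exit, true)
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, child} = DynamicSupervisor.start_child(sup, {Agent, fn -> :state end})
:ok = DynamicSupervisor.terminate_child(sup, child)
# flush() in IEx shows no {:EXIT, ^child, _} message for the caller.
```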

Either way, I'd consider the on_demand implementation "ok for now" and close this issue.

maltoe commented 1 year ago

@dvic Maybe you've seen it already: I had some fun yesterday with DynamicSupervisor & co, but realized that linking the client process to the temporary browser process is actually a pretty good idea, as we might otherwise leave Chrome instances around when the client process dies (e.g. killed by an external watchdog / timeout). So, to answer your original question: Yes, this is intended behaviour :upside_down_face: It just took me a while to figure this out.

dvic commented 1 year ago

Hah, okay, nice. One thing you could also consider here: instead of creating a link, you could monitor the client process and shut down the supervisor whenever it dies.
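
A rough sketch of that monitor-based approach (hypothetical names, not how ChromicPDF is implemented):

```elixir
# A small watchdog process monitors the client and stops the temporary
# supervision tree when the client goes down, without linking to it.
defmodule OnDemandWatchdog do
  use GenServer

  def start_link({client_pid, supervisor_pid}) do
    GenServer.start_link(__MODULE__, {client_pid, supervisor_pid})
  end

  @impl true
  def init({client_pid, supervisor_pid}) do
    ref = Process.monitor(client_pid)
    {:ok, %{ref: ref, supervisor: supervisor_pid}}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, %{ref: ref} = state) do
    # Client died: pull down the temporary Chrome tree to avoid orphans.
    Supervisor.stop(state.supervisor)
    {:stop, :normal, state}
  end
end
```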

maltoe commented 1 year ago

> Hah, okay, nice. One thing you could also consider here: instead of creating a link, you could monitor the client process and shut down the supervisor whenever it dies.

Yes, I considered that, but it's like 10x more complex for ... what exactly? Only for @dvic not to get EXIT messages because he's trapping exits :smile: IMO, the behaviour of the linked process is exactly what we want: if the client goes down, we want to pull down the temporary Chrome instance as well. It also makes sense semantically for me. Maybe we should just put a note in the docs mentioning this.

dvic commented 1 year ago

> > Hah, okay, nice. One thing you could also consider here: instead of creating a link, you could monitor the client process and shut down the supervisor whenever it dies.
>
> Yes, I considered that, but it's like 10x more complex for ... what exactly? Only for @dvic not to get EXIT messages because he's trapping exits 😄

Hah fair enough 😂

> IMO, the behaviour of the linked process is exactly what we want: if the client goes down, we want to pull down the temporary Chrome instance as well. It also makes sense semantically for me.

I agree that if the client goes down, we want to pull down the temporary Chrome instance as well, but what if the temporary Chrome instance crashes (so the reason is not :normal)? Do you then still expect the client process to crash as well (assuming it is not trapping exits)?

> Maybe we should just put a note in the docs mentioning this.

I saw it in the PR, good idea 👍

maltoe commented 1 year ago

> Do you then still expect the client process to crash as well?

Yes, it would, and IMO that's a reasonable default behaviour, especially considering there is no other error handling in place. I agree that raising an exception would be nicer to the client, as it could be more easily rescued. But I'm not sure if the more complicated manual monitoring setup (both ways) would be worth it. :shrug: Would be happy to accept PRs though.

dvic commented 1 year ago

> But I'm not sure if the more complicated manual monitoring setup (both ways) would be worth it. Would be happy to accept PRs though.

Not sure; I could work on this, but before I take a swing at it, I wanted to ask one more thing: is there a reason (other than cleanup) for us to shut down the browser instance with the on_demand option? Would it be an option to simplify it further by just using a dynamic supervisor that holds the browser instance without shutting it down?

maltoe commented 1 year ago

> > But I'm not sure if the more complicated manual monitoring setup (both ways) would be worth it. Would be happy to accept PRs though.
>
> Not sure; I could work on this, but before I take a swing at it, I wanted to ask one more thing: is there a reason (other than cleanup) for us to shut down the browser instance with the on_demand option? Would it be an option to simplify it further by just using a dynamic supervisor that holds the browser instance without shutting it down?

The main point of on_demand mode is to take down the browser after use, to avoid zombies. For instance, when you run tests with ExUnit or exit an iex shell with ctrl+c, your application is not cleanly shut down, and without on_demand the Chrome process becomes orphaned.
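
For reference, enabling it only where needed could look roughly like this (illustrative application module and config key; the on_demand option is the one discussed in this thread):

```elixir
# Example: ChromicPDF in the supervision tree, with on_demand driven by
# application config (e.g. set :chromic_pdf_on_demand to true in
# config/test.exs and config/dev.exs).
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      {ChromicPDF, on_demand: Application.get_env(:my_app, :chromic_pdf_on_demand, false)}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```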

dvic commented 1 year ago

Clear, thanks for the explanation!

For the time being, I think #235 is a fine solution. When I have the time, I'll look into hooking a process monitor into this solution.