The conversion process can encounter various issues during its lifetime, that can affect its termination. For example, it may fail early, linger for a little while, get stuck, etc. Previously, we assumed that any such scenario would simply be resolved with a 3-second wait timeout, provided that the conversion has already finished successfully. Turns out that this is not the case, and we have several users who reported timeout issues (see #749).
In this PR, we replace these arbitrary timeouts with some termination logic, that tries to terminate first the process gracefully, and if it doesn't comply, forcefully. This seems simple enough, but in practice we have the following issues:
Killing the spawned container process does not always mean that the container will be killed. Take Docker for example, where the spawned process is connected to the Docker daemon. The Docker daemon is then responsible for starting the container.
Due to the finicky nature of terminating our conversion processes, and because we haven't 100% nailed the underlying reason for #749, this PR adds cross-platform tests for virtually every termination scenario that can occur. The end goal is to never let users see another timeout exception again, no matter the underlying cause.
[!NOTE]
This PR does not cover the termination scenarios of the pixels to PDF conversion process. That's because we're working on removing the need for a container in that phase (see #748).
The conversion process can encounter various issues during its lifetime, that can affect its termination. For example, it may fail early, linger for a little while, get stuck, etc. Previously, we assumed that any such scenario would simply be resolved with a 3-second wait timeout, provided that the conversion has already finished successfully. Turns out that this is not the case, and we have several users who reported timeout issues (see #749).
In this PR, we replace these arbitrary timeouts with some termination logic, that tries to terminate first the process gracefully, and if it doesn't comply, forcefully. This seems simple enough, but in practice we have the following issues:
Due to the finicky nature of terminating our conversion processes, and because we haven't 100% nailed the underlying reason for #749, this PR adds cross-platform tests for virtually every termination scenario that can occur. The end goal is to never let users see another timeout exception again, no matter the underlying cause.
Fixes #749 Refs #563