beeware / briefcase

Tools to support converting a Python project into a standalone native application.
https://briefcase.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Make subprocess command logging atomic #1917

Open rmartin16 opened 1 month ago

rmartin16 commented 1 month ago

What is the problem or limitation you are having?

Each time a subprocess command is executed, Subprocess logs its command line, environment, return code, stdout/stderr, etc. However, each of these logging operations is performed independently; therefore, when subprocess calls are executed concurrently, the log lines for different commands can become intermingled and undecipherable.

This is currently a problem for macOS code signing, since all of the files are signed concurrently by a thread pool.

Example log file: https://github.com/user-attachments/files/16044976/briefcase.2024_06_30-10_13_00.package.log
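
As a minimal illustration (not Briefcase's actual logging code), independent log calls made from a thread pool interleave like this:

```python
# Minimal illustration of why independent log calls from a thread pool end up
# intermingled in the combined output. This is not Briefcase code.
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_and_log(cmd):
    print(f">>> Running Command: {' '.join(cmd)}")      # header line
    result = subprocess.run(cmd, capture_output=True, text=True)
    for line in result.stdout.splitlines():
        print(f">>>     {line}")                        # output lines
    print(f">>> Return code: {result.returncode}")      # footer line


with ThreadPoolExecutor(max_workers=4) as executor:
    # Each worker prints its header/output/footer lines as they happen, so the
    # blocks for different commands interleave in the combined log.
    list(executor.map(run_and_log, [["echo", "one"], ["echo", "two"], ["echo", "three"]]))
```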

Describe the solution you'd like

Convert all of the subprocess command logging to be atomic such that each block of logging for a specific command is guaranteed to be on consecutive lines in the log.

Describe alternatives you've considered

Omit concurrent subprocess execution.

Additional context

No response

freakboy3742 commented 1 month ago

Agreed this is an issue; code signing logs are very difficult to parse. Not sure where to start looking at a solution - we essentially need to be able to push a logging context when we enter a thread, and only display the output of that context once the thread has been joined.

rmartin16 commented 1 month ago

I was thinking about something a little less invasive given our use case... although this solution is also not generalized.

So, right now, we print some initial information about the command, run the command while potentially printing its output to the console, and finish with some information about the outcome of the command.

In situations where the commands are being run concurrently, though, we probably aren't going to depend on any output from the commands being printed in real time (since that'd likely be a real mess); therefore, we could simply cache this logging until the command finishes and then flush it at the end for each command.

Such an implementation would probably leverage a flag on calls to Subprocess to activate this style of logging, but the current style of logging would still be used for non-concurrent commands. (I imagine the logging flush may also require some level of locking to avoid intermingled flushing.)
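
As a rough, hypothetical sketch (the `buffered` flag, `BufferedCommandLog` class, and `emit` callback are illustrative names, not Briefcase's actual Subprocess API):

```python
# Hypothetical sketch of buffered command logging: collect a command's log
# lines while it runs, then flush them as one block under a lock.
import threading

_flush_lock = threading.Lock()


class BufferedCommandLog:
    """Collects the log lines for one command and flushes them as a single block."""

    def __init__(self):
        self.lines = []

    def log(self, line):
        self.lines.append(line)

    def flush(self, emit):
        # Locking here ensures two commands finishing at the same time can't
        # intermingle their flushed blocks.
        with _flush_lock:
            for line in self.lines:
                emit(line)


def run(args, buffered=False, emit=print):
    # buffered=False preserves the current behaviour: log in real time.
    buffer = BufferedCommandLog() if buffered else None
    record = buffer.log if buffered else emit

    record(f"Running Command: {args}")
    # ... execute the subprocess here, recording its output and return code ...
    record("Return code: 0")

    if buffered:
        buffer.flush(emit)
```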

Thoughts?

freakboy3742 commented 1 month ago

> In situations where the commands are being run concurrently, though, we probably aren't going to depend on any output from the commands being printed in real time (since that'd likely be a real mess); therefore, we could simply cache this logging until the command finishes and then flush it at the end for each command.

I'd go so far as to say we can't rely on output from concurrent commands being printed in real time, and we should actively protect against that possibility.

> Such an implementation would probably leverage a flag on calls to Subprocess to activate this style of logging, but the current style of logging would still be used for non-concurrent commands. (I imagine the logging flush may also require some level of locking to avoid intermingled flushing.)
>
> Thoughts?

Makes sense - although I do wonder where the line is between "run with threadpool-compatible logging" and "check_output". Even check_output would need some changes to be useful in a threadpool situation - we effectively need "check_output, but return all the useful details, rather than logging them" so that the thread joiner can dump everything to the log.

Looking at the implementation, I wonder if the easiest approach might be to make the "log wrapper object" an argument to check_output (and maybe other methods?). At present, the log wrapper is the subprocess tool class (i.e., self), so we can call self._log_output() etc.; but it wouldn't be too hard to write a proxy object with the same interface that stores all the calls made to _log_*(), create one of these proxies for each thread, pass it in when invoking check_output in a thread, and then replay what the proxy saw back onto Subprocess when the thread is joined.
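
As a rough sketch of that proxy idea (the `_log_*()` naming mirrors the pattern described above, but the exact method names and replay mechanics here are assumptions, not the existing Subprocess API):

```python
# Hypothetical proxy that records _log_*() calls instead of emitting them.
class LogProxy:
    """Records every _log_*() call so it can be replayed after the thread joins."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any _log_*() call is captured instead of being emitted immediately.
        if name.startswith("_log_"):
            def record(*args, **kwargs):
                self.calls.append((name, args, kwargs))
            return record
        raise AttributeError(name)

    def replay(self, log_wrapper):
        # Once the worker thread has been joined, replay the recorded calls
        # onto the real log wrapper so the command's log block is contiguous.
        for name, args, kwargs in self.calls:
            getattr(log_wrapper, name)(*args, **kwargs)
```

Each threaded check_output call would get its own proxy, and whoever joins the thread would call replay() against the real Subprocess instance, so each command's logging comes out as one consecutive block.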