Colcon does not show failed build output if using Ninja generator

rotu commented 4 years ago

When using Ninja as CMake's build system generator, colcon does not show error information:

colcon build --cmake-args " -GNinja" --packages-select rmw_cyclonedds_cpp
Starting >>> rmw_cyclonedds_cpp
Failed   <<< rmw_cyclonedds_cpp [ Exited with code 1 ] 

Summary: 0 packages finished [3.15s]
  1 package failed: rmw_cyclonedds_cpp

These errors do show up in build logs:

build_2020-03-13_15-32-21.zip

dirk-thomas commented 4 years ago

Does Ninja output the error messages to stderr? If not, colcon doesn't know about them and by default hides stdout output.

rotu commented 4 years ago

As you can see in the attached log files, Ninja redirects the stderr of subprocesses to its stdout.

dirk-thomas commented 4 years ago

This seems similar to colcon/colcon-core#236. There would need to be custom logic to redirect the specific Ninja error message to stderr. Otherwise colcon can't do anything about it.

So the question is are you interested in implementing custom logic for this? If yes, I would be happy to review a pull request.

rotu commented 4 years ago

Oof. I just looked into what it would take. The assumption that "the only feedback from build tools the user is immediately interested in is stderr" seems buried deep in the architecture of colcon_core.subprocess.run/check_output. So this is going to be less trivial that I assumed.

Before I start:

Why does colcon run subprocesses in pseudoterminals?
How does colcon prevent stderr output from subprocesses being interleaved in its own stdout? It seems like there are no precautions.

dirk-thomas commented 4 years ago

Why does colcon run subprocesses in pseudoterminals?

Because invoked tools operate differently when being invoked in a shell / pseudoterminal rather than without. One example is colorization of compiler warnings. Without using a pseudoterminal the compiler coming through e.g. make is not colorized anymore.

dirk-thomas commented 4 years ago

How does colcon prevent stderr output from subprocesses being interleaved in its own stdout? It seems like there are no precautions.

There are only two possible options:

redirect both streams to a single stream and with that loose the ability to distinguish what output came via stdout and stderr or
capture both streams separately in order to handle them differently but with the downside that there is no way to synchronize them - the best you can do is read / handle them as fast as possible to introduce as little as possible "order change".

rotu commented 4 years ago

Because invoked tools operate differently when being invoked in a shell / pseudoterminal rather than without. One example is colorization of compiler warnings. Without using a pseudoterminal the compiler coming through e.g. make is not colorized anymore.

Okay. So $MAKE_TERMOUT and $MAKE_TERMERR would fix the output for Make, if the output of Colcon is indeed going to a terminal and not a log file. It would also work on Windows, where PTYs are not supported. Is there another place this breaks functionality?

dirk-thomas commented 4 years ago

So $MAKE_TERMOUT and $MAKE_TERMERR would fix the output for Make,

What about any other invoked command which might have shell/pseudoterminal conditional behavior? The current approach of using PTYs works for all of them - I don't see the reason to replace it with an alternative approach which as in this case is tool specific and we don't know for which other tools we would need something similar.

rotu commented 4 years ago

Better yet, CLICOLOR_FORCE and CLICOLOR, which CMake respects already and should pass on to makefile.

The reason I propose this is because the pty code is fragile. It uses limited resources (pty's are finite), it prevents the process from exiting (this patch makes ctrl-c much more responsive), it causes hangs due to async issues (see comment in code about having to put in an endline to work around bugs), and it makes flow control weird (when a process ends, it should release its resources as it unwinds the stack, but because pty's are conditionally used, there isn't a finally block to put this cleanup in).

dirk-thomas commented 4 years ago

Better yet, CLICOLOR_FORCE and CLICOLOR, which CMake respects already and should pass on to makefile.

And this that doesn't address the various other tools invoked: e.g. bazel, gradle, etc.,

rotu commented 4 years ago

I think I'll need to ponder on this. While using a pty will make Bazel use color, it might also make it use curses, which seems like an accident. It also seems like a bad idea in Gradle to use rich output until this is resolved https://github.com/gradle/gradle/issues/6843

rotu commented 4 years ago

We also don’t want build tools to see that a terminal is interactive and wait for user input from a user who never sees the prompt (that may be why CMake ExternalProjects can take so long before failing... because they’re waiting on interactive input)

dirk-thomas commented 4 years ago

The check_call() API already exposes the use_pty option. I would suggest to update code which uses that API (e.g. the CMake build task) to explicitly pass False while also setting the necessary environment variables to ensure the current behavior is maintained. Once all existing code has been migrated to use use_pty=False the option can be deprecated.

dirk-thomas commented 4 years ago

@rotu Since the discussion has been going a bit of topic (about use_pty) what about the topic in the title of the ticket and my original question about how to proceed?

This seems similar to colcon/colcon-core#236. There would need to be custom logic to redirect the specific Ninja error message to stderr. Otherwise colcon can't do anything about it.

So the question is are you interested in implementing custom logic for this? If yes, I would be happy to review a pull request.

rotu commented 4 years ago

I don't know the right approach to fixing this but I don't think it should be special treatment for Ninja. I think the best solution would be, when a subprocess fails, to print a postmortem with both the subprocess's stdout and stderr.

I'm not comfortable doing it before use_pty goes away because I think that code is too fragile.

dirk-thomas commented 4 years ago

I think the best solution would be, when a subprocess fails, to print a postmortem with both the subprocess's stdout and stderr.

Atm when a task fails the stderr is being shown which seems to be the information you would want to see (rather than all of stdout which can be very verbose). That approach works very well in many use cases - not for Ninja obviously.

I don't think the behavior should change to accommodate the fact that Ninja doesn't print error message to stderr. Maybe reporting this upstream at the Ninja project might be an option.

dirk-thomas commented 4 years ago

Te question about the Ninja output (which doesn't print error message to stderr, same as e.g. msbuild) has been answered and needs to be either:

addressed upstream in Ninja or
handled with custom logic to extract errors from Ninja (which I don't think is something we should do since it is likely fragile)

The pty discussion was off topic anyway. Please feel free to follow up on those independently.

So I am going to close the ticket since there is nothing to be done in this repo atm. @rotu Please post a link to an upstream ticket as soon as you have created one so future readers can check on the status.

tylerjw commented 4 years ago

Sadly it looks like the ninja project itself considers "feature" to be not-useful: https://github.com/ninja-build/ninja/issues/1537

Their reasoning is that the error message is not useful without the context of what command caused it and therefore if you care about why it failed, you also care about what command was called when it failed.

The suggested solution there is this:

NINJA_STATUS="[ninja][%f/%t]
filter() {
  local l n
  while read -r l; do
    n=${l#\[ninja\]}
    [ "x$l" != "x$n" ] && printf "%s\n" "$n" || printf "%s\n" "$l" >&2
  done
}

NINJA_STATUS="[ninja][%f/%t] " ninja | filter

I don't know how difficult this would be to implement (here) but it is simple enough. I'll see if I can figure out how to add this logic.

rotu commented 3 years ago

This ninja PR seems like we could hook into it to get failure output: https://github.com/ninja-build/ninja/pull/1600

alecGraves commented 2 years ago

BTW, if anyone is looking for a way to use ninja builds and still see build errors, colcon can output stdout for packages during a build:

colcon build --event-handlers console_cohesion+

or for immediate build output (lines interleaved for parallel builds):

colcon build --event-handlers console_direct+

Taken from: https://colcon.readthedocs.io/en/released/user/how-to.html Also mentioned here: https://answers.ros.org/question/363112/how-to-see-compiler-invocation-in-colcon-build/

rsarrazin2 commented 2 years ago

@alecGraves I took the liberty of referencing your workaround in How can I make colcon output a plain CMake project's output with the ninja generator?. If you have a stackoverflow account I'd be happy to tag you explicitly.

Ryanf55 commented 4 months ago

Hi there,

Would to colcon team accept a contribution that enables event-handlers=console_cohesion+ when CMAKE_GENERATOR=Ninja? We switched to using Ninja because it was faster on most of our builds, but if you forget to add this and the build fails, you don't get the output.

I also don't think this issue is solved; the workaround is not a solution as this behavior is not expected.

colcon / colcon-cmake

Colcon does not show failed build output if using Ninja generator #67