aws-actions / aws-codebuild-run-build

Run an AWS CodeBuild project as a step in a GitHub Actions workflow job.
https://aws.amazon.com/codebuild
Apache License 2.0

Logs are missing when CodeBuild uses Compute fleets #171

Open mic-kul opened 2 weeks ago

mic-kul commented 2 weeks ago

Hi,

We're encountering an issue with the aws-codebuild-run-build action when using CodeBuild Compute fleets: logs go missing whenever the build takes more than 60s to generate output (the default updateInterval is 30s).

I've checked CloudWatch GetLogEvents metrics and found no errors.

We run the action with the default update interval of 30 seconds.

First, I thought we were encountering the condition described in this section of the code:

  // GetLogEvents can return partial/empty responses even when there is data.
  // We wait for two consecutive empty log responses to minimize false positive on EOF.
  // Empty response counter starts after any logs have been received, or when the build completes.

However, that doesn't explain why everything works as expected on On Demand builders and the issue occurs only when the build runs on a CodeBuild Compute fleet.
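To make the quoted comment concrete, here is a minimal simulation of a "two consecutive empty responses means EOF" rule. It is only a sketch: `pollUntilTwoEmpty` and the canned responses are hypothetical and are not taken from the action's source. It also models only the "after any logs have been received" part of the comment.

```javascript
// Hypothetical simulation; none of these names come from the action's code.
// Each array element stands for what one simulated GetLogEvents poll returns.
function pollUntilTwoEmpty(responses) {
  const received = [];
  let emptyCount = 0;
  let seenLogs = false;
  for (const events of responses) {
    if (events.length > 0) {
      received.push(...events);
      seenLogs = true;
      emptyCount = 0; // any data resets the empty-response counter
    } else if (seenLogs) {
      emptyCount += 1; // the counter only starts once some logs were received
      if (emptyCount >= 2) break; // two empty polls in a row are treated as EOF
    }
  }
  return received;
}

// Output gap shorter than two polls: every line is collected.
const shortGap = pollUntilTwoEmpty([["20"], [], ["35"], [], []]);
// Output gap spanning two polls: "65" shows up after the loop has given up.
const longGap = pollUntilTwoEmpty([["50"], [], [], ["65"]]);

console.log(shortGap); // [ '20', '35' ]
console.log(longGap); // [ '50' ]
```

In this model, any quiet period that covers two polls ends log collection, which would match the "more than 60s with a 30s interval" symptom, but it does not by itself explain the On Demand versus Compute fleet difference.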

The minimal buildspec to reproduce the issue:

version: 0.2

phases:
  pre_build:
    commands:
      - echo "Preparing to execute the sleep script"
  build:
    commands:
      - echo "Starting the sleep script"
      - |
        #!/bin/bash

        # Initialize total sleep time
        total_sleep_time=20

        # Loop until the sleep interval reaches or exceeds 160 seconds
        while [ $total_sleep_time -lt 160 ]; do
          echo $total_sleep_time
          sleep $total_sleep_time
          total_sleep_time=$((total_sleep_time + 15))

        done

        echo "Total sleep time: $total_sleep_time seconds"
  post_build:
    commands:
      - echo "Sleep script execution completed"
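For reference, the timing this buildspec produces can be worked out with a few lines of JavaScript. This is just arithmetic on the loop above; the "two update intervals" threshold is an assumption based on the 60s observation and the default updateInterval of 30s.

```javascript
// Reproduces the buildspec loop's schedule: the script echoes a value,
// then sleeps for that many seconds, so each sleep is a gap between outputs.
const updateIntervalSeconds = 30; // default quoted in the report

const sleeps = [];
let sleepSeconds = 20;
while (sleepSeconds < 160) {
  sleeps.push(sleepSeconds);
  sleepSeconds += 15;
}

const totalSeconds = sleeps.reduce((sum, s) => sum + s, 0);
// Assumption: a gap longer than two update intervals is the interesting case.
const longGaps = sleeps.filter((s) => s > 2 * updateIntervalSeconds);

console.log(sleeps); // [ 20, 35, 50, 65, 80, 95, 110, 125, 140, 155 ]
console.log(totalSeconds); // 875
console.log(sleepSeconds); // 170 (the value the final echo prints)
console.log(longGaps[0]); // 65
```

So the build sleeps for 875 seconds in total, and every gap from the fourth iteration onward (65s and up) is longer than two 30-second update intervals.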

Example:

When running CodeBuild On Demand started by this GitHub Action, GHA outputs:

[Screenshot: on_demand_finished]

When running CodeBuild Compute fleets started by this GitHub Action, CodeBuild and GHA output:

Is there any way to pull the missing logs again once the "CODEBUILD COMPLETE" signal is received?

shuohaoliu commented 6 days ago

Created a pull request to use nextForwardToken, instead of two consecutive empty event lists, to determine EOF when pulling CloudWatch logs.
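As a rough illustration of the token-based approach (not the code from the pull request), the sketch below drains a log stream until the service hands back the same forward token that was passed in, which is how GetLogEvents signals that the end of the stream has been reached. `makeFakeGetLogEvents`, `drainLogs`, and the `f/N` token format are hypothetical stand-ins so the example runs without AWS access.

```javascript
// Hypothetical in-memory stand-in for CloudWatch Logs GetLogEvents.
// The token "f/N" is an invented format meaning "next read starts at page N".
function makeFakeGetLogEvents(pages) {
  return function getLogEvents(nextToken) {
    const index = nextToken ? Number(nextToken.split("/")[1]) : 0;
    if (index >= pages.length) {
      // End of stream: the same token that was passed in comes back.
      return { events: [], nextForwardToken: `f/${index}` };
    }
    return { events: pages[index], nextForwardToken: `f/${index + 1}` };
  };
}

// Keep reading until the forward token stops changing; empty pages alone
// do not end the loop.
function drainLogs(getLogEvents) {
  const received = [];
  let token; // undefined on the first call
  for (;;) {
    const { events, nextForwardToken } = getLogEvents(token);
    received.push(...events);
    if (nextForwardToken === token) break; // unchanged token: nothing left
    token = nextForwardToken;
  }
  return received;
}

// Two empty pages in a row no longer cut the stream short.
const pages = [["50"], [], [], ["65"], ["80"]];
console.log(drainLogs(makeFakeGetLogEvents(pages))); // [ '50', '65', '80' ]
```

In a live build an unchanged token can also just mean that no new logs have arrived yet, so a check like this would presumably be paired with the build's completion status rather than used on its own.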

As for why this was only observed with CodeBuild Compute Fleet and not in on-demand mode, I think it may be related to how and when the CloudWatch agent pushes logs from the instance to the CloudWatch service. For example, the CloudWatch agent has configuration options such as force_flush_interval. With CodeBuild on-demand compute, the EC2 instance is terminated right after the build completes, and during shutdown the CloudWatch agent pushes everything in memory to the CloudWatch service without waiting. With CodeBuild compute fleet mode, however, you get reserved EC2 instance capacity that isn't terminated after the build, so the CloudWatch agent honors that configuration when deciding when to next push logs to the CloudWatch service. It seems to be a timing issue in certain scenarios.

mic-kul commented 6 days ago

Thank you @shuohaoliu.

For context, another oddity we've noticed is that all logs in CloudWatch, when using CodeBuild Compute Fleet, carry the same timestamp: that of the very first log message, no matter how much sleep we add in bash ;) We've already raised this with AWS Premium Support, and it was escalated to the CodeBuild team.

shuohaoliu commented 3 hours ago

This issue should now be fixed.