Is there a reason why `client.images.build` doesn't stream output logs ?

sebpiq-kialo commented 5 years ago

Seems to be a recurrent question. I looked at the code and it seems to me there is no reason why the higher-level API generator couldn't return the output logs as they come. This could be achieved by using a sort of fake future for the image object returned by the build function. Something like that (very rough sketch) :

    def build(self, **kwargs):
        resp = self.client.api.build(**kwargs)
        if isinstance(resp, six.string_types):
            return self.get(resp)
        last_event = None
        image_id = None
        result_stream, internal_stream = itertools.tee(json_stream(resp))

        result = {}

        def gen():
            for chunk in internal_stream:
                if 'error' in chunk:
                    raise BuildError(chunk['error'], result_stream)
                if 'stream' in chunk:
                    match = re.search(
                        r'(^Successfully built |sha256:)([0-9a-f]+)$',
                        chunk['stream']
                    )
                    if match:
                        image_id = match.group(2)
                    else:
                        yield chunk
                last_event = chunk

            if image_id:
                result['image'] = self.get(image_id)
            else:
                raise BuildError(last_event or 'Unknown', result_stream)

        return result, gen()

Maybe this could be switched-on by adding an option to build for example stream_output ?

And this can be used like this :

    image, output_iter = docker_client.images.build(fileobj=fd, tag=args.tag)

    for line in output_iter:
        print(line)

    print(image)

Of course the nicer solution for that would be to use asyncio but that's then becoming more complicated :P

shin- commented 5 years ago

The original design was that because it was the high-level client, the result (image object or exception in case of failure) was more important to the user than the logs - and if that wasn't the case, one could still use the low-level API.

As you remark, it's been a recurring theme on this tracker for a while now, and we've tried to compromise somewhat already by returning the logs after the operation finishes. Maybe we'll look at doing something better in 4.0.

sebpiq commented 5 years ago

I understand the idea between this division, it's just that I feels silly to duplicate the high-level API source code in order to parse my output from the low-level API (I want to get the image id). Maybe extracting the parsing from the high-level API and making it available as some sort of utils would be useful then ( output event -> parsed event<Error|ImageId|Log> ) ?

sebpiq commented 5 years ago

Ah nevermind, I don't actually need to parse, I can just use client.images.get(name) no need for image id

sebpiq commented 5 years ago

thanks for the great library by the way :) loving it

haizaar commented 5 years ago

Hit this one as well - need to show logs as they come. Ended up using low level API: from_env().api.build()

vladak commented 1 year ago

The original design was that because it was the high-level client, the result (image object or exception in case of failure) was more important to the user than the logs - and if that wasn't the case, one could still use the low-level API.

As you remark, it's been a recurring theme on this tracker for a while now, and we've tried to compromise somewhat already by returning the logs after the operation finishes. Maybe we'll look at doing something better in 4.0.

The problem I find with this approach is that if the build fails for some reason, I'd like to get the logs to determine the reason, however that is not possible:

  try:
      image, live_log_generator = client.images.build(path=".", tag="master")
  except docker.errors.BuildError as e:
      logger.error(e)
      for line in live_log_generator:  # oops, live_log_generator is unassigned
          logger.info(line)

vladak commented 1 year ago

The other thing is that with using the low level build API, one is basically forced to reimplement the gory-details of docker/models/images.py#build(), i.e. matching the regex so that image ID can be determined (in order to construct the Image instance) and grep-ing for error patterns. If one wants to avoid that, it is necessary to resort to ugly workarounds like this:

  try:
      image, _ = client.images.build(path=".", tag="master")
  except docker.errors.BuildError as e:
      logger.error(e)
      # Rebuild the image using low level API, hoping the failure is reproducible.
      live_log_generator = client.api.build(path=".", tag="master")
      for line in live_log_generator:
          line_dict = json.loads(line)
          if line_dict.get("stream"):
              logger.error(line_dict["stream"].rstrip())

which doubles the time needed to diagnose the problem.

Tranquility2 commented 6 months ago

Thanks @haizaar and @shin- this tread old but gold :)

docker / docker-py

Is there a reason why `client.images.build` doesn't stream output logs ? #2182