Azure / iotedge

The IoT Edge OSS project
MIT License

Runtime: Metrics collector cannot be added to garbage collection list #7308

Closed: jeremielalanne closed this issue 1 week ago

jeremielalanne commented 2 weeks ago

Expected Behavior

IoT Edge should be able to add the Metrics collector image to the garbage collection list when pulling it.

Current Behavior

The runtime logs "[WARN] Could not retrieve image id. mcr.microsoft.com/azureiotedge-metrics-collector:1.0 was not added to image garbage collection list and will not be garbage collected." and the module does not start after that warning. This happens on only one of our devices, and purging all modules and restarting didn't help.

Steps to Reproduce


  1. Have a deployment manifest in which one of the modules is the metrics collector
  2. Set up the garbage collector in the config.toml file as described in the production checklist documentation
  3. Start

Output of iotedge check


Device Information

Runtime Versions

Note: when using Windows containers on Windows, run docker -H npipe:////./pipe/iotedge_moby_engine version instead

Logs

Sorry for the lack of logs; I cannot access the logs for edged. I can only say that the warning message appears right after the successful pull message, and the runtime keeps retrying the pull again and again.

jeremielalanne commented 2 weeks ago

Reboots and purges didn't solve the issue, and at some point the device wouldn't even boot anymore, so I had to reflash it, and now it's working properly. If you have any idea what the issue was, I'd love to hear it. Maybe a corrupt system configuration?

gauravIoTEdge commented 1 week ago

The issue isn't with iotedge or metrics collector.

iotedge uses the Docker Engine API underneath: https://docs.docker.com/engine/api/v1.45/#tag/Image/operation/ImageList

Think of iotedge as a passthrough for container management.

That's a runtime error from the Docker Engine API. There's nothing we can do.

jeremielalanne commented 1 week ago

Thanks for your answer. Do you at least have any idea what the issue could be, even if you cannot fix it yourself?

gauravIoTEdge commented 1 week ago

Yeah, when that image is pulled, we call the Docker Engine API to list the images and create a mapping of image name to image id. It is like you running 'docker images' yourself (that is also what the link I pasted above points to).

For some reason, the Docker Engine API does not return the image id. We use the image id for garbage collection, which is quite literally what the error says (so hopefully that adds up).
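
To make the lookup described above concrete, here is a minimal sketch in Python. It is not the iotedge runtime's actual code; it only illustrates the idea of mapping image names to image ids from an image list. The function names (build_image_map, find_image_id) and the sample data are hypothetical. The entries are assumed to follow the shape of the Docker Engine API ImageList response, where each image has an "Id" and a "RepoTags" list that can be null or empty for untagged images.

```python
from typing import Dict, List, Optional


def build_image_map(images: List[dict]) -> Dict[str, str]:
    """Map each 'repository:tag' name to its image id."""
    mapping: Dict[str, str] = {}
    for image in images:
        # RepoTags may be missing, null, or empty (for example, untagged images),
        # in which case the image contributes no name to the mapping.
        for name in image.get("RepoTags") or []:
            mapping[name] = image["Id"]
    return mapping


def find_image_id(images: List[dict], name: str) -> Optional[str]:
    """Return the id for an image name, or None if the list has no such name."""
    return build_image_map(images).get(name)


# Hypothetical sample data shaped like an ImageList response (ids are made up).
SAMPLE_IMAGES = [
    {"Id": "sha256:aaa111", "RepoTags": ["mcr.microsoft.com/azureiotedge-agent:1.4"]},
    {"Id": "sha256:bbb222", "RepoTags": None},
]

if __name__ == "__main__":
    wanted = "mcr.microsoft.com/azureiotedge-metrics-collector:1.0"
    if find_image_id(SAMPLE_IMAGES, wanted) is None:
        # Corresponds to the situation behind the warning: no id found for the name.
        print(f"Could not retrieve image id for {wanted}")
```

In this sketch, a name that does not appear in any "RepoTags" entry simply yields no id, so comparing the name in the deployment manifest against what 'docker images' reports on the device is one way to see whether the two actually match.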

jeremielalanne commented 1 week ago

Cool, that's what I understood of the process, which is good news. But do you have any idea why that happens, or what I could do to find the source of the issue?