c0c0n3 / kitt4sme.live

On a mission to bring AI to the shop floor: https://kitt4sme.eu/
MIT License

Can't pull FAMS image #233

Closed: c0c0n3 closed this issue 1 year ago

c0c0n3 commented 1 year ago

Describe the bug

#232 deployed the latest FAMS with updated credentials for pulling the image from SUPSI's GitLab server. The deployment and the Docker pull secret are in place and work locally, but in the VTT cloud K8s gets stuck pulling the image.

To Reproduce

If you look at the K8s events in our live cluster, you should see K8s fail to pull the image:

31s         Normal    Pulling             pod/fams-5f6c9cf4f8-lcwx2    Pulling image "gitlab-core.supsi.ch:5050/dti-isteps/spslab/human-robot-interaction/human-digital-twin/fams:0.1.2-k4s"
1s          Warning   Failed              pod/fams-5f6c9cf4f8-lcwx2    Failed to pull image "gitlab-core.supsi.ch:5050/dti-isteps/spslab/human-robot-interaction/human-digital-twin/fams:0.1.2-k4s": rpc error: code = Unknown desc = failed to pull and unpack image "gitlab-core.supsi.ch:5050/dti-isteps/spslab/human-robot-interaction/human-digital-twin/fams:0.1.2-k4s": failed to resolve reference "gitlab-core.supsi.ch:5050/dti-isteps/spslab/human-robot-interaction/human-digital-twin/fams:0.1.2-k4s": failed to do request: Head "https://gitlab-core.supsi.ch:5050/v2/dti-isteps/spslab/human-robot-interaction/human-digital-twin/fams/manifests/0.1.2-k4s": dial tcp 193.5.152.67:5050: i/o timeout

It looks like K8s can't pull the image within 30 seconds and gives up. But I don't think it's a K8s problem: the event ends with the TCP connection to the registry timing out. In fact, if you SSH into the VTT box and try logging into SUPSI's GitLab Docker registry, you get a similar timeout error:

$ sudo docker login https://gitlab-core.supsi.ch:5050
Username: gitlab+deploy-fams-k4s-cloud
Password: 
Error response from daemon: Get https://gitlab-core.supsi.ch:5050/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
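
The login above times out on the request to the registry's `/v2/` endpoint, which suggests the failure happens before credentials come into play. A minimal Python sketch, under that assumption, that repeats the request without credentials and separates "reachable but wants credentials" from "unreachable". `check_registry` is a hypothetical helper name, not part of Docker or FAMS:

```python
import urllib.error
import urllib.request


def check_registry(base_url: str, timeout: float = 10.0) -> str:
    """Classify an unauthenticated GET on <base_url>/v2/ (hypothetical helper)."""
    url = base_url.rstrip("/") + "/v2/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"reachable (HTTP {resp.status})"
    except urllib.error.HTTPError as err:
        # Any HTTP status, typically 401 from a registry that wants
        # credentials, still means the server answered.
        return f"reachable (HTTP {err.code})"
    except OSError as err:
        # URLError and socket timeouts are OSError subclasses: no HTTP
        # answer at all, which points at the network rather than the login.
        return f"unreachable ({err})"
```

Calling `check_registry("https://gitlab-core.supsi.ch:5050")` on the VTT box and on a machine where the login works should show whether the two differ at the network level.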

Note that the VTT server is located in Finland. Surprisingly, if you run the same docker login from a machine located in Switzerland, it succeeds and you can pull the image without problems:

$ docker login https://gitlab-core.supsi.ch:5050
Username: gitlab+deploy-fams-k4s-cloud
Password: 
Login Succeeded

$ docker pull gitlab-core.supsi.ch:5050/dti-isteps/spslab/human-robot-interaction/human-digital-twin/fams:0.1.2-k4s
0.1.2-k4s: Pulling from dti-isteps/spslab/human-robot-interaction/human-digital-twin/fams
...
Digest: sha256:9952389d3f41f00e0472965f4f11277c7c68095d4d8b27e161e26505dd69351f
Status: Downloaded newer image for gitlab-core.supsi.ch:5050/dti-isteps/spslab/human-robot-interaction/human-digital-twin/fams:0.1.2-k4s
gitlab-core.supsi.ch:5050/dti-isteps/spslab/human-robot-interaction/human-digital-twin/fams:0.1.2-k4s

As @vcutrona noted, this could well be a geofencing issue with the SUPSI server.
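
One way to gather evidence for or against the geofencing theory is to run the same bare TCP probe from both locations and compare the results. The sketch below is only an illustration: `probe_tcp` is a hypothetical helper name, and the 10 second default timeout is an arbitrary choice.

```python
import socket
import time


def probe_tcp(host: str, port: int, timeout: float = 10.0) -> tuple[bool, float, str]:
    """Try to open a TCP connection; return (ok, seconds_elapsed, detail).

    Hypothetical helper for comparing reachability from different machines.
    """
    start = time.monotonic()
    try:
        # Only the TCP handshake is attempted: no TLS, no HTTP, no credentials.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, "connected"
    except OSError as err:
        # Covers timeouts, refused connections and DNS failures alike;
        # the detail string tells them apart.
        return False, time.monotonic() - start, f"{type(err).__name__}: {err}"
```

Running `probe_tcp("gitlab-core.supsi.ch", 5050)` on the VTT box and on the machine in Switzerland would show whether the handshake itself is what times out. A timeout from one location and a quick connection from the other is consistent with filtering by source address, though it does not prove geofencing specifically.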

Expected behavior

If you provide the right credentials, the SUPSI server should let you pull the FAMS image from any location, or at least from any location in Western Europe.

Additional context

#232

vcutrona commented 1 year ago

Hi @c0c0n3! SUPSI solved this issue and now you should be able to pull the Docker image from SUPSI's GitLab server.

c0c0n3 commented 1 year ago

hi @vcutrona :-)

> able to pull the Docker image from SUPSI's GitLab server

we can, indeed! FAMS has been running happily for a while now:

$ kubectl logs deployment/fams

...
2023-04-24 10:19:00.398 | INFO     | fams.core:run:241 - Timestamp --> 2023-04-24T10:19:00.387275
2023-04-24 10:19:00.406 | DEBUG    | fams.core:run:288 - No entities to update
2023-04-24 10:19:05.417 | INFO     | fams.core:run:241 - Timestamp --> 2023-04-24T10:19:05.408739
2023-04-24 10:19:05.425 | DEBUG    | fams.core:run:288 - No entities to update