Azure / mec-app-solution-accelerator

Application Solution Accelerator for Multi-access Edge Compute (MEC)
MIT License

[P1] Error in INFERENCE model pod - NUC-BCN #127

Closed by CESARDELATORRE 7 months ago

CESARDELATORRE commented 7 months ago

This is happening on the BCN NUC and we need to investigate it. The pod events show:

Readiness probe failed: Get "http://10.1.0.20:3501/v1.0/healthz": dial tcp 10.1.0.20:3501: connect: connection refused
Node is not ready
Back-off restarting failed container daprd in pod inference-model-56fc47754c-qnls8_mec-accelerator(a62732ad-e6f0-43b9-ac06-77dd747974b6)
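For anyone trying to reproduce this, here is a minimal sketch that polls the same Dapr health endpoint the readiness probe is hitting, so you can see from inside the cluster whether the daprd sidecar ever comes up. The function name `dapr_sidecar_ready` and the retry defaults are assumptions made for illustration; they are not part of the accelerator. It uses only the Python standard library and treats any 2xx answer as healthy.

```python
import http.client
import time
import urllib.error
import urllib.request


def dapr_sidecar_ready(url, attempts=5, delay=2.0, timeout=2.0):
    """Poll `url` until it answers with a 2xx status.

    Hypothetical helper: returns True on the first 2xx response and
    False if every attempt fails (connection refused, timeout, 5xx...).
    """
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if 200 <= resp.status < 300:
                    return True
                print(f"attempt {attempt}/{attempts}: status {resp.status}")
        except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
            # URLError also covers HTTPError (4xx/5xx) and wraps
            # "connection refused", the failure seen in the probe event.
            print(f"attempt {attempt}/{attempts} failed: {exc}")
        if attempt < attempts:
            time.sleep(delay)
    return False
```

Calling `dapr_sidecar_ready("http://10.1.0.20:3501/v1.0/healthz")` from a pod on the same node would exercise the address from the event above. If it keeps returning False while the node is short on CPU or RAM, that would be consistent with the sidecar not starting in time rather than with a bug in the inference code.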


The k8s logs also show the following, though this might be a different issue. It is related to the YOLO model being downloaded:

/usr/local/lib/python3.8/site-packages/torch/hub.py:267: UserWarning: You are about to download and run code from an untrusted repository. In a future release, this won't be allowed. To add the repository to your trusted list, change the command to {calling_fn}(..., trust_repo=False) and a command prompt will appear asking for an explicit confirmation of trust, or load(..., trust_repo=True), which will assume that the prompt is to be answered with 'yes'. You can also use load(..., trust_repo='check') which will only prompt for confirmation if the repo is not already trusted. This will eventually be the default behaviour
  warnings.warn(
Downloading: "https://github.com/ultralytics/yolov5/zipball/master" to /root/.cache/torch/hub/master.zip
Matplotlib is building the font cache; this may take a moment.
requirements: Ultralytics requirements ['ultralytics>=8.0.232', 'setuptools>=65.5.1'] not found, attempting AutoUpdate...

CESARDELATORRE commented 7 months ago

UPDATE:

I restarted the model-inference deployment in k8s, and it looks like it started fine this time.

I think this is related to a lack of resources on the node (CPU, RAM, etc.).

CESARDELATORRE commented 7 months ago

Closing this bug. I haven't seen it on machines with enough resources.