acorn-io / runtime

A simple application deployment framework built on Kubernetes
https://docs.acorn.io/
Apache License 2.0

IAR - Timeout trying to deploy Auto-Upgrade pattern where image has lots of matching tags #2155

Open sangee2004 opened 1 year ago

sangee2004 commented 1 year ago

acorn version - v0.8.0-13-gab9787b8+ab9787b8

Steps to reproduce the problem:

  1. Enable image allow rules - acorn install --features image-allow-rules=true
  2. Create 1 IAR (it can use an actual public key, a gh identity, or an acorn identity for the public keys)
  3. Deploy app with image having autoupgrade format - acorn run -n mytestnew docker.io/sangeetha/myfirstacorn:v#.#.#

Command never returns and on force quitting we see the following error message:

acorn run -n mytestnew docker.io/sangeetha/myfirstacorn:v#.#.# 
^C  ✗  ERROR:  Post "https://127.0.0.1:6443/apis/api.acorn.io/v1/namespaces/acorn/images/_/details": context canceled

iwilltry42 commented 1 year ago

Hi @sangee2004, can you please re-evaluate this? I can't seem to replicate what you're seeing here.

sangee2004 commented 1 year ago

@iwilltry42 This issue seems to be specific to this one image, which has a lot of tags. If I leave the command running long enough, it times out:

acorn run -n mytestnew docker.io/sangeetha/myfirstacorn:v#.#.# 
  ✗  ERROR:  Timeout: request did not complete within requested timeout - context deadline exceeded

iwilltry42 commented 1 year ago

Oh okay, now I understand the problem. At this point (https://github.com/acorn-io/runtime/blob/7eff4103aae28775565453ee0dd97f19fae8e09d/pkg/cli/run.go#L290) we end up calling ImageDetails, which tries to resolve the auto-upgrade pattern. Since the pattern probably matches a lot of tags, we evaluate the IARs many times, which adds significant processing overhead. I don't know what the timeout for all of this is, since I couldn't find a place where we set it explicitly.

The only solutions for this I can think of right now are:

  1. Increase the timeout or provide a flag for the user to set a timeout
  2. Workaround: Have the user provide a stricter tag pattern (so that it matches fewer images)
  3. Drop IAR evaluation from the App validation (which would lead to some nasty side-effects)

@ibuildthecloud I know you love this stuff just as much as me - what do you think?

iwilltry42 commented 1 year ago

As per Slack, the decision is to use the timeout error to display a nicer error message. This can be considered an "expected issue" when using auto-upgrade with lots of recent "bad tags".