Azure / azure-cli

Azure Command-Line Interface
MIT License

Az acr helm push - You have tried to upload a chart that already exists #15110

Open bsuchorowski opened 4 years ago

bsuchorowski commented 4 years ago

Describe the bug: A few times a day, when we push a Helm chart to the container registry with: az acr helm push --name ${{parameters.containerRegistry}} $(Build.ArtifactStagingDirectory)/${{parameters.imageRepository}}-$(Build.BuildNumber).tgz --force

we receive Error: You have tried to upload a chart that already exists. Correlation ID: 65237304-9a53-4ff4-93a3-0f59441f35ef. even though the chart cannot already exist. We checked with: az acr helm list -n ${{parameters.containerRegistry}} --query "ebox[?version=='$(Build.BuildNumber)']" and it returns 0 rows.
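The existence check above can also be scripted. The following is a minimal sketch, assuming the JSON printed by az acr helm list maps each chart name to a list of entries that carry a "version" field (the shape implied by the --query expression). The function name chart_version_exists and the sample data are hypothetical.

```python
import json


def chart_version_exists(helm_list_json: str, chart: str, version: str) -> bool:
    """Return True if `chart` has an entry with exactly this version.

    Assumes the input is the JSON text of `az acr helm list`, shaped as
    {chart_name: [{"version": ...}, ...]} (an assumption, see the lead-in).
    """
    index = json.loads(helm_list_json) or {}
    return any(entry.get("version") == version for entry in index.get(chart, []))


# Hypothetical sample output containing one older version of the chart.
sample = '{"ebox": [{"name": "ebox", "version": "1.0.41"}]}'
print(chart_version_exists(sample, "ebox", "1.0.42"))
```

A pipeline could run this after a failed push to tell a genuinely missing chart apart from one that was stored despite the error.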

The issue comes and goes: when we retry a build it usually passes without issues. I assume the root cause of this error lies somewhere else (connectivity issues, parallel pushes of other charts to the same repository?). Keep in mind that we use --force, which should let us push the chart even if it did already exist.
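Since a retry usually succeeds, the workaround can be automated in the build. This is a minimal sketch, not a fix for the underlying problem: the function name push_with_retry, the attempt count, and the delay are all assumptions, and only the az acr helm push command line comes from this report.

```python
import subprocess
import time


def push_with_retry(registry, chart_path, attempts=3, delay_s=10.0,
                    run=subprocess.run, sleep=time.sleep):
    """Run `az acr helm push ... --force`, retrying on a non-zero exit code.

    Returns the number of the attempt that succeeded. `run` and `sleep`
    are injectable so the logic can be exercised without the Azure CLI.
    """
    cmd = ["az", "acr", "helm", "push", "--name", registry, chart_path, "--force"]
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            sleep(delay_s)  # fixed pause between attempts (arbitrary choice)
    raise RuntimeError(
        f"push failed after {attempts} attempts: {result.stderr.strip()}"
    )
```

It would be called as push_with_retry("myregistry", "ebox-1.0.42.tgz"), where both arguments are placeholders.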

To Reproduce Run many times: az acr helm push --name ${{parameters.containerRegistry}} $(Build.ArtifactStagingDirectory)/${{parameters.imageRepository}}-$(Build.BuildNumber).tgz --force

Expected behavior Helm chart pushed without issues

Environment summary

azure-cli 2.7.0 command-modules-nspkg 2.0.3 core 2.7.0 nspkg 3.0.4 telemetry 1.0.4 *

Extensions: azure-devops 0.18.0 azure-cli-iot-ext 0.8.9

ghost commented 4 years ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @toddysm, @yugangw-MSFT.

yonzhan commented 4 years ago

acr

yugangw-msft commented 4 years ago

@shizhMSFT any known reliability issues here? This is on helm2.

yugangw-msft commented 4 years ago

@bsuchorowski, thanks for reporting this. We confirmed that there was increased latency caused by the Azure storage service today, which was mitigated a few hours ago. Please reactivate the issue if you still experience it.

bsuchorowski commented 4 years ago

  1. Yesterday it happened very often indeed, but it has been happening from time to time for weeks.
  2. Shouldn't latency issues (a timeout exception) be reported some other way than "chart already exists"? Especially when I am using the --force flag.

bsuchorowski commented 4 years ago

@yugangw-msft it is still happening. Today we had the same issues.

brunooliveiramac commented 4 years ago

I'm facing this issue as well, even with the --force flag: Error: You have tried to upload a chart that already exists. Correlation ID: b11fbaaa-0033-4756-9521-8fca29c24f9a.

yugangw-msft commented 4 years ago

I am reactivating this issue and will update once the latency issues are addressed. The flakiness of helm push appears to be a different issue. CC @shizhMSFT

mateustanaka commented 3 years ago

@yugangw-msft We have been facing the same instability for a few days when running az acr helm push, even with --force and retries on failure. I know that, as of Helm 3, the az acr helm commands for use with the Helm 2 client are being deprecated, but that should not impact current projects that use Helm 2. CC @brunooliveiramac

jikuma commented 3 years ago

Here is how the force option works. First, 'helm push' sends the uploadChartPackage request as a PATCH call. If the chart already exists, that is not a problem and this issue will not occur: the chart is pushed and we are done.

If the chart does not exist, the PATCH returns a 404 error, which is expected. The CLI then makes a PUT call; a PUT can only be made for charts that do not exist. The CLI assumes that, since the preceding PATCH returned 404, the chart does not exist.

Because of the latency issue, this PUT call is taking more than 5 minutes to return. The CLI's timeout is set to 5 minutes, so it retries the operation once the first call times out. The second PUT call finds that the chart already exists, hence the error "Chart already exists".
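The sequence described above can be reproduced with a small simulation. This is only a sketch under stated assumptions: FakeRegistry and force_push are hypothetical stand-ins rather than the ACR service or the azure-cli code, the 201 and 409 status codes are illustrative, and the model assumes the first PUT is stored server-side even though its response arrives after the client timeout.

```python
class FakeRegistry:
    """In-memory stand-in for the chart store (hypothetical, not the ACR API)."""

    def __init__(self, slow_puts=0):
        self.charts = set()
        # Number of PUT calls whose response arrives after the client timeout.
        self.slow_puts = slow_puts

    def patch(self, chart):
        # Overwriting only works for a chart that is already stored.
        return 200 if chart in self.charts else 404

    def put(self, chart):
        if chart in self.charts:
            return 409  # illustrative code for "chart already exists"
        self.charts.add(chart)  # the upload is stored server-side...
        if self.slow_puts > 0:
            self.slow_puts -= 1
            # ...but the client stops waiting before the response arrives.
            raise TimeoutError("read timeout=300")
        return 201


def force_push(registry, chart, retries=1):
    """PATCH first; on 404 fall back to PUT, retrying the PUT on a timeout."""
    if registry.patch(chart) == 200:
        return "overwritten"
    for _ in range(retries + 1):
        try:
            status = registry.put(chart)
        except TimeoutError:
            continue  # retry the PUT, although the chart may now be stored
        return "created" if status == 201 else "error: chart already exists"
    return "error: timed out"


print(force_push(FakeRegistry(), "ebox-1.0.42"))             # created
print(force_push(FakeRegistry(slow_puts=1), "ebox-1.0.42"))  # error: chart already exists
```

In this model the error appears only when a PUT times out after the chart was stored, which matches the observation that the chart did not exist before the push.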

One way to avoid this is to use the force option only for charts that you intentionally want to overwrite. You can also reduce the time it takes to push a chart by keeping your Helm repository small: consider deleting old, unused charts.

We also recently had a latency issue in the West Europe region, which can contribute to this error.

mateustanaka commented 3 years ago

@jikuma Thanks for your explanation. Do you know whether this latency issue in West Europe has been resolved? Today we started seeing timeout errors when pushing Helm charts to our ACR, and pushes only succeeded after some retries.

HTTPSConnectionPool(host='xxxxxx.azurecr.io', port=443): Read timed out. (read timeout=300)

christle commented 3 years ago

Hi, we have had the problem again for a few days in West Europe. Is it a known issue?

JackSinclairT commented 3 years ago

We've had the same issue as @christle. What's going on?

yugangw-msft commented 3 years ago

This should be mitigated now. We had a performance degradation starting at about 9 am UTC. Please update here or submit a support ticket if you haven't seen improvements.

christle commented 3 years ago

We are facing this problem again and again, and have been for weeks. Is there any kind of long-term solution for this?

mlkiefer commented 3 years ago

Deleting charts did not help us. We are still seeing the problem.

christle commented 3 years ago

Yesterday we reduced our total number of Helm charts from 15000 to 2000. In the evening everything went back to normal and we were able to push to the registry. But today we are facing the same problem, so decreasing the total count has had no lasting effect. Our repository is in West Europe.

thegreatdane6 commented 3 years ago

I am also having this issue. Cleaning up the charts did not help :(

JackSinclairT commented 3 years ago

We are facing this issue as well

msilb commented 11 months ago

We are facing the same issue. Is there any update?