cloudflare / serverless-registry

A Docker registry backed by Workers and R2.
Apache License 2.0

Missing push error message. #20

Closed by keyskull 6 months ago

keyskull commented 6 months ago
5f70bf18a086: Pushed
30170ad1cf54: Pushed
b57dae4cd99e: Pushing [==================================================>]  34.36MB
c494be6f7d61: Pushed
cdd7c7392317: Pushing [=================================================> ]  77.21MB/77.81MB
error parsing HTTP 416 response body: no error details found in HTTP response body: "{}"

Sometimes the worker would unexpectedly return a 416 error when pushing images.

gabivlj commented 6 months ago

Is this a production worker, or is it being run in wrangler? Does this always happen with the same image (or images with similar characteristics), or have you observed it with others?

keyskull commented 6 months ago

This is in a production worker environment, and it happens unpredictably with different images. Every time it occurs, one or more incomplete Ongoing Multipart Upload objects are created in R2.
Re-running the push command does not resume those uploads; instead, it uploads a new object to finish uploading the image layers.

gabivlj commented 6 months ago

Thanks! I will take a look and see if I can repro locally

gabivlj commented 6 months ago

Maybe we made some wrong assumption about docker push here. Does this help as an env variable in your worker?

PUSH_COMPATIBILITY_MODE=full

What is printed in your client with:

docker version

gabivlj commented 6 months ago

You can also check the state of the upload in the UPLOADS KV. You can search for the ID in the R2 bucket. Any ongoing upload in the R2 bucket is one that failed; if we take its ID and look it up in the KV, we can retrieve the upload state.

keyskull commented 6 months ago

This is my docker version:

Client:
 Cloud integration: v1.0.35-desktop+001
 Version:           24.0.5
 API version:       1.43
 Go version:        go1.20.6
 Git commit:        ced0996
 Built:             Fri Jul 21 20:36:24 2023
 OS/Arch:           windows/amd64
 Context:           default

Server: Docker Desktop 4.22.1 (118664)
 Engine:
  Version:          24.0.5
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.6
  Git commit:       a61e2b4
  Built:            Fri Jul 21 20:35:45 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

I will try the PUSH_COMPATIBILITY_MODE env variable later.

keyskull commented 6 months ago

I can see the record in KV, but the retrieval process doesn't seem to have gone well.

gabivlj commented 6 months ago

The value of the record is a JWT. If you decode it in a tool like jwt.io, what's the state that appears?
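
For anyone who prefers not to paste the token into a website, the payload can also be decoded locally. The following is a minimal sketch, assuming the KV value is a standard three-part JWT; it only reads the claims and does not verify the signature. The helper names and the example token are hypothetical and built just for this demonstration.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    # base64url encoding drops '=' padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    raw = json.dumps(data, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical token for illustration only; the signature part is a dummy.
example_claims = {
    "registryUploadId": "example-id",
    "byteRange": 0,
    "parts": [],
    "chunks": [],
}
example_token = ".".join(
    [_b64url({"typ": "JWT", "alg": "HS256"}), _b64url(example_claims), "sig"]
)

if __name__ == "__main__":
    print(json.dumps(decode_jwt_payload(example_token), indent=2))
```

Replacing `example_token` with the value stored in KV should print the upload state as JSON.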

keyskull commented 6 months ago

In my initial case, two objects are in the Ongoing Multipart Upload state:

  1. 0969706f-013a-48fc-a405-1ee623c1e14c
    {
      "uploadId": "AGhrTbnEB0AC3xYiEdqWwMvNtcnQaNFTu1r6PXjyKune1DmqnJNF28usMno_vPjvd4yNj3JgGi0sURfXY_7Ig5_sWbxf1sqxBqdKhsQoJacX47wIhuGnJQRvT97Q4Ms6MgMPGfuqoVmSEZpfommiqa_AY81uNkDNot_NY937mDPTrhbhjvRz9l9uP1zIhENKZDVZkIK0N9Mwibg1ZEmSN8GiGUfJFkkMPYTnsuKJJOoftk7p4wJjwLDuOBye2cz8peKGs2oJD_S9Aj_KMLq5bFlWavNyZNfZ3RNYO65ph2b7mu0NjSytJrBrSNwnw8mHswztg2UZATvCnyn2dBqHdRU",
      "parts": [
        {
          "partNumber": 1,
          "etag": "739350224836ba140338b0f3eb1b1452"
        }
      ],
      "registryUploadId": "0969706f-013a-48fc-a405-1ee623c1e14c",
      "byteRange": 11112020,
      "name": "cms-api",
      "chunks": [
        {
          "type": "multi-part-chunk",
          "size": 11112020,
          "uploadId": "0969706f-013a-48fc-a405-1ee623c1e14c"
        }
      ],
      "exp": 1709500969,
      "iat": 1709499762
    }
  2. 3ee9b027-75dd-4435-9a25-38a4980312cc
    {
      "uploadId": "ABwfoAMUtcdd28pbMfuAU9VVQKqPwpklt4RWK76xbOaQQNyBSNQTSmzvukigNzn_rFUR699LgTVp8fyho9mvdQHSuVaLEvE8sJjkXHcCMHNmdOHApRSZdmx7tT-wmjsZyyCc1rTPAtEx2UV2OuCIUJ3UxcdMqu0RYynYjbj1Ge05d9qYrLJHpvwYg4doy7WAYZrjbSCQe7dV_mhjJfQPQHRWFJuNblg0IVbhmfhEUP5ei4pJmBFm6cvjzsq5UWZR3806OYbYYukxPp25MCw_zvkRasLuNh2BFpnS1NUG9h1n3EEvcp7qjIE2obXbYRcOmx0apgf0lX-NtI_h1YkTJJY",
      "parts": [],
      "registryUploadId": "3ee9b027-75dd-4435-9a25-38a4980312cc",
      "byteRange": 0,
      "name": "cms-api",
      "chunks": [],
      "exp": 1709500955,
      "iat": 1709499755
    }

gabivlj commented 6 months ago

I've pushed a commit to main that adds more context to the error, if you want to deploy a new version in the meantime.

keyskull commented 6 months ago

Thanks!

keyskull commented 6 months ago

Alright, I finally tested out the error msg!

5f70bf18a086: Pushed
3d6c675388d4: Pushed
818595c560de: Pushed
d25de9d69a33: Pushed
cdd7c7392317: Pushed
unknown: {"errors":[{"code":"RANGE_ERROR","message":"state eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1cGxvYWRJZCI6IkFQa1FJSU9jZnF0MndzTVVfa0ZKdFJxQnpYS3hYZ1lmMzhhQTVWN05fRVNPTlBNMDhzTWdYQVU2TTliQ3ZXQ3BpeTFOQUdHdHRkaVVBZ3hjUUtFZnF1blQ3Wk9rNUY5TUVNc0ZxR19YSGlGb2RYOE1SVkdVNjZWY3BzeVNsTXRrU1k5dE9rYVJlWXl2Sk82SnRKMUd4bzR6enlITnpabms4TnJDR01RQ1o4V2NrUC1obndjYWtRTHN0MFQwbFp3eVlDbW0tY0lmdWF6RFhSX1c2TXpPOHhwVEVES21WUmQtSHF3MXhyTFUzWE5OdWhiemg5ODI2VkxKSUxMUjBhV3VFSGZoVmp4V1ZGLUJ2WkRkdGp1QlF4Z3JkbE9RMVJOTFB2YmgtNjV0WWo0TXZXdlI0elRweWs5dS1hOU9ub2hEbUtCS2ttbmx2V3NTTjlKR1BmSS1IRDQiLCJwYXJ0cyI6W10sInJlZ2lzdHJ5VXBsb2FkSWQiOiI3NmU3ODkxYy0xZjA4LTRiMDktODdkOC1jNjllZWRlZmE5YzciLCJieXRlUmFuZ2UiOjAsIm5hbWUiOiJncm91cCIsImNodW5rcyI6W10sImV4cCI6MTcwOTU4NTQxMywiaWF0IjoxNzA5NTg0MjEzfQ.sNWeXBF7fTKomNhzGzcYdziLhTe7w0gCntdRmXN67vQ is not satisfiable (upload id: 76e7891c-1f08-4b09-87d8-c69eedefa9c7)","detail":{"uploadId":"APkQIIOcfqt2wsMU_kFJtRqBzXKxXgYf38aA5V7N_ESONPM08sMgXAU6M9bCvWCpiy1NAGGttdiUAgxcQKEfqunT7ZOk5F9MEMsFqG_XHiFodX8MRVGU66VcpsySlMtkSY9tOkaReYyvJO6JtJ1Gxo4zzyHNzZnk8NrCGMQCZ8WckP-hnwcakQLst0T0lZwyYCmm-cIfuazDXR_W6MzO8xpTEDKmVRd-Hqw1xrLU3XNNuhbzh9826VLJILLR0aWuEHfhVjxWVF-BvZDdtjuBQxgrdlOQ1RNLPvbh-65tYj4MvWvR4zTpyk9u-a9OnohDmKBKkmnlvWsSN9JGPfI-HD4","parts":[],"registryUploadId":"76e7891c-1f08-4b09-87d8-c69eedefa9c7","byteRange":0,"name":"group","chunks":[],"exp":1709585413,"iat":1709584213,"string":"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1cGxvYWRJZCI6IkFQa1FJSU9jZnF0MndzTVVfa0ZKdFJxQnpYS3hYZ1lmMzhhQTVWN05fRVNPTlBNMDhzTWdYQVU2TTliQ3ZXQ3BpeTFOQUdHdHRkaVVBZ3hjUUtFZnF1blQ3Wk9rNUY5TUVNc0ZxR19YSGlGb2RYOE1SVkdVNjZWY3BzeVNsTXRrU1k5dE9rYVJlWXl2Sk82SnRKMUd4bzR6enlITnpabms4TnJDR01RQ1o4V2NrUC1obndjYWtRTHN0MFQwbFp3eVlDbW0tY0lmdWF6RFhSX1c2TXpPOHhwVEVES21WUmQtSHF3MXhyTFUzWE5OdWhiemg5ODI2VkxKSUxMUjBhV3VFSGZoVmp4V1ZGLUJ2WkRkdGp1QlF4Z3JkbE9RMVJOTFB2YmgtNjV0WWo0TXZXdlI0elRweWs5dS1hOU9ub2hEbUtCS2ttbmx2V3NTTjlKR1BmSS1IRDQiLCJwYXJ0cyI6W10sInJlZ2lzdHJ5VXBsb2FkSWQiOiI3NmU3ODkxYy0xZjA4LTRiMDktODdkOC1jNjllZWRlZmE5YzciLCJieXRlUmFuZ2UiOjAsIm5hbWUiOiJncm91cCIsImNodW5rcyI6W10sImV4cCI6MTcwOTU4NTQxMywiaWF0IjoxNzA5NTg0MjEzfQ.sNWeXBF7fTKomNhzGzcYdziLhTe7w0gCntdRmXN67vQ"}}]}

{
  "uploadId": "APkQIIOcfqt2wsMU_kFJtRqBzXKxXgYf38aA5V7N_ESONPM08sMgXAU6M9bCvWCpiy1NAGGttdiUAgxcQKEfqunT7ZOk5F9MEMsFqG_XHiFodX8MRVGU66VcpsySlMtkSY9tOkaReYyvJO6JtJ1Gxo4zzyHNzZnk8NrCGMQCZ8WckP-hnwcakQLst0T0lZwyYCmm-cIfuazDXR_W6MzO8xpTEDKmVRd-Hqw1xrLU3XNNuhbzh9826VLJILLR0aWuEHfhVjxWVF-BvZDdtjuBQxgrdlOQ1RNLPvbh-65tYj4MvWvR4zTpyk9u-a9OnohDmKBKkmnlvWsSN9JGPfI-HD4",
  "parts": [],
  "registryUploadId": "76e7891c-1f08-4b09-87d8-c69eedefa9c7",
  "byteRange": 0,
  "name": "group",
  "chunks": [],
  "exp": 1709585413,
  "iat": 1709584213
}

So the code is a RANGE_ERROR.
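
For context on what a range error means during a chunked push: under the OCI distribution spec, chunks of a blob upload must arrive in order, and a registry answers 416 when a chunk's start offset does not match what it has already stored. The sketch below illustrates that check under simplified assumptions. It is not the serverless-registry implementation; `UploadState` and `check_chunk` are hypothetical names, and `byte_range` merely mirrors the `byteRange` field in the decoded state above.

```python
from dataclasses import dataclass


@dataclass
class UploadState:
    # Bytes accepted so far (mirrors "byteRange" in the decoded state).
    byte_range: int


def check_chunk(state: UploadState, content_range: str, body_len: int) -> int:
    """Return the HTTP status a simplified registry might send for a chunk PATCH.

    content_range uses the "<start>-<end>" form (inclusive end).
    """
    try:
        start_s, end_s = content_range.split("-", 1)
        start, end = int(start_s), int(end_s)
    except ValueError:
        return 400  # malformed header
    if end < start or end - start + 1 != body_len:
        return 400  # header does not describe the body
    if start != state.byte_range:
        return 416  # chunk does not continue where the stored state ends
    state.byte_range = end + 1
    return 202  # chunk accepted


if __name__ == "__main__":
    state = UploadState(byte_range=0)
    print(check_chunk(state, "0-99", 100))  # first chunk is accepted
    print(check_chunk(state, "0-99", 100))  # replayed chunk is rejected
```

In this model, a client retrying a chunk that the server already recorded (or a server whose stored state lags behind what the client sent) produces the 416.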

gabivlj commented 6 months ago

I am starting to have a good guess of what's going on then! Will provide a branch for you to try out tomorrow maybe, will see how it goes.

gabivlj commented 6 months ago

https://github.com/cloudflare/serverless-registry/compare/gv/r2-instead?expand=1

Could you try out this branch?

keyskull commented 6 months ago

Sorry for the late response. It seems the changes caused other problems to show up.

5f70bf18a086: Retrying in 15 seconds
b3636174f992: Retrying in 11 seconds
34aad3fbfe4f: Pushing [==================================================>]  30.12MB
1f32df1e1b28: Pushing [==================================================>]  30.12MB
cdd7c7392317: Retrying in 8 seconds

It keeps retrying to push the image layers to the server but fails, and then I receive: unexpected HTTP status: 500 Internal Server Error

keyskull commented 6 months ago

With worker logs: [image]

gabivlj commented 6 months ago

Hello @keyskull, I added a change to the branch. I tested both locally and in prod, and the push seems to work now. Can you try a repro?

keyskull commented 6 months ago

It looks like the issue has been resolved; I've pushed 10 times and haven't seen the error.

gabivlj commented 6 months ago

@keyskull Thanks for the bug submission! We will try to merge the fix today into main.