Closed: keyskull closed this issue 6 months ago.
Is this a production Worker, or is it being run in wrangler? Does this always happen with the same image (or images with similar characteristics), or have you observed it happening with more?
This is in a production Worker environment, and it happens unexpectedly with different images.
Every time it shows up, one or more incomplete ongoing multipart upload objects are created on R2.
The stalled object is then not resumed by re-running the push command; instead, another object is uploaded to complete the remaining image layers.
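(For background, here is a rough sketch of the Workers R2 multipart calls involved in this kind of push. The function and key names are illustrative, not the registry's actual code; the point is that an upload that is created but never completed nor aborted stays visible as an ongoing multipart upload, which is the state described above.)

```ts
// Illustrative use of the Workers R2 multipart API (names are made up).
async function pushLayerBlob(bucket: R2Bucket, key: string, firstChunk: ArrayBuffer) {
  // Start a multipart upload for the blob.
  const upload = await bucket.createMultipartUpload(key);
  const parts: R2UploadedPart[] = [];

  // Upload one or more parts; each call returns a part number/etag pair.
  parts.push(await upload.uploadPart(1, firstChunk));

  // If the Worker fails here, the upload is neither completed nor aborted and
  // keeps showing up as an "ongoing multipart upload" on the bucket.
  await upload.complete(parts); // or, on failure paths: await upload.abort();
}
```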
Thanks! I will take a look and see if I can repro locally
Maybe we made a wrong assumption about docker push here. Does setting this env variable in your worker help?
PUSH_COMPATIBILITY_MODE=full
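(For anyone else hitting this: the variable is an ordinary Worker environment binding. A minimal sketch of how it is typically wired up; the variable name comes from the comment above, everything else here is illustrative and not the registry's actual code.)

```ts
// The value would be set on the Worker, e.g. in wrangler.toml:
//
//   [vars]
//   PUSH_COMPATIBILITY_MODE = "full"
//
// and read from the env bindings at request time (illustrative handler only).
interface Env {
  PUSH_COMPATIBILITY_MODE?: string;
}

export default {
  async fetch(_request: Request, env: Env): Promise<Response> {
    const mode = env.PUSH_COMPATIBILITY_MODE ?? "(unset)";
    return new Response(`push compatibility mode: ${mode}`);
  },
};
```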
What is printed in your client with:
docker version
You can also check the state of the upload in the UPLOADS KV namespace. Any ongoing multipart upload in the R2 bucket is one that failed; if we take that upload ID and look it up in KV, we can retrieve the upload state.
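(A rough way to script that lookup from the Worker itself, say from a temporary debug route: list the UPLOADS keys, decode each stored JWT payload, and match the R2 upload ID. The UPLOADS binding name comes from this comment; the key handling and helper names are assumptions.)

```ts
// Decode the payload segment of a JWT (base64url-encoded JSON) without verifying it.
function decodeJwtPayload(token: string): any {
  const payload = token.split(".")[1];
  return JSON.parse(atob(payload.replace(/-/g, "+").replace(/_/g, "/")));
}

// Scan the UPLOADS KV namespace for the state whose R2 uploadId matches the
// ongoing multipart upload seen in the bucket.
async function findUploadState(uploads: KVNamespace, r2UploadId: string) {
  let cursor: string | undefined;
  do {
    const page = await uploads.list({ cursor });
    for (const { name } of page.keys) {
      const token = await uploads.get(name);
      if (!token) continue;
      const state = decodeJwtPayload(token);
      if (state.uploadId === r2UploadId) return { key: name, state };
    }
    cursor = page.list_complete ? undefined : page.cursor;
  } while (cursor !== undefined);
  return null;
}
```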
> Maybe we made a wrong assumption about docker push here. Does setting this env variable in your worker help?
> PUSH_COMPATIBILITY_MODE=full
> What is printed in your client with:
> docker version
This is my docker version
Client:
Cloud integration: v1.0.35-desktop+001
Version: 24.0.5
API version: 1.43
Go version: go1.20.6
Git commit: ced0996
Built: Fri Jul 21 20:36:24 2023
OS/Arch: windows/amd64
Context: default
Server: Docker Desktop 4.22.1 (118664)
Engine:
Version: 24.0.5
API version: 1.43 (minimum version 1.12)
Go version: go1.20.6
Git commit: a61e2b4
Built: Fri Jul 21 20:35:45 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.21
GitCommit: 3dce8eb055cbb6872793272b4f20ed16117344f8
runc:
Version: 1.1.7
GitCommit: v1.1.7-0-g860f061
docker-init:
Version: 0.19.0
GitCommit: de40ad0
I will try out the PUSH_COMPATIBILITY_MODE env variable later.
> You can also check the state of the upload in the UPLOADS KV namespace. Any ongoing multipart upload in the R2 bucket is one that failed; if we take that upload ID and look it up in KV, we can retrieve the upload state.
I can see the record in KV, but the retrieval doesn't seem to have worked correctly.
The value of the record is a JWT. If you decode it with a tool like jwt.io, what state appears?
For my initial case, I have two objects with ongoing multipart uploads:
{
"uploadId": "AGhrTbnEB0AC3xYiEdqWwMvNtcnQaNFTu1r6PXjyKune1DmqnJNF28usMno_vPjvd4yNj3JgGi0sURfXY_7Ig5_sWbxf1sqxBqdKhsQoJacX47wIhuGnJQRvT97Q4Ms6MgMPGfuqoVmSEZpfommiqa_AY81uNkDNot_NY937mDPTrhbhjvRz9l9uP1zIhENKZDVZkIK0N9Mwibg1ZEmSN8GiGUfJFkkMPYTnsuKJJOoftk7p4wJjwLDuOBye2cz8peKGs2oJD_S9Aj_KMLq5bFlWavNyZNfZ3RNYO65ph2b7mu0NjSytJrBrSNwnw8mHswztg2UZATvCnyn2dBqHdRU",
"parts": [
{
"partNumber": 1,
"etag": "739350224836ba140338b0f3eb1b1452"
}
],
"registryUploadId": "0969706f-013a-48fc-a405-1ee623c1e14c",
"byteRange": 11112020,
"name": "cms-api",
"chunks": [
{
"type": "multi-part-chunk",
"size": 11112020,
"uploadId": "0969706f-013a-48fc-a405-1ee623c1e14c"
}
],
"exp": 1709500969,
"iat": 1709499762
}
{
"uploadId": "ABwfoAMUtcdd28pbMfuAU9VVQKqPwpklt4RWK76xbOaQQNyBSNQTSmzvukigNzn_rFUR699LgTVp8fyho9mvdQHSuVaLEvE8sJjkXHcCMHNmdOHApRSZdmx7tT-wmjsZyyCc1rTPAtEx2UV2OuCIUJ3UxcdMqu0RYynYjbj1Ge05d9qYrLJHpvwYg4doy7WAYZrjbSCQe7dV_mhjJfQPQHRWFJuNblg0IVbhmfhEUP5ei4pJmBFm6cvjzsq5UWZR3806OYbYYukxPp25MCw_zvkRasLuNh2BFpnS1NUG9h1n3EEvcp7qjIE2obXbYRcOmx0apgf0lX-NtI_h1YkTJJY",
"parts": [],
"registryUploadId": "3ee9b027-75dd-4435-9a25-38a4980312cc",
"byteRange": 0,
"name": "cms-api",
"chunks": [],
"exp": 1709500955,
"iat": 1709499755
}
I've pushed a commit to main that adds more context to the error, in case you want to deploy a new version in the meantime.
Thanks!
Alright, I finally captured the new error message!
5f70bf18a086: Pushed
3d6c675388d4: Pushed
818595c560de: Pushed
d25de9d69a33: Pushed
cdd7c7392317: Pushed
unknown: {"errors":[{"code":"RANGE_ERROR","message":"state eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1cGxvYWRJZCI6IkFQa1FJSU9jZnF0MndzTVVfa0ZKdFJxQnpYS3hYZ1lmMzhhQTVWN05fRVNPTlBNMDhzTWdYQVU2TTliQ3ZXQ3BpeTFOQUdHdHRkaVVBZ3hjUUtFZnF1blQ3Wk9rNUY5TUVNc0ZxR19YSGlGb2RYOE1SVkdVNjZWY3BzeVNsTXRrU1k5dE9rYVJlWXl2Sk82SnRKMUd4bzR6enlITnpabms4TnJDR01RQ1o4V2NrUC1obndjYWtRTHN0MFQwbFp3eVlDbW0tY0lmdWF6RFhSX1c2TXpPOHhwVEVES21WUmQtSHF3MXhyTFUzWE5OdWhiemg5ODI2VkxKSUxMUjBhV3VFSGZoVmp4V1ZGLUJ2WkRkdGp1QlF4Z3JkbE9RMVJOTFB2YmgtNjV0WWo0TXZXdlI0elRweWs5dS1hOU9ub2hEbUtCS2ttbmx2V3NTTjlKR1BmSS1IRDQiLCJwYXJ0cyI6W10sInJlZ2lzdHJ5VXBsb2FkSWQiOiI3NmU3ODkxYy0xZjA4LTRiMDktODdkOC1jNjllZWRlZmE5YzciLCJieXRlUmFuZ2UiOjAsIm5hbWUiOiJncm91cCIsImNodW5rcyI6W10sImV4cCI6MTcwOTU4NTQxMywiaWF0IjoxNzA5NTg0MjEzfQ.sNWeXBF7fTKomNhzGzcYdziLhTe7w0gCntdRmXN67vQ is not satisfiable (upload id: 76e7891c-1f08-4b09-87d8-c69eedefa9c7)","detail":{"uploadId":"APkQIIOcfqt2wsMU_kFJtRqBzXKxXgYf38aA5V7N_ESONPM08sMgXAU6M9bCvWCpiy1NAGGttdiUAgxcQKEfqunT7ZOk5F9MEMsFqG_XHiFodX8MRVGU66VcpsySlMtkSY9tOkaReYyvJO6JtJ1Gxo4zzyHNzZnk8NrCGMQCZ8WckP-hnwcakQLst0T0lZwyYCmm-cIfuazDXR_W6MzO8xpTEDKmVRd-Hqw1xrLU3XNNuhbzh9826VLJILLR0aWuEHfhVjxWVF-BvZDdtjuBQxgrdlOQ1RNLPvbh-65tYj4MvWvR4zTpyk9u-a9OnohDmKBKkmnlvWsSN9JGPfI-HD4","parts":[],"registryUploadId":"76e7891c-1f08-4b09-87d8-c69eedefa9c7","byteRange":0,"name":"group","chunks":[],"exp":1709585413,"iat":1709584213,"string":"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1cGxvYWRJZCI6IkFQa1FJSU9jZnF0MndzTVVfa0ZKdFJxQnpYS3hYZ1lmMzhhQTVWN05fRVNPTlBNMDhzTWdYQVU2TTliQ3ZXQ3BpeTFOQUdHdHRkaVVBZ3hjUUtFZnF1blQ3Wk9rNUY5TUVNc0ZxR19YSGlGb2RYOE1SVkdVNjZWY3BzeVNsTXRrU1k5dE9rYVJlWXl2Sk82SnRKMUd4bzR6enlITnpabms4TnJDR01RQ1o4V2NrUC1obndjYWtRTHN0MFQwbFp3eVlDbW0tY0lmdWF6RFhSX1c2TXpPOHhwVEVES21WUmQtSHF3MXhyTFUzWE5OdWhiemg5ODI2VkxKSUxMUjBhV3VFSGZoVmp4V1ZGLUJ2WkRkdGp1QlF4Z3JkbE9RMVJOTFB2YmgtNjV0WWo0TXZXdlI0elRweWs5dS1hOU9ub2hEbUtCS2ttbmx2V3NTTjlKR1BmSS1IRDQiLCJwYXJ0cyI6W10sInJlZ2lzdHJ5VXBsb2FkSWQiOiI3NmU3ODkxYy0xZjA4LTRiMDktODdkOC1jNjllZWRlZmE5YzciLCJieXRlUmFuZ2UiOjAsIm5hbWUiOiJncm91cCIsImNodW5rcyI6W10sImV4cCI6MTcwOTU4NTQxMywiaWF0IjoxNzA5NTg0MjEzfQ.sNWeXBF7fTKomNhzGzcYdziLhTe7w0gCntdRmXN67vQ"}}]}
{
"uploadId": "APkQIIOcfqt2wsMU_kFJtRqBzXKxXgYf38aA5V7N_ESONPM08sMgXAU6M9bCvWCpiy1NAGGttdiUAgxcQKEfqunT7ZOk5F9MEMsFqG_XHiFodX8MRVGU66VcpsySlMtkSY9tOkaReYyvJO6JtJ1Gxo4zzyHNzZnk8NrCGMQCZ8WckP-hnwcakQLst0T0lZwyYCmm-cIfuazDXR_W6MzO8xpTEDKmVRd-Hqw1xrLU3XNNuhbzh9826VLJILLR0aWuEHfhVjxWVF-BvZDdtjuBQxgrdlOQ1RNLPvbh-65tYj4MvWvR4zTpyk9u-a9OnohDmKBKkmnlvWsSN9JGPfI-HD4",
"parts": [],
"registryUploadId": "76e7891c-1f08-4b09-87d8-c69eedefa9c7",
"byteRange": 0,
"name": "group",
"chunks": [],
"exp": 1709585413,
"iat": 1709584213
}
So the error code is RANGE_ERROR.
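(For context on that error code: in the OCI distribution protocol, a 416 on a chunked blob upload generally means the Content-Range of a PATCH request doesn't line up with the offset the registry has recorded for that upload. A simplified sketch of that kind of check follows; the names are made up and this is not the registry's actual code.)

```ts
// Illustrative range check for a chunked (PATCH) blob upload.
function checkChunkRange(state: { byteRange: number }, contentRange: string | null): Response | null {
  // Docker sends "Content-Range: <start>-<end>" on chunked PATCH requests.
  const match = contentRange?.match(/^(\d+)-(\d+)$/);
  const start = match ? Number(match[1]) : 0;

  // state.byteRange tracks how many bytes the registry believes it has accepted.
  // If the client resumes at a different offset, the upload cannot continue.
  if (start !== state.byteRange) {
    return new Response(
      JSON.stringify({ errors: [{ code: "RANGE_ERROR", message: "state is not satisfiable" }] }),
      { status: 416, headers: { "Content-Type": "application/json" } },
    );
  }
  return null; // range is consistent with the stored state; accept the chunk
}
```

In the decoded state above, byteRange is 0 and chunks is empty, so a mismatch of this kind would be consistent with the error, though this is only a guess at the mechanism.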
I'm starting to have a good guess of what's going on, then! I'll provide a branch for you to try out, maybe tomorrow; we'll see how it goes.
https://github.com/cloudflare/serverless-registry/compare/gv/r2-instead?expand=1
Could you try out this branch?
Sorry for the late response. It seems the changes caused other problems to show up.
5f70bf18a086: Retrying in 15 seconds
b3636174f992: Retrying in 11 seconds
34aad3fbfe4f: Pushing [==================================================>] 30.12MB
1f32df1e1b28: Pushing [==================================================>] 30.12MB
cdd7c7392317: Retrying in 8 seconds
It keeps retrying to push the image layers to the server, but fails.
Then I received: unexpected HTTP status: 500 Internal Server Error
With worker logs:
Hello @keyskull, I added a change to the branch, I tested both locally and prod and the push seems to work now. Can you try a repro?
> Hello @keyskull, I added a change to the branch, I tested both locally and prod and the push seems to work now. Can you try a repro?
It looks like the issue has been resolved; I've pushed 10 times and still haven't seen the error.
@keyskull Thanks for the bug submission! We will try to merge the fix today into main.
Sometimes the worker would unexpectedly return a 416 error when pushing images.