Closed SniXosha closed 4 years ago
Hi @SniXosha,
I've looked into our code to see where this might potentially fail. I can think of one hypothetical scenario. However, I'm still not convinced of my scenario, because it is unrelated to the level of concurrency; in my scenario, your registry should occasionally fail regardless of whether Jib pushes concurrently or serially. Therefore, my scenario is in conflict with your account that disabling parallel execution seemingly made it more stable. (In any case, my scenario is still not about the fault of Jib but the server not conforming to the HTTP standard.) (UPDATE: turned out to be a bug in Nexus.)
If that is not the case, I am highly suspicious of the server malfunctioning. In the past, we often did see confirmed cases of server issues in some popular registries when there is a very high level of concurrency (e.g., https://github.com/GoogleContainerTools/jib/issues/1986#issuecomment-536905823 and https://github.com/GoogleContainerTools/jib/issues/2013). From our track record, there's a high chance that this is a server issue. (BTW, Docker CLI by default doesn't allow the same level of high concurrency that Jib outputs.)
Anyways, it will become obvious once I can get low-level HTTP request/response logs. Unfortunately, Jib Gradle has an issue generating HTTP logs, so I need your help to build a Jib Gradle plugin from a patched source. Don't be scared, it's very easy to build Jib from source: (UPDATE: patching no longer needed with newer Jib versions.)
$ git clone https://github.com/GoogleContainerTools/jib.git
$ cd jib
$ ./gradlew :jib-gradle-plugin:install
This will build Jib 2.1.1-SNAPSHOT and install it into your local Maven repo (~/.m2/repository). Then configure your build script to use this SNAPSHOT as explained in https://github.com/GoogleContainerTools/jib/issues/2270#issuecomment-584177904.
Then follow these instructions to capture detailed HTTP logs, but do not pass -Djib.serialize=true. Using -Djib.serialize=true disables parallel pushes, so we don't want that. If everything is set correctly, you'll see logs like
Mar 31, 2020 9:55:52 AM com.google.api.client.http.HttpResponse <init>
CONFIG: -------------- RESPONSE --------------
HTTP/1.1 202 Accepted
Content-Length: 0
Docker-Distribution-Api-Version: registry/2.0
Docker-Upload-Uuid: 6292f0d7-93cb-4a8e-8336-78a1bf7febd2
Location: https://registry-1.docker.io/v2/francium25/test/blobs/uploads/6292f0d7-93cb-4a8e-8336-78a1bf7febd2?_state=6lvUYgy-Xw0N3L5SVgciJGhhUO928fGfHS35zpGIiJx7Ik5hbWUiOiJmcmFuY2l1bTI1L3Rlc3QiLCJVVUlEIjoiNjI5MmYwZDctOTNjYi00YThlLTgzMzYtNzhhMWJmN2ZlYmQyIiwiT2Zmc2V0Ijo2NTcyOTMsIlN0YXJ0ZWRBdCI6IjIwMjAtMDMtMzFUMTM6NTU6NTBaIn0%3D
Range: 0-657292
Date: Tue, 31 Mar 2020 13:55:52 GMT
Strict-Transport-Security: max-age=31536000
Mar 31, 2020 9:55:52 AM com.google.api.client.http.HttpRequest execute
CONFIG: -------------- REQUEST --------------
PUT https://registry-1.docker.io/v2/francium25/test/blobs/uploads/6292f0d7-93cb-4a8e-8336-78a1bf7febd2?_state=6lvUYgy-Xw0N3L5SVgciJGhhUO928fGfHS35zpGIiJx7Ik5hbWUiOiJmcmFuY2l1bTI1L3Rlc3QiLCJVVUlEIjoiNjI5MmYwZDctOTNjYi00YThlLTgzMzYtNzhhMWJmN2ZlYmQyIiwiT2Zmc2V0Ijo2NTcyOTMsIlN0YXJ0ZWRBdCI6IjIwMjAtMDMtMzFUMTM6NTU6NTBaIn0%3D&digest=sha256:24f0c933cbef83faee52f82c7f889c727b1ece5123b92d036c52fa865480f037
Accept:
Accept-Encoding: gzip
Authorization: <Not Logged>
User-Agent: jib 2.1.1-SNAPSHOT jib-maven-plugin Google-HTTP-Java-Client/1.34.0 (gzip)
Please let me know once you get the network logs for the error.
Then configure your build script to use this SNAPSHOT as explained in https://github.com/GoogleContainerTools/jib/issues/2270#issuecomment-584177904.
Oh, depending on how you set up your multi-module project, you may need to adjust the buildscript accordingly.
For example, if you are not applying Jib globally on the root project (that is, you have apply false as plugins { id '...jib' apply false } in the root build.gradle) but instead apply Jib individually in each sub-module, then to apply a SNAPSHOT version you would
declare the buildscript block in the root build.gradle,
not call apply plugin in the root build.gradle, and
call apply plugin only in each sub-module (in place of plugins { id '...jib' }).
Note to myself:
About my scenario (https://github.com/GoogleContainerTools/jib/issues/2372#issuecomment-606671996): PATCH redirect URL is encoded/decoded
Location: header in the response).
Location: header and computes the encoded/decoded URL.
Location character by character.
Another hypothesis: the server imposes a timeout between PATCH and PUT
I've managed to get network logs, but the file is too big (~700 KB) to paste here. Do you want to see a specific section of the logs, or should I share the whole file? If the latter, what is the preferred way to share a file?
GitHub allows uploading a file by drag-and-dropping into a comment input box. You can do so if the file doesn't have any sensitive info.
Sorry, could you enable debug logging (--debug) to include timing information? Looking at the log, it does look like a server fault. But I'm trying to rule out the hypothesis that the server imposes some internal timeout for completing a transaction.
Oh, sorry, forget it. I already have timestamps. Let me get back to you real soon.
This proves that your registry is malfunctioning.
-------------- REQUEST --------------
PATCH https://my.reigstry/v2/repo/fork-patcher/blobs/uploads/e6bd15cf-0c3e-40a2-b128-08a568023f6c
...
Content-Type: application/octet-stream
The server returns the upload UUID e6bd15cf-0c3e-40a2-b128-08a568023f6c for this layer upload, as expected. It also continues to return the same upload URL of /v2/repo/fork-patcher/blobs/uploads/e6bd15cf-0c3e-40a2-b128-08a568023f6c via the Location: header.
-------------- RESPONSE --------------
HTTP/1.1 202 Accepted
...
Date: Tue, 31 Mar 2020 19:06:48 GMT
...
Docker-Upload-UUID: e6bd15cf-0c3e-40a2-b128-08a568023f6c
Location: /v2/repo/fork-patcher/blobs/uploads/e6bd15cf-0c3e-40a2-b128-08a568023f6c
-------------- REQUEST --------------
PUT https://my.reigstry/v2/repo/fork-patcher/blobs/uploads/e6bd15cf-0c3e-40a2-b128-08a568023f6c?digest=sha256:b0d03fa3137b59f53b5553ffbed0aca17b8427e6f153b99ecc09725c88ab3f03
...
-------------- RESPONSE --------------
HTTP/1.1 404 Not Found
...
Date: Tue, 31 Mar 2020 19:06:49 GMT
...
Total: 161 bytes
{"errors":[{"code":"BLOB_UPLOAD_UNKNOWN","message":"blob upload unknown to registry","detail":"Missing upload with uuid: e6bd15cf-0c3e-40a2-b128-08a568023f6c"}]}
You should contact the registry people and present this evidence that proves that the server is not working correctly.
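For anyone who wants to check their own HTTP logs for the same pattern (the server acknowledges an upload UUID, then answers the follow-up request with BLOB_UPLOAD_UNKNOWN for that same UUID), here is a rough Python sketch. It is not part of Jib. The function name find_lost_uploads is made up for illustration, and it assumes the log contains Docker-Upload-UUID response headers and the error JSON on single lines, as in the excerpt above.

```python
import re

# UUID as it appears in the Docker-Upload-UUID header and in the error detail.
_UUID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
# Header casing differs between registries, so match case-insensitively.
_ACK = re.compile(rf"Docker-Upload-UUID:\s*({_UUID})", re.IGNORECASE)
_LOST = re.compile(rf"BLOB_UPLOAD_UNKNOWN.*?uuid:\s*({_UUID})", re.IGNORECASE)


def find_lost_uploads(log_text):
    """Return UUIDs the server first acknowledged and later reported as unknown.

    Hypothetical helper: it only looks at line order in the log, not at
    timestamps or at which request each response belongs to.
    """
    acknowledged = set()
    lost = []
    for line in log_text.splitlines():
        ack = _ACK.search(line)
        if ack:
            acknowledged.add(ack.group(1).lower())
            continue
        err = _LOST.search(line)
        if err:
            uuid = err.group(1).lower()
            if uuid in acknowledged and uuid not in lost:
                lost.append(uuid)
    return lost
```

Calling find_lost_uploads(open("jib-http.log").read()) (the file name is only an example) lists the uploads worth showing to the registry maintainers. An empty list does not prove the registry is fine; it only means this particular pattern was not found.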
Thanks for clarification! I'll try to contact them.
If you need assistance, let me know. And please update here once you have more information. That said, what's your registry (Sonatype Nexus, Quay, Harbor, Docker Distribution, etc.)?
It's Sonatype Nexus
Other people did encounter this problem (also with Docker CLI), and there's an open issue on Sonatype Nexus.
[NEXUS-20640] docker push may fail with blob upload unknown due to race condition
As such, I'll close the issue, but feel free to update or re-open as necessary.
Just got the notification that the Sonatype Nexus bug (NEXUS-20640) is marked fixed.
Adding that this seems to affect ghcr.io (GitHub Container Registry) as well. Adding -Djib.serialize=true to ./gradlew jib allowed it to succeed.
Hi guys, does anybody still have this issue?
We are using Nexus OSS 3.38.1-01 (which is not that old, and should contain the fix), but we still encounter this issue.
I also tried -Djib.serialize=true, which seemed to help in the beginning, but now the push fails with -Djib.serialize=true as well.
we use:
Apache Maven 3.8.4 (9b656c72d54e5bacbed989b64718c159fe39b537)
Java version: 21.0.1, vendor: Amazon.com Inc., runtime: /usr/lib/jvm/java-21-amazon-corretto
OS name: "linux", version: "4.14.304-226.531.amzn2.x86_64", arch: "amd64", family: "unix"
<plugin>
<groupId>com.google.cloud.tools</groupId>
<artifactId>jib-maven-plugin</artifactId>
<version>3.3.2</version>
</plugin>
For us, -Djib.serialize=true is also not working. We are using Jib 3.4.1.
I can also confirm the same issue when uploading packages to GitHub, and in my experience using -Djib.serialize=true didn't fix it.
What I have noticed is that all the Jib tasks that fail seem to fail after retrying an upload:
PUT https://ghcr.io/v2/lorittabot/loritta-morenitta/blobs/upload/82fcebf5-3069-4240-8d7a-98b16d5aadea?digest=sha256:bb08c4be5c13447163f30047ad3bd7d058eae734846b5a68d83133f4ad720448 failed and will be retried
Just something worth noting that I found out: I think this is a ghcr.io issue. In my project I have multiple submodules that are uploaded to ghcr.io via Jib, and using -Djib.serialize=true doesn't seem to fix the issue because the Jib tasks are still executed in parallel. What I did for now is to not parallelize the Jib tasks, by invoking gradlew separately for each Jib task. That seems to fix the issue, or at the very least, I haven't experienced any failed tasks yet.
Update to my previous comment: That didn't fix the issue... it does help a lot, but it didn't fully fix the issue.
What I'm currently testing is running the jib tasks in separate steps AND using -Djib.serialize=true, currently haven't had any issues... yet. Whoops I actually experienced a build fail right after I posted this comment smh. One thing worth noting is that I had other GitHub Actions builds running at the same time, maybe that could be why...?
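For reference, the workaround described in the comments above (one gradlew invocation per Jib task, each with -Djib.serialize=true) could be scripted roughly like this in Python. This is only a sketch. The module names are placeholders, the helper names jib_commands and push_sequentially are made up, and it assumes each sub-module exposes the standard jib task as :module:jib. As reported above, this reduced failures but did not eliminate them.

```python
import subprocess


def jib_commands(modules, serialize=True, gradlew="./gradlew"):
    """Build one gradlew command line per module (hypothetical helper)."""
    commands = []
    for module in modules:
        cmd = [gradlew, f":{module}:jib"]
        if serialize:
            # Disables parallel pushes inside a single Jib execution.
            cmd.append("-Djib.serialize=true")
        commands.append(cmd)
    return commands


def push_sequentially(modules, run=subprocess.run):
    """Run the Jib tasks one after another, stopping at the first failure."""
    for cmd in jib_commands(modules):
        # check=True raises CalledProcessError if gradlew exits non-zero.
        run(cmd, check=True)
```

For example, push_sequentially(["api", "worker"]) would run ./gradlew :api:jib -Djib.serialize=true and then ./gradlew :worker:jib -Djib.serialize=true, so no two Jib tasks from this build push at the same time. Other builds pushing to the same registry concurrently are not affected by this.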
I raised an issue with the GitHub support team about the problem we experienced with ghcr.io and got a quite thorough response, including an investigation by their engineering team. I'll paraphrase:
also from my support ticket: "one theory being that Jib may be retrying an upload before the previous upload has actually failed. This could also explain the inconsistency of failure you are seeing, as different uploads would take different amounts of time, and thus some would fit within the retry window (thus not failing), while a few might exceed the retry window and thus fail."
Since this issue is already closed we probably need to open a new one if we want anything further to happen.
Environment:
Description of the issue: Jib fails with a BLOB_UPLOAD_UNKNOWN error. I have a Gradle project with about 15 submodules and I want to push images to our private registry. When I run Jib for all submodules in my Gradle project, this error sometimes occurs. Jib doesn't fail immediately; it's often able to create and push images for about 12-13 submodules and only then does it crash. If I rerun Jib after waiting a few seconds, all the remaining submodules are processed successfully (currently, this is my workaround).
Expected behavior: Jib runs successfully regardless of number of modules/execution time without BLOB_UPLOAD_UNKNOWN error.
Steps to reproduce:
jib-gradle-plugin
Configuration:
Log output:
Additional Information:
I asked people responsible for our registry and they said that this is probably jib's problem, because they have not encountered this error using docker cli. They also suggested to disable parallel execution in gradle, which seemingly made jib more stable, but didn't solve the problem completely.