There's some question about whether or not this is a bug, but it's something for us to keep an eye on going forward. The following is from the cloned bug (https://bugzilla.redhat.com/show_bug.cgi?id=1695236), which is world-readable. Most of the comments in the original bug 1774279 (https://bugzilla.redhat.com/show_bug.cgi?id=1774279) were "Red Hat Internal", so I've not copied them here to this public board.
Description of problem:
Podman does not request the correct image size when a container is pulled from Quay Enterprise configured with a Ceph storage backend and storage proxying enabled. Due to this, the pull fails each time. If proxy storage is disabled, podman correctly pulls the image.
Version-Release number of selected component (if applicable):
Quay Enterprise 2.9.5
Podman version 1.1.2
How reproducible:
always
Steps to Reproduce:
1. Set up Quay Enterprise 2.9.5 with FEATURE_PROXY_STORAGE: true in config.yaml
2. Set up a Ceph storage backend in Quay
3. Try to pull any image from Quay Enterprise
Actual results:
Podman fails. This is the debug log from the pull:
podman pull --creds "username:password" docker://registry.local:80/username/alpine:latest --log-level debug
DEBU[0000] Initializing boltdb state at /var/lib/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver overlay
DEBU[0000] Using graph root /var/lib/containers/storage
DEBU[0000] Using run root /var/run/containers/storage
DEBU[0000] Using static dir /var/lib/containers/storage/libpod
DEBU[0000] Using tmp dir /var/run/libpod
DEBU[0000] Using volume path /var/lib/containers/storage/volumes
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "overlay"
DEBU[0000] overlay test mount with multiple lowers succeeded
DEBU[0000] overlay test mount indicated that metacopy is not being used
DEBU[0000] backingFs=extfs, projectQuotaSupported=false, useNativeDiff=true, usingMetacopy=false
INFO[0000] Found CNI network podman (type=bridge) at /etc/cni/net.d/87-podman-bridge.conflist
DEBU[0000] parsed reference into "[overlay@/var/lib/containers/storage+/var/run/containers/storage:overlay.mountopt=nodev]registry.local:80/username/alpine:latest"
Trying to pull docker://registry.local:80/username/alpine:latest...
DEBU[0000] Using registries.d directory /etc/containers/registries.d for sigstore configuration
DEBU[0000] Using "default-docker" configuration
DEBU[0000] No signature storage configuration found for registry.local:80/username/alpine:latest
DEBU[0000] Looking for TLS certificates and private keys in /etc/docker/certs.d/registry.local:80
DEBU[0000] Using blob info cache at /var/lib/containers/cache/blob-info-cache-v1.boltdb
DEBU[0000] GET https://registry.local:80/v2/
DEBU[0000] Ping https://registry.local:80/v2/ err Get https://registry.local:80/v2/: http: server gave HTTP response to HTTPS client (&url.Error{Op:"Get", URL:"https://registry.local:80/v2/", Err:(*errors.errorString)(0xc000506020)})
DEBU[0000] GET http://registry.local:80/v2/
DEBU[0000] Ping http://registry.local:80/v2/ status 401
DEBU[0000] GET http://registry.local/v2/auth?account=username&scope=repository%3Ausername%2Falpine%3Apull&service=registry.local
DEBU[0000] Increasing token expiration to: 60 seconds
DEBU[0000] GET http://registry.local:80/v2/username/alpine/manifests/latest
DEBU[0002] IsRunningImageAllowed for image docker:registry.local:80/username/alpine:latest
DEBU[0002] Using default policy section
DEBU[0002] Requirement 0: allowed
DEBU[0002] Overall: allowed
Getting image source signatures
DEBU[0002] Manifest has MIME type application/vnd.docker.distribution.manifest.v1+prettyjws, ordered candidate list [application/vnd.docker.distribution.manifest.v1+prettyjws, application/vnd.docker.distribution.manifest.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.docker.distribution.manifest.v1+json]
DEBU[0002] ... will first try using the original manifest unmodified
DEBU[0002] Downloading /v2/username/alpine/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
DEBU[0002] GET http://registry.local:80/v2/username/alpine/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
DEBU[0002] Downloading /v2/username/alpine/blobs/sha256:8e402f1a9c577ded051c1ef10e9fe4492890459522089959988a4852dee8ab2c
DEBU[0002] GET http://registry.local:80/v2/username/alpine/blobs/sha256:8e402f1a9c577ded051c1ef10e9fe4492890459522089959988a4852dee8ab2c
Copying blob 8e402f1a9c57 [======================================] 1.4GiB / 1.4GiB
Copying blob a3ed95caeb02 [======================================] 1.4GiB / 1.4GiB
DEBU[0003] Error pulling image ref //registry.local:80/username/alpine:latest: Error reading blob sha256:8e402f1a9c577ded051c1ef10e9fe4492890459522089959988a4852dee8ab2c: Invalid status code returned when fetching blob 400 (Bad Request)
Failed
(0x1840f00,0xc000246c60)
Expected results:
Podman successfully pulls the image from storage.
Additional info:
The reason Ceph returns the 400 is because the layer size is not correct. For some reason, podman assumes a size of 1.4 GB, which is incorrect, since the whole alpine image is roughly 5 MB. Additionally, when the direct URL to the image layer is reverse-engineered from Quay's log, curling the URL works and the image can be pulled. The pull also works with the Docker CLI, regardless of whether storage is proxied or not.
Ivan Bazulic 2019-04-02 17:07:29 UTC
Hardware: Unspecified → x86_64
OS: Unspecified → Linux
Severity: unspecified → low
Tom McKay 2019-04-03 12:28:09 UTC
CC: tomckay@redhat.com
Daniel Walsh 2019-04-03 14:06:10 UTC
Assignee: lsm5@redhat.com → vrothber@redhat.com
Daniel Walsh 2019-04-03 14:06:38 UTC
Status: NEW → ASSIGNED
CC: mitr@redhat.com
Private
Comment 1 Miloslav Trmač 2019-04-03 15:06:47 UTC
What do you mean by “requests storage size”?
This sounds unlikely; the GET …/blobs/… is exactly what the code asks for, there is no Range header or anything else specifying any particular size.
Copying blob a3ed95caeb02 [======================================] 1.4GiB / 1.4GiB
…
For some reason, podman assumes a size of 1.4 GB which is incorrect, since the whole alpine image has roughly 5 MB.
This is only a bug in the progress bar; schema1 manifests don’t include a blob size at all (which is another reason why podman can’t be requesting an incorrect one), which causes the progress bar to be created with size -1, and for some reason the progress bar package uses the current timestamp for the size — and that works out to about 1.4 GiB nowadays.
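As an editorial aside on the arithmetic: a Unix timestamp from spring 2019 interpreted as a byte count is roughly 1.4-1.5 GiB, which matches the bogus progress-bar totals quoted in this bug. A tiny illustrative Go snippet (not the actual progress-bar code; the variable name is made up):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Hypothetical illustration: if a progress bar's total were seeded from
	// the current Unix timestamp instead of the real blob size, a timestamp
	// from April 2019 (~1,554,000,000) divided by 2^30 comes out to ~1.4,
	// i.e. the "1.4GiB / 1.4GiB" totals seen in the log above.
	bogusTotal := time.Now().Unix()
	fmt.Printf("bogus total: %.2f GiB\n", float64(bogusTotal)/(1<<30))
}
```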
The reason Ceph returns the 400 is because layer size is not correct.
Have you actually verified this on the Ceph side? If so, where is Ceph reading the size from?
Flags: needinfo?(ibazulic@redhat.com)
Private
Comment 2 Ivan Bazulic 2019-04-08 06:41:24 UTC
Created attachment 1553440 [details]
Docker pull logs from Quay
Flags: needinfo?(ibazulic@redhat.com)
Private
Comment 3 Ivan Bazulic 2019-04-08 06:41:56 UTC
Created attachment 1553441 [details]
Podman pull logs from Quay
Private
Comment 4 Ivan Bazulic 2019-04-08 06:49:23 UTC
(In reply to Miloslav Trmač from comment #1)
What do you mean by “requests storage size”?
Where did I mention this?
This sounds unlikely; the GET …/blobs/… is exactly what the code asks for, there is no Range header or anything else specifying any particular size.
Fair enough.
The reason Ceph returns the 400 is because layer size is not correct.
Have you actually verified this on the Ceph side? If so, where is Ceph reading the size from?
Yes. This is a log from the Ceph side when 'docker pull' is invoked:
2019-04-08 02:33:43.444385 7ff405da7700 1 ====== starting new request req=0x7ff405da1650 =====
2019-04-08 02:33:43.476985 7ff405da7700 1 ====== req done req=0x7ff405da1650 op status=0 http_status=200 ======
2019-04-08 02:33:43.477052 7ff405da7700 1 civetweb: 0x557a4feb0000: 192.168.58.3 - - [08/Apr/2019:02:33:43 -0400] "GET /my-new-bucket/datastorage/registry/sha256/8e/8e402f1a9c577ded051c1ef10e9fe4492890459522089959988a4852dee8ab2c HTTP/1.0" 200 0 http://ubuntu.tardis.local:80/v2/ibazulic/alpine.latest/blobs/sha256:8e402f1a9c577ded051c1ef10e9fe4492890459522089959988a4852dee8ab2c docker/18.09.4 go/go1.10.8 git-commit/d14af54 kernel/5.0.5-200.fc29.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.4 (linux))
As you can see, the return is 200. Now, take a look at the podman request:
2019-04-08 02:37:06.934968 7ff402da1700 1 ====== starting new request req=0x7ff402d9b650 =====
2019-04-08 02:37:06.935218 7ff402da1700 1 ====== req done req=0x7ff402d9b650 op status=0 http_status=400 ======
2019-04-08 02:37:06.935256 7ff402da1700 1 civetweb: 0x557a4feba000: 192.168.58.3 - - [08/Apr/2019:02:37:06 -0400] "GET /my-new-bucket/datastorage/registry/sha256/a3/a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 HTTP/1.0" 400 0 http://ubuntu.tardis.local:80/v2/ibazulic/alpine.latest/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 Go-http-client/1.1
It's 400. I have also provided Quay log files from both podman and docker pulls, with Quay debugging on. You'll notice that on both pulls the JWT tokens created by Quay are the same when decoded with jwt.io. I have talked extensively with Joey Schorr about this, since I initially thought this was a bug in Quay (and opened a JIRA for it). However, after carefully examining the requests received and sent by Quay to Ceph, we believe that the issue is in podman. What's surprising is that the same issue does not happen when podman is pushing to Quay; only pulls are problematic.
If there's anything else I can provide, let me know.
Private
Comment 5 Ivan Bazulic 2019-04-08 10:39:06 UTC
What's even more interesting is the fact that Podman requests a totally wrong blob when proxy storage is enabled. I removed the proxy storage from the config file and now Podman pulls the correct blob:
Apr 8 10:35:37 23b6e6933843 gunicorn_registry[1844]: 2019-04-08 10:35:37,332 [6258] [INFO] [gunicorn.access] 192.168.56.1 - - [08/Apr/2019:10:35:37] "GET /v2/ibazulic/alpine.latest/blobs/sha256:8e402f1a9c577ded051c1ef10e9fe4492890459522089959988a4852dee8ab2c HTTP/1.1" 302 671 "-" "Go-http-client/1.1"
Apr 8 10:35:37 23b6e6933843 nginx[1893]: 192.168.56.1 () - - [08/Apr/2019:10:35:37 +0000] "GET /v2/ibazulic/alpine.latest/blobs/sha256:8e402f1a9c577ded051c1ef10e9fe4492890459522089959988a4852dee8ab2c HTTP/1.1" 302 671 "-" "Go-http-client/1.1" (0.063 1338 0.060)
Also, Ceph reports a 200 on every request as well:
2019-04-08 05:50:58.804772 7ff40ddb7700 1 ====== starting new request req=0x7ff40ddb1650 =====
2019-04-08 05:50:58.879994 7ff40ddb7700 1 ====== req done req=0x7ff40ddb1650 op status=0 http_status=200 ======
2019-04-08 05:50:58.880081 7ff40ddb7700 1 civetweb: 0x557a4fed3000: 192.168.58.1 - - [08/Apr/2019:05:50:58 -0400] "GET /my-new-bucket/datastorage/registry/sha256/8e/8e402f1a9c577ded051c1ef10e9fe4492890459522089959988a4852dee8ab2c HTTP/1.1" 200 0 http://ubuntu.tardis.local:80/v2/ibazulic/alpine.latest/blobs/sha256:8e402f1a9c577ded051c1ef10e9fe4492890459522089959988a4852dee8ab2c Go-http-client/1.1
Private
Comment 6 Miloslav Trmač 2019-04-08 13:39:10 UTC
(In reply to Miloslav Trmač from comment #1)
What do you mean by “requests storage size”?
Where did I mention this?
I’m sorry, the actual quote was “does not request the correct image size”. (That is even less applicable because images are not downloaded as whole units in the docker/distribution HTTP API.)
Have you actually verified this on the Ceph side? If so, where is Ceph reading the size from?
Yes. This is a log from the Ceph side when 'docker pull' is invoked:
Nothing in there involves any kind of size determination / check / recorded value AFAICT; all it says is a 400, from some /my-new-bucket/ URL that is internal to the Quay implementation and not directly specified by a docker/distribution client.
(In reply to Ivan Bazulic from comment #5)
What's even more interesting is the fact that Podman requests a totally wrong blob when proxy storage is enabled. I removed the proxy storage from the config file and now Podman pulls the correct blob:
Ah, I think I know a bit about what is going on.
Guessing that the image corresponds with current docker.io/library/alpine:latest, i.e. docker://alpine@sha256:644fcb1a676b5165371437feaa922943aaf7afcfa8bfee4472f6860aad1ef2a0 in skopeo syntax. That image, in the schema2 format, has a single layer blob, 8e4… , and a config blob, 5cb….
The image pulled from your registry is not exactly that original one; it is a schema1 image:
Manifest has MIME type application/vnd.docker.distribution.manifest.v1+prettyjws
The schema2 image has 2 entries in its config.json history field; in schema1, that is represented as two blobs (entries in “fsLayers” and “history”), one of them an empty gzipped tar file. Conventionally, a specific representation with digest sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 is used for that empty one. That blob is an actual artifact of the schema1 image, and the registry must, AFAIK, be able to serve it just like any other blob (correspondingly, schema1 pushes do actually push that blob to registries).
Another aspect of that is that docker has an optimization not to pull these empty blobs, and podman does not - it pulls them and discards them later, which causes them to be pulled again every time. So, yes, there is a behavior difference between (recent) docker versions and podman in pulling a3ed95…, which causes this to manifest with podman. Still, the a3ed95… layer is an integral part of the image, explicitly listed in the manifest, so it should actually be served by the registry AFAICT (notably nothing makes throwaway layers special in https://github.com/docker/distribution/blob/master/docs/spec/manifest-v2-1.md ).
So, something is processing the a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 empty tar file incorrectly; I’d guess that the storage layer is treating it specially (maybe not recording it in the backing storage at all, and hard-coding the contents in the serving application?), and that special treatment is somehow missing/broken/different for the storage proxy. (Podman is completely unaware of pull-through/storage proxies; that happens server-side.)
Private
Comment 7 Daniel Walsh 2019-04-08 14:04:19 UTC
Mitr, should we add this optimization to Podman?
Another aspect of that is that docker has an optimization not to pull these empty blobs, and podman does not - it pulls them and discards them later, which causes them to be pulled again every time. So, yes, there is a behavior difference between (recent) docker versions and podman in pulling a3ed95…, which causes this to manifest with podman. Still, the a3ed95… layer is an integral part of the image, explicitly listed in the manifest, so it should actually be served by the registry AFAICT (notably nothing makes throwaway layers special in https://github.com/docker/distribution/blob/master/docs/spec/manifest-v2-1.md ).
Private
Comment 8 Miloslav Trmač 2019-04-08 14:10:20 UTC
(In reply to Daniel Walsh from comment #7)
Mitr, should we add this optimization to Podman?
Another aspect of that is that docker has an optimization not to pull these empty blobs, and podman does not…
Sure, in principle, but with pretty low priority: this affects only schema1, which has been going away for a long time (notably no multi-arch support), and saves about 1 HTTP round-trip per schema1 image. The storageImageDestination.TryReusingBlob code path is already getting pretty complex (with the recent concurrent copy detection and the like), this empty layer detection should only be added if it does not interfere with those more important features (and, while unlikely, if it does not affect readability significantly).
And if Quay is handling such layers incorrectly, that should be fixed either way.
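As an editorial aside: the optimization being discussed here, sketched minimally in Go, would simply skip fetching the conventional empty-gzip blob by its well-known digest. This is an illustration only (shouldFetchLayer is a made-up helper, not actual docker or containers/image code):

```go
package main

import "fmt"

// Conventional digest of the empty gzipped tar layer that schema1 manifests
// reference (quoted from the discussion above).
const emptyGzipLayerDigest = "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"

// shouldFetchLayer sketches the Docker-style optimization: don't GET the
// well-known empty blob from the registry, since its contents are fixed.
func shouldFetchLayer(digest string) bool {
	return digest != emptyGzipLayerDigest
}

func main() {
	layers := []string{
		"sha256:8e402f1a9c577ded051c1ef10e9fe4492890459522089959988a4852dee8ab2c",
		emptyGzipLayerDigest,
	}
	for _, d := range layers {
		fmt.Printf("fetch %s: %v\n", d, shouldFetchLayer(d))
	}
}
```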
Private
Comment 9 jschorr@redhat.com 2019-04-08 17:35:00 UTC
So, something is processing the a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4 empty tar file incorrectly; I’d guess that the storage layer is treating it specially (maybe not recording it in the backing storage at all, and hard-coding the contents in the serving application?), and that special treatment is somehow missing/broken/different for the storage proxy. (Podman is completely unaware of pull-through/storage proxies; that happens server-side.)
Actually, no. Quay explicitly ensures that the special layer is stored into storage when a schema1 manifest is referenced in order to ensure that it can be pulled later on. Even if the layer was missing, that should cause a 404, not a 400, and it would then definitely fail when storage proxying is turned off and podman is used, since it wouldn't be present.
I think something more is going on here, but I'm unsure what it can be; storage proxying only determines from where the client pulls the blobs; it still serves the same exact blobs from storage.
CC: jschorr@redhat.com
Private
Comment 10 Miloslav Trmač 2019-04-08 17:57:59 UTC
I mean, somewhere within Quay or its backend services there must be a “if $condition { fail with 400 }”; it should always be possible to trace this down to the actual cause.
Failing that, extending podman with https://github.com/mtrmac/image/tree/http-dump, or an equivalent server-side full HTTP capture might help, because Podman’s docker/distribution API client, on this particular code path, only reports the HTTP status code and discards the response body — but that’s only going to help if the server actually reports the underlying cause in there. Is there a server I can easily access to test that? Or a ready-to-go VM image, or something like that? I’ve never deployed Quay nor Ceph, so setting this up myself would be very time-consuming.
Private
Comment 11 jschorr@redhat.com 2019-04-08 17:59:26 UTC
We can provide a container image for running Quay; Ivan B (who reported the issue) can help with getting it set up with this flag enabled.
That being said, the 400 is coming from nginx fronting the call from Ceph, so there has to be some odd interaction going on there.
Private
Comment 12 Ivan Bazulic 2019-04-10 12:53:39 UTC
I'm in the process of creating a Quay Ceph cluster on AWS as PoC. I'll update you when I'm done and will send you the info needed to log on.
Private
Comment 13 Miloslav Trmač 2019-04-16 19:16:58 UTC
For the record: reproduced, skopeo fails with
Yet Docker succeeds:
That’s already curious, but changing skopeo to use exactly the same headers still fails:
PnT Account Manager 2019-05-01 21:58:00 UTC
CC: fkluknav@redhat.com
Private
Comment 14 Pavel Vitous 2019-05-09 09:28:50 UTC
Hello,
sorry to hijack the thread, but I've experienced something very similar to this. I'll try to be brief.
We have a setup with Quay Enterprise, and as storage we're using an S3 backend (temporarily we use Zenko Cloudserver, but the same behavior was seen with minio) in proxy mode.
I'll note that with the Docker client there aren't any issues; push and pull both work great.
However, with Podman/Skopeo (libpod) we ran into a problem when pulling images from Quay: I was getting 401 responses, while without proxy mode everything was fine.
I spent a few days digging through tcpdump captures.
Eventually I noticed that libpod behaves slightly differently from the Docker client.
After the client receives an HTTP 302 for the storage request, the Docker client's subsequent GET to the redirect address does not contain the "Authorization: Bearer.." header, and the request succeeds.
Meanwhile, libpod keeps the Authorization header from the previous request. This may sound legitimate, but the S3 backend then can't handle the request and responds with 401. I couldn't test against AWS S3, so I'm not sure whether the behavior is the same there. We plan to use HCP S3 as a backend.
If you'd like, I can send you the pcap captures for investigation.
A simple workaround in our case was to strip the Auth header in nginx at the proxy_storage location.
BR,
Pavel
CC: vitous.pavel@ifortuna.cz
Private
Comment 15 Miloslav Trmač 2019-05-09 13:58:42 UTC
Pavel, thanks, that does sound interesting, you are right (Docker uses a custom http.RoundTripper to add credentials specific to a given host, for each request individually, while c/image specifies the credentials in the original request, so the credentials are used within redirects as well).
Still, comment#13 shows a curious behavior difference already when getting the HTTP 302 redirect; the code does not even get to the point of using the resulting URL. It’s always possible that I made a mistake when collecting that data.
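As an editorial aside, here is a minimal Go sketch of the client-side workaround discussed later in the thread: drop the Authorization header before following a redirect, which is essentially the behavior Pavel observed from the Docker client. This is an illustration only, not podman or containers/image code; the host name is hypothetical and the blob digest is taken from the logs below.

```go
package main

import (
	"fmt"
	"net/http"
)

// newRedirectStrippingClient returns an http.Client whose CheckRedirect hook
// removes the Authorization header before the redirected request is sent, so
// the registry bearer token is not forwarded to the storage endpoint.
func newRedirectStrippingClient() *http.Client {
	return &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			req.Header.Del("Authorization")
			return nil
		},
	}
}

func main() {
	client := newRedirectStrippingClient()
	// Hypothetical registry host; digest taken from the pull logs in this bug.
	req, err := http.NewRequest("GET", "http://registry.example.local/v2/test/minio/blobs/sha256:597d1508cd98cc549ab2200e992422a4d2a08eccf870eed959e48be82c1a016b", nil)
	if err != nil {
		fmt.Println(err)
		return
	}
	req.Header.Set("Authorization", "Bearer <registry token>")
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

Whether the client should drop the header or the storage proxy should strip it server-side (as in Pavel's nginx workaround) is exactly what the rest of the thread debates.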
Private
Comment 16 Pavel Vitous 2019-05-10 07:28:55 UTC
Hi Miloslav,
I forgot to mention one important fact: what I've described only happens if you're making the connection through a load balancer (tested with HAProxy and Nginx in TCP mode). When you go directly to the Quay endpoint (skipping the load balancer), it works like a charm.
So the connection goes like: Client (pull) ----> LB (HAProxy) ----> Quay --(proxy_mode)--> S3.
By the way, I just ran a test with Ceph as the storage backend, and as far as I can tell, stripping the Auth header probably hotfixes the issue there as well.
This is what I get when I try to pull through the LB in front of Quay without stripping the Auth header:
DEBU[0004] Manifest has MIME type application/vnd.docker.distribution.manifest.v1+prettyjws, ordered candidate list [application/vnd.docker.distribution.manifest.v1+prettyjws, application/vnd.docker.distribution.manifest.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.docker.distribution.manifest.v1+json]
DEBU[0004] ... will first try using the original manifest unmodified
DEBU[0004] Downloading /v2/test/minio/blobs/sha256:597d1508cd98cc549ab2200e992422a4d2a08eccf870eed959e48be82c1a016b
DEBU[0004] GET http://lb-quay-registry.archer/v2/test/minio/blobs/sha256:597d1508cd98cc549ab2200e992422a4d2a08eccf870eed959e48be82c1a016b
DEBU[0004] Downloading /v2/test/minio/blobs/sha256:b19d51ada28e91e7a5395b8ac310b927279dd3ba3d19656ec7e75d78107df1f8
DEBU[0004] GET http://lb-quay-registry.archer/v2/test/minio/blobs/sha256:b19d51ada28e91e7a5395b8ac310b927279dd3ba3d19656ec7e75d78107df1f8
ERRO[0004] Error pulling image ref //lb-quay-registry.archer/test/minio:latest: Error reading blob sha256:b19d51ada28e91e7a5395b8ac310b927279dd3ba3d19656ec7e75d78107df1f8: Invalid status code returned when fetching blob 400 (Bad Request)
Failed
And here is the same situation but with the workaround:
DEBU[0004] Manifest has MIME type application/vnd.docker.distribution.manifest.v1+prettyjws, ordered candidate list [application/vnd.docker.distribution.manifest.v1+prettyjws, application/vnd.docker.distribution.manifest.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.docker.distribution.manifest.v1+json]
DEBU[0004] ... will first try using the original manifest unmodified
DEBU[0004] Downloading /v2/test/minio/blobs/sha256:597d1508cd98cc549ab2200e992422a4d2a08eccf870eed959e48be82c1a016b
DEBU[0004] GET http://lb-quay-registry.archer/v2/test/minio/blobs/sha256:597d1508cd98cc549ab2200e992422a4d2a08eccf870eed959e48be82c1a016b
DEBU[0004] Downloading /v2/test/minio/blobs/sha256:b19d51ada28e91e7a5395b8ac310b927279dd3ba3d19656ec7e75d78107df1f8
DEBU[0004] GET http://lb-quay-registry.archer/v2/test/minio/blobs/sha256:b19d51ada28e91e7a5395b8ac310b927279dd3ba3d19656ec7e75d78107df1f8
DEBU[0005] Detected compression format gzip
DEBU[0005] Using original blob without modification
Copying blob 597d1508cd98 [======================================] 1.5GiB / 1.5GiB
DEBU[0005] Detected compression format gzip
Without LB it works in both situations.
Pavel
Private
Comment 17 Valentin Rothberg 2019-07-18 08:27:46 UTC
Note that the initial issue (i.e., displaying of incorrect size) in this bug has been fixed upstream: https://github.com/containers/image/commit/2178abd5f9b1b9e70c0aca05780d1defba350a53
It will soon be vendored into Podman and be released.
If you want the other cross-posted issues addressed, please open a dedicated bug.
Private
Comment 18 Daniel Walsh 2019-07-18 10:25:48 UTC
Ok I will mark this as fixed in podman-1.5
Fixed In Version: podman-1.5
Private
Comment 19 Miloslav Trmač 2019-07-18 15:05:17 UTC
The primary issue is a 400 status returned, for a completely unknown reason. That has not been diagnosed to date.
(FWIW, the bearer authorization behavior difference reported in comment#14 is now tracked upstream in https://github.com/containers/image/issues/641 .)
Private
Comment 20 Valentin Rothberg 2019-07-19 07:46:21 UTC
@Miloslav, good point. Would you mind looking into that? I have no expertise on the protocol level.
Assignee: vrothber@redhat.com → mitr@redhat.com
Private
Comment 21 Miloslav Trmač 2019-07-19 21:05:08 UTC
I can’t really think of any way to diagnose the cause of the 400 status, given that two people have observed different outputs with the same input. My best guess is that this needs to be diagnosed by tracing inside the Quay server. (A distinctly second guess is that this is actually caused by comment#14 , and we have both not noticed the difference.)
I guess, marking NEEDINFO for diagnosing the cause server-side.
Flags: needinfo?(ibazulic@redhat.com)
Private
Comment 22 Tom McKay 2019-07-22 17:24:31 UTC
I will take this on to test quay w/ both podman and skopeo. Skopeo will be used for repository mirroring and so confirming this works is a high priority.
Private
Comment 23 Daniel Walsh 2019-08-14 10:35:29 UTC
Tom, I believe I can still mark this as fixed in podman 1.5, correct?
Private
Comment 24 Ivan Bazulic 2019-10-23 12:27:18 UTC
It is not fixed. Podman version 1.6 is still exhibiting the same problem:
root@fedora:~# podman --log-level=debug pull --creds ibazulic:Budala123$ docker://debian.tardis.local/ibazulic/alpine:latest
...
DEBU[0000] GET https://debian.tardis.local/v2/
DEBU[0000] Ping https://debian.tardis.local/v2/ err Get https://debian.tardis.local/v2/: dial tcp 192.168.20.100:443: connect: connection refused (&url.Error{Op:"Get", URL:"https://debian.tardis.local/v2/", Err:(*net.OpError)(0xc0003da0a0)})
DEBU[0000] GET http://debian.tardis.local/v2/
DEBU[0000] Ping http://debian.tardis.local/v2/ status 401
DEBU[0000] GET http://debian.tardis.local/v2/auth?account=ibazulic&scope=repository%3Aibazulic%2Falpine%3Apull&service=debian.tardis.local
DEBU[0000] Increasing token expiration to: 60 seconds
DEBU[0000] GET http://debian.tardis.local/v2/ibazulic/alpine/manifests/latest
DEBU[0000] Using blob info cache at /var/lib/containers/cache/blob-info-cache-v1.boltdb
DEBU[0000] IsRunningImageAllowed for image docker:debian.tardis.local/ibazulic/alpine:latest
DEBU[0000] Using default policy section
DEBU[0000] Requirement 0: allowed
DEBU[0000] Overall: allowed
DEBU[0000] Downloading /v2/ibazulic/alpine/blobs/sha256:961769676411f082461f9ef46626dd7a2d1e2b2a38e6a44364bcbecf51e66dd4
DEBU[0000] GET http://debian.tardis.local/v2/ibazulic/alpine/blobs/sha256:961769676411f082461f9ef46626dd7a2d1e2b2a38e6a44364bcbecf51e66dd4
DEBU[0000] Error pulling image ref //debian.tardis.local/ibazulic/alpine:latest: Error parsing image configuration: Invalid status code returned when fetching blob 400 (Bad Request)
Invalid status code returned when fetching blob 400 (Bad Request)
ERRO[0000] error pulling image "docker://debian.tardis.local/ibazulic/alpine:latest": unable to pull docker://debian.tardis.local/ibazulic/alpine:latest: unable to pull image: Error parsing image configuration: Invalid status code returned when fetching blob 400 (Bad Request)
Quay logs the following:
nginx stdout | 192.168.20.1 () - - [23/Oct/2019:12:12:50 +0000] "GET /_storage_proxy/eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6IjQyODQzYTEwYmJiZTk1NTJmMWFhNzRmMzE0YTU1OGUxODM5ZWQyOWNiZjQ4MWM3ZTNhNzk4NWRkMTRhNGVmZjkifQ.eyJhY2Nlc3MiOlt7Imhvc3QiOiJjZXBoLWNsaWVudDo3NDgwIiwic2NoZW1lIjoiaHR0cCIsInR5cGUiOiJzdG9yYWdlcHJveHkiLCJ1cmkiOiJteS1uZXctYnVja2V0L2RhdGFzdG9yYWdlL3JlZ2lzdHJ5L3NoYTI1Ni85Ni85NjE3Njk2NzY0MTFmMDgyNDYxZjllZjQ2NjI2ZGQ3YTJkMWUyYjJhMzhlNmE0NDM2NGJjYmVjZjUxZTY2ZGQ0P1NpZ25hdHVyZT1EemM0Q0lHODY2UnJIU3JTMkdta0I4Rmo4U3MlM0QmRXhwaXJlcz0xNTcxODMzMzcwJkFXU0FjY2Vzc0tleUlkPUs2RFROSDg1Mk5ZRENETlJTTVRGIn1dLCJjb250ZXh0Ijp7fSwiYXVkIjoiZGViaWFuLnRhcmRpcy5sb2NhbCIsImV4cCI6MTU3MTgzMjgwMCwiaXNzIjoicXVheSIsImlhdCI6MTU3MTgzMjc3MCwibmJmIjoxNTcxODMyNzcwLCJzdWIiOiJzdG9yYWdlcHJveHkifQ.e6-lAOUHZdcRVR8j04Q72H0oOQbStRJXIe-tLPsnvBPsjEDlWgw3bLFQrxeDm74B0LAfO7niF2naEsedIL3M3mwc5G97hIbnfMb_DOIojKyCvfdGTYLR8AHfi0kaNTdm0JAZraMosxGmPJ1WhokCC8ckmJfcnemmBT4Wv0YKFyscvXsHrRxH60RMW0_YUstZZ9JPYbKgh62KP-AY0C6tTKZj5xh6qDFvcoNGbN2Dlr8SQok0MVVBJqK2OK4nQeMRdMxxTrGzKj_Gc91tNoPf8vJotaQ3Q9_UdCiRhqziOLi2qtrFHli7QlVxnn8M0l6G6xJzeJMcsDSTPwddJlHcag/http/ceph-client:7480/my-new-bucket/datastorage/registry/sha256/96/961769676411f082461f9ef46626dd7a2d1e2b2a38e6a44364bcbecf51e66dd4?Signature=Dzc4CIG866RrHSrS2GmkB8Fj8Ss%3D&Expires=1571833370&AWSAccessKeyId=K6DTNH852NYDCDNRSMTF HTTP/1.1" 400 188 "http://debian.tardis.local/v2/ibazulic/alpine/blobs/sha256:961769676411f082461f9ef46626dd7a2d1e2b2a38e6a44364bcbecf51e66dd4" "libpod/1.6.1" (0.043 2621 0.002)
Ceph RADOS logs the following:
2019-10-23 13:12:50.522295 7fc1f840b700 1 ====== starting new request req=0x7fc1f8405650 =====
2019-10-23 13:12:50.522529 7fc1f840b700 1 ====== req done req=0x7fc1f8405650 op status=0 http_status=400 ======
2019-10-23 13:12:50.522572 7fc1f840b700 1 civetweb: 0x55bf79804000: 192.168.20.100 - - [23/Oct/2019:13:12:50 +0100] "GET /my-new-bucket/datastorage/registry/sha256/96/961769676411f082461f9ef46626dd7a2d1e2b2a38e6a44364bcbecf51e66dd4 HTTP/1.0" 400 0 http://debian.tardis.local/v2/ibazulic/alpine/blobs/sha256:961769676411f082461f9ef46626dd7a2d1e2b2a38e6a44364bcbecf51e66dd4 libpod/1.6.1
This is with FEATURE_PROXY_STORAGE: true. The same does not happen when the storage proxy feature is turned off. In that case, podman pulls work as expected.
The same error happens with FEATURE_PROXY_STORAGE: true when the backing storage is AWS, so it's not tied to Ceph specifically. Again, the same does not happen when Docker is used. Storage proxying is only used for pulls; pushes work as intended.
Flags: needinfo?(ibazulic@redhat.com)
Private
Comment 25 Tom McKay 2019-10-29 14:54:34 UTC
It would be good to resolve this issue as it still fails on podman-1.6.2.
From our debug it appears as if podman is resending the wrong headers. We can provide a quay setup to work with. Docker client works successfully as expected.
Flags: needinfo?(mitr@redhat.com)
Private
Comment 26 Daniel Walsh 2019-10-29 15:12:11 UTC
We need more information from the quay server.
Miloslav asked for this here: https://bugzilla.redhat.com/show_bug.cgi?id=1695236#c21
Private
Comment 27 jschorr@redhat.com 2019-10-29 15:20:15 UTC
I'm nearly 100% certain that the root cause is the one listed here: https://bugzilla.redhat.com/show_bug.cgi?id=1695236#c14
Can someone try modifying podman to turn off the "resend the Auth header" behavior and see if it fixes the problem?
Derrick Ornelas 2019-10-30 20:25:25 UTC
CC: dornelas@redhat.com
Private
Comment 28 Ben Cotton 2019-10-31 18:42:50 UTC
This message is a reminder that Fedora 29 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 29 on 2019-11-26. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '29'.
Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.
Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 29 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed, as described in the policy above.
Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Ivan Bazulic 2019-11-08 17:15:06 UTC
Version: 29 → 31
Derrick Ornelas 2019-11-19 23:38:59 UTC
Blocks: Red Hat bug 1774279
Private
Comment 29 svelosol@redhat.com 2019-12-04 08:52:33 UTC
This bug makes it impossible to use OpenShift 4 with Quay in proxy mode, as cri-o uses the same library as Podman.
The workaround proposed by @Pavel works fine, but it should not have to be handled by Quay's nginx:
location ~ ^/_storage_proxy/([^/]+)/([^/]+)/([^/]+)/(.+) { ...
Fix Podman issue https://bugzilla.redhat.com/show_bug.cgi?id=1695236
}
CC: svelosol@redhat.com
Private
Comment 30 jschorr@redhat.com 2019-12-04 16:27:12 UTC
@svelosol@redhat.com
This change has already been merged in the Quay code for the next release, as it was becoming an issue for our customers. You are correct, though, that this should ideally be handled on the podman side.