syedriko opened this issue 7 months ago
does it show it constantly off or is it a temporary peak value?
The values change, I've seen them fluctuate between 300 MiB/s and 1.1 GiB/s, but the entire range is unrealistic.
I see the same issue with Podman 4.7:
```
Copying blob 21b98c2cb85e [========>-----------------------------] 25.0MiB / 100.0MiB | 1.3 GiB/s
```
I think we introduced the change in https://github.com/containers/image/pull/2015. Since pull stats are fine, I assume that we calculate something different when doing an image push.
Buildah is affected as well, so I think this one is a more general issue.
I see the same. It's reporting over 100 MiB/s for a large image, but my internet upload bandwidth is only ~40 MB/s.
Unfortunately I haven't had further time to spend on this. @mtrmac, do you have a rough guess where the issue may come from?
I didn’t investigate in detail now, just some quick points: one thing to check is what the `mpb` progress bar code is doing, and possibly whether the computation is tunable (to be smoothed, or to explicitly take / not take unrelated waits into account).

@mtrmac both of your points are correct. In short, this is happening because an ewma based decorator is being used, whose goal is to measure “instantaneous” throughput. You just hit an edge case, as I'm pretty sure it works ok with pull (download) and not ok with push (upload). Most of the time a download operation involves a slow reader and an upload a slow writer. It means:
- `ewma` with `ProxyReader` combo works ok on download (slow reader)
- `ewma` with `ProxyReader` combo does not work ok on upload (slow writer)

You have several options to fix this. The quickest and simplest is to use `average` based decorators instead. But if you'd like to keep the `ewma` based ones, here are the options:

- handle the push case with a `ProxyWriter` wrapped around instead of a `ProxyReader`
- increase the `ewma` weight value; the greater the value, the more the measurement will behave like an average one

You don't have to pick just one approach. I would handle the push case with `ProxyWriter` and use 90 as the `ewma` weight value instead of 30 (but you'd better check the end result and tune accordingly). The underlying ewma library treats 30 as a special case which defaults to the no-`WARMUP_SAMPLES` variant.
Last but not least, manual bar increments require use of `Ewma` prefixed variants in order for ewma based decorators to work correctly (`ProxyReader` and `ProxyWriter` handle this automatically).
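To illustrate the weight trade-off described above, here is a stdlib-only Go sketch of the `2/(age+1)` exponential decay commonly used by EWMA implementations. The function name, sample values, and seeding are illustrative assumptions, not mpb's or the ewma library's actual code:

```go
package main

import "fmt"

// ewmaAdd folds one throughput sample into a moving average using the
// decay factor 2/(age+1); a larger age weights history more heavily.
// (Illustrative sketch, not mpb's or the ewma library's actual code.)
func ewmaAdd(avg, sample, age float64) float64 {
	alpha := 2.0 / (age + 1.0)
	return avg + alpha*(sample-avg)
}

func main() {
	// Throughput drops from 10 MiB/s to 1 MiB/s halfway through.
	samples := []float64{10, 10, 10, 1, 1, 1, 1}
	small, large := samples[0], samples[0] // seed with the first sample
	for _, s := range samples[1:] {
		small = ewmaAdd(small, s, 5)  // small age: tracks the drop quickly
		large = ewmaAdd(large, s, 90) // large age: behaves like an average
	}
	fmt.Printf("age=5: %.1f MiB/s, age=90: %.1f MiB/s\n", small, large)
}
```

With age 5 the reported rate falls close to the new 1 MiB/s within a few samples, while with age 90 it stays near the old 10 MiB/s, i.e. it behaves much more like an overall average.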
@vbauerster Thank you very much for the advice!
To reformulate in simple terms: reading the `ProgressReader` implementation, the throughput being recorded is only the time spent in `Read()`. So, in the “slow consumer” situation, we spend a lot of time between each `Read` call and very little time inside the `Read` call.
In that case, the throughput computed is not just “within a factor of 2 or so”, but it can be many times the bandwidth of the link. I think that’s the full explanation, with no mysteries left.
Our pipeline design is such that the pipeline is actually executed “inside the destination”, and ultimately the slow write calls are only observed by that destination. In particular, pushes use the Go HTTP stack, where the input is provided as an `io.Reader`; we don’t have access to the `Write`-like operations that happen on the network socket deep inside `http.Transport`. That seems pretty hard to integrate with the `ProxyWriter` code measuring time spent.
One possible ~general solution for us might be to replace `ProxyReader` with a custom implementation which does not measure just the time inside `Read`, but all of the time since the previous `Read` finished. That would, I think, end up reporting the effective throughput of the pipeline — but still in terms of the uncompressed input, so within a factor of ~2 of the network throughput.
An additional complication for push is that we know the size of the input, but we don’t know the size of the to-be-compressed output. So we would, ideally, want to report the progress percentage in terms of the uncompressed data, but throughput in terms of the compressed data — both on one progress bar line.
I’m sure that can be built (completely from scratch, or maybe by somehow providing a custom `BarFiller`), but it’s not something I plan to immediately work on.
> Last but not least, manual bar increments require use of `Ewma` prefixed variants in order for ewma based decorators to work correctly (`ProxyReader` and `ProxyWriter` handle this automatically).
Thanks! https://github.com/containers/image/pull/2308 should fix that.
> To reformulate in simple terms: reading the `ProgressReader` implementation, the throughput being recorded is only the time spent in `Read()`. So, in the “slow consumer” situation, we spend a lot of time between each `Read` call and very little time inside the `Read` call.
Correct, the opposite is true for the `ProxyWriter` implementation, where only the time spent in `Write([]byte)` is recorded.
> One possible ~general solution for us might be to replace `ProxyReader` with a custom implementation which does not measure just the time inside `Read`, but all of the time since the previous `Read` finished. That would, I think, end up reporting the effective throughput of the pipeline — but still in terms of the uncompressed input, so within a factor of ~2 of the network throughput.
If you don't have easy access to a writer, then such a ~general solution is very plausible. It's worth mentioning one more time that `average` decorators just work as expected most of the time.
I think if we spend 10 minutes uploading a layer, and then the upload stalls, we would want to converge to showing a 0 throughput reasonably quickly; not reporting half the throughput after another 10 minutes, quarter after another 20 minutes.
But then again, in most situations pushes complete well within a minute, or maybe within a few; so that might not be worth worrying about too much.
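The slow decay described above is just the arithmetic of an average-based rate (total bytes divided by total elapsed time). A tiny Go sketch with made-up numbers:

```go
package main

import "fmt"

func main() {
	const transferred = 600.0 // MiB uploaded before the stall (made-up figure)
	// An average-based decorator reports transferred/elapsed, so after a
	// stall the displayed rate only halves each time the elapsed time doubles.
	for _, minutes := range []float64{10, 20, 40} {
		fmt.Printf("after %2.0f min: %3.0f MiB/min\n", minutes, transferred/minutes)
	}
	// Prints 60, then 30, then 15 MiB/min — never reaching zero.
}
```

That hyperbolic tail is exactly the “half after another 10 minutes, quarter after another 20” behavior, and it is the main argument for keeping an ewma-style decorator despite its edge cases.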
From a mildly-informed onlooker's perspective who noticed the currently busted progress indications, this is indeed an edge case, but it's also important. When upload pace is consistent, the algorithm hardly matters - ewma, average, etc will all look the same for constant inputs.
The biggest value of a progress bar is for longer operations (many minutes) to reflect an overall change in the shape of the operation - "the first half went at 10MB/s and the second half is now at 200KB/s" - so while I appreciate that there is only finite time to spend on niceties like this, I do not think you're worrying about it too much at all, since these edge cases are where the updating display is most useful :)
> In that case, the throughput computed is not just “within a factor of 2 or so”, but it can be many times the bandwidth of the link. I think that’s the full explanation, with no mysteries left.
Agreed. I see a fairly constant 800 MiB/s – 1.2 GiB/s value for a 272 MiB layer that has been in the process of pushing for 3 minutes before it breaks 150 MiB transferred.
Glad I found this ticket, I was starting to think I was going a little stir crazy :smile:
I'm seeing order-of-magnitude issues: 1.2 GiB/s reported when in reality the network stack is running at 12 Mbps (~1.1 MiB/s).
podman-4.9.3-1.fc39.x86_64
Looks to be a factor of ~1024, built from b8a684b64d77fc97b3b539dd5891319756c15c39
The same issue (wrong and fluctuating throughput) is also present for pulls. Fixing one will likely also fix the other, but I wanted to make a note of it.
Issue Description
When pushing an image to quay.io, podman reports unrealistic throughput numbers:
I've seen it report up to 1.1 GiB/s, but I failed to capture that. My upload bandwidth is 20 Mbps as reported by https://speedtest.xfinity.com/results, so these numbers are not realistic; something is off with the arithmetic there.
Steps to reproduce the issue
Describe the results you received
podman reported image upload speed of 350.0 MiB/s
Describe the results you expected
podman reports image upload speed within my cable modem's upload bandwidth
podman info output
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
No
Additional environment details
Additional information