myitcv opened 2 years ago
Yes, in theory we could return as soon as we find the package at the most precise path. That short-circuit isn't optimized because the shorter paths are supposed to return quickly — so the question is, why did the proxy take 2+ seconds to tell you about a version that trivially doesn't exist?
> so the question is, why did the proxy take 2+ seconds to tell you about a version that trivially doesn't exist?
Indeed. Plus the question above about the 15-20 secs delay we can't account for in the `-x` output.
Checking whether a version exists takes a measurable amount of time, and the proxy's implementation adds some overhead. I'm not sure what "trivially doesn't exist" means, but if we have to run a `go` command and it has to talk to an origin server, 2 seconds does not seem unreasonable to me.
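For readers following along: the "most precise path" discussion above refers to the fact that module resolution probes every path prefix of the import path as a candidate module path. A minimal sketch of that enumeration (a hypothetical helper for illustration, not cmd/go's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// candidateModulePaths lists the module paths that resolution would
// probe for an import path, from most to least specific. (Hypothetical
// helper; the real logic in cmd/go also validates paths, consults
// GOPRIVATE-style patterns, etc.)
func candidateModulePaths(importPath string) []string {
	var paths []string
	p := importPath
	for {
		paths = append(paths, p)
		i := strings.LastIndex(p, "/")
		if i < 0 {
			break
		}
		p = p[:i]
	}
	return paths
}

func main() {
	// For cuelang.org/go, both cuelang.org/go and cuelang.org are probed.
	fmt.Println(candidateModulePaths("cuelang.org/go"))
}
```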
@heschi

> if we have to run a go command and it has to talk to an origin server, 2 seconds does not seem unreasonable to me.

Why would a `go` command call be necessary here? Because the proxy can't be sure that its previously cached answer of "nothing there" is still valid?
FWIW, the `go` command invocation in this case completes in ~100ms:

```
$ GOPROXY=direct time go list -m cuelang.org@v0.4.1-beta.4
go: cuelang.org@v0.4.1-beta.4: unrecognized import path "cuelang.org": parsing cuelang.org: XML syntax error on line 44: expected /> in element
Command exited with non-zero status 1
real 0:00.10
user 0.02
sys 0.01
rss 20944k
```
So that seems like 2+ seconds for container setup? Which I guess seems ok, but given the previous trace I would have expected the negative result to already be present in the cache.
> Indeed. Plus the question above about the 15-20 secs delay we can't account for in the `-x` output.
Agreed. If that happens again, could you kill the `go` command with `SIGQUIT` and post the goroutine dump (or send it to me)?
> Because the proxy can't be sure that its previously cached answer of "nothing there" is still valid?
Yes.
> real 0:00.10
I don't believe this was done with an empty module cache. For me it takes anywhere from 300ms to 1.5s. But yes, there is quite a bit of overhead relative to the work required for something this small.
The behavior of the proxy is well within my expectations, so I'm going to drop out of the discussion now.
Thanks, Heschi. Very much appreciate the clarifications.
> > Because the proxy can't be sure that its previously cached answer of "nothing there" is still valid?
>
> Yes.
@bcmills I think this points towards there being some merit in the short-circuit described earlier?
@bcmills

> could you kill the `go` command with `SIGQUIT` and post the goroutine dump (or send it to me)?
https://gist.github.com/myitcv/b7939fcc511a007af00297ceb39847b3
I triggered this about 7 seconds after the last line of output shown (didn't want to wait too much longer in case I missed the opportunity). What's interesting is that in this time the precise path hasn't returned.
What's even more interesting is that these calls shouldn't be necessary because I have this module in my module cache.
Hmm. There isn't much actually running in that goroutine dump — one goroutine in a `read` syscall via `modfetch.(*dbClient).ReadCache`, and what appears to be a whole cluster of goroutines waiting on that same tile.
Nothing seems to have been blocked for a long time, so it isn't obvious to me whether we're bottlenecked on reading the module graph one time (or reading checksums one time), or reloading the graph (or reloading the subdb tiles) repeatedly for some reason.
A trace file from `--debug-trace=go.trace` might be interesting to see, but it might just tell us some functions that need more detailed trace instrumentation. 😅
@bcmills - an update from my side. It turns out some of the issues I have been seeing are related to local disk errors (thankfully not actual hardware failures, just problems with an NVMe disk and the Linux kernel I am using).
So I'm going to attribute the unexplained long delays (15-20secs) to those disk errors. This is reasonable and consistent with other behaviour I was able to reliably reproduce.
That leaves two issues to my mind (hence I've retitled the issue):
1) why `go get cuelang.org/go@v0.4.1-beta.4` in a new module hits the network at all when that exact version is available in my local module cache

2) whether `cmd/go get` should short-circuit when it gets a response from a more precise module path (per the discussion above). When the proxy hasn't seen a request for `cuelang.org/go` "for some time", it ends up hitting the remote VCS to check whether a module exists at the path `cuelang.org`, and the client's `cmd/go` instance blocks until that returns. This delay can be measured in seconds, and it exacerbates point 1.
What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (`go env`)?

go env Output

What did you do?

The command stalled at the line

**********************

for a good 15-20 secs.

What did you expect to see?

`-x` logs to indicate where the delay occurred; the 15-20 secs delay I experienced is not accounted for above.

What did you see instead?

As above.
Noting a subsequent call had output as follows:
Here the `go get` command ultimately took as long as it did (~2.5 secs) because of the long response time to:

Given `go get` had already by this point got a successful response from:

could `cmd/go` not optimise to simply "drop" the other requests, given the most specific one returned successfully?

cc @bcmills for `cmd/go` and @heschi in case there are any proxy-related issues here