Open hodgesmr opened 6 years ago
The error message above comes from Python's multiprocessing module, specifically the Manager class. It looks like it's unable to connect to the Manager's server process to create a Lock object. This seems like a pretty low-level Python module issue rather than anything gsutil is doing; we don't do anything special to alter the behavior of the Manager class, so this should "just work"™. I assume this is a fundamental issue with the multiprocessing module when used within specific containerized environments... but I don't have anything to base that on except the stack trace above and the fact that I've only seen this problem occur in something running within a Docker container.
For thoroughness, would you mind running gsutil version -l within the container and posting that output? This should give lots of output, but I'm mainly interested in the Python version and gsutil version being used, along with some metadata about the environment gsutil is invoked from.
@houglum thanks for the response!
Here's the output from gsutil version -l running in the container (run before authing with gcloud):
gsutil version: 4.28
checksum: ca9bccbeb7ce0c439a9cfdf998a08dd0 (OK)
boto version: 2.48.0
python version: 2.7.9 (default, Jun 29 2016, 13:08:31) [GCC 4.9.2]
OS: Linux 4.4.0-1027-gke
multiprocessing available: True
using cloud sdk: True
pass cloud sdk credentials to gsutil: False
config path(s): no config found
gsutil path: /usr/lib/google-cloud-sdk/platform/gsutil/gsutil
compiled crcmod: False
installed via package manager: False
editable install: False
Thanks, hodgesmr@. You're on python 2.7.9 and the most recent version of gsutil, so I'll stick to my original guess :)
A good way of attempting to confirm this might be to write a script that simply creates a multiprocessing.Manager, then attempts to create a Lock within it (ideally, adding some timing to gsutil at the stack trace points above, to see how far apart the Manager and Lock creation occur so you can mimic that) -- if you can deploy and run that in a similar container environment and occasionally reproduce this, it's most certainly an issue with multiprocessing on Docker/Kubernetes.
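For anyone who wants to try this, here is a minimal sketch of such a script. It is not taken from gsutil; the function names, the trial count, and the delay_seconds gap are all hypothetical and would need tuning to match the timing gsutil actually shows. It only uses multiprocessing.Manager() and the Lock() proxy it hands out, and it should run on Python 2.7 as well as Python 3.

```python
import multiprocessing
import time


def create_manager_and_lock(delay_seconds=0.0):
    """Start a Manager, wait, then request a Lock from it.

    Returns (seconds to start the Manager, seconds from Manager to Lock).
    delay_seconds is a hypothetical knob for mimicking the gap seen in gsutil.
    """
    start = time.time()
    manager = multiprocessing.Manager()  # spawns the Manager's server process
    manager_ready = time.time()
    try:
        time.sleep(delay_seconds)
        lock = manager.Lock()  # requires a connection to the server process
        lock_ready = time.time()
        with lock:  # acquire and release once to exercise the proxy
            pass
    finally:
        manager.shutdown()
    return manager_ready - start, lock_ready - manager_ready


def run_trials(trials=20, delay_seconds=0.0):
    """Repeat the Manager/Lock creation and collect any failures."""
    failures = []
    for index in range(trials):
        try:
            create_manager_and_lock(delay_seconds)
        except Exception as exc:  # record the error and keep going
            failures.append((index, repr(exc)))
    return failures


if __name__ == "__main__":
    failures = run_trials()
    print("failures: %d" % len(failures))
    for index, error in failures:
        print("trial %d: %s" % (index, error))
```

Running this in a loop inside the same image and cluster, and comparing against a run outside the container, would show whether the failure follows the environment rather than gsutil.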
I've also seen this on occasion when using tmux. I have not found any kind of pattern for consistent reproducibility, though.
Issue Description
We're attempting to use gsutil to download files as part of our DevOps flow. We have gzipped tar archives in a GCS bucket and we're spinning up a Docker container in Kubernetes to pull down the archives.
Periodically, the gsutil cp command will raise an exception:
We're authenticating with a service account JSON file:
and then attempting to download:
I can run this repeatedly on the same file in the same bucket. Sometimes it works fine, other times it raises the above exception.
More Details
Our Dockerfile installs the google-cloud-sdk like so:
Our entrypoint is a simple shell script that calls gsutil cp (as described above):
When this exception occurs, the container does not terminate.
This is run via a Spinnaker pipeline which deploys the container into our Kubernetes cluster with the following manifest: