GoogleCloudPlatform / gsutil

A command line tool for interacting with cloud storage services.
Apache License 2.0

gsutil consuming large amounts of memory during downloads of large files #503

Open berlincount opened 6 years ago

berlincount commented 6 years ago

While downloading a large file to local storage on a GKE (Kubernetes) PVC (SSD), we noticed that the gsutil process regularly causes the container to be killed.

The container is limited to 24 GiB of RAM, which should be plenty to download a 230 GiB file. It isn't: at times the gsutil process grows well beyond this limit.

Notably, the storage apparently sometimes can't keep up. This might cause gsutil to buffer the data without applying any back pressure, such as limiting bandwidth or pausing and resuming the incoming transfer.
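To make the back-pressure idea concrete, here is a minimal Python sketch. It is not gsutil's actual code, and `copy_with_backpressure` and `MAX_BUFFERED_CHUNKS` are hypothetical names. A bounded `queue.Queue` sits between the network side and the disk side. When writes fall behind, `put()` blocks and the reader stops pulling data instead of buffering it without limit, so memory stays near `max_buffered` chunks regardless of file size.

```python
import queue
import threading
import time

# Hypothetical sketch, not gsutil's implementation. A bounded queue between
# the "download" side and the "disk" side provides back pressure: once
# max_buffered chunks are waiting, put() blocks until the writer catches up.
MAX_BUFFERED_CHUNKS = 4  # assumed value; memory stays near 4 * chunk size


def copy_with_backpressure(chunks, write, max_buffered=MAX_BUFFERED_CHUNKS):
    """Feed chunks (from any iterable) to write() on a separate writer thread.

    Returns the largest queue depth observed right after a put().
    """
    buf = queue.Queue(maxsize=max_buffered)
    done = object()  # sentinel that tells the writer to stop
    errors = []
    peak = 0

    def writer():
        while True:
            chunk = buf.get()
            if chunk is done:
                return
            if errors:
                continue  # keep draining after a failure so put() cannot deadlock
            try:
                write(chunk)
            except Exception as exc:
                errors.append(exc)

    thread = threading.Thread(target=writer)
    thread.start()
    try:
        for chunk in chunks:
            if errors:
                break  # stop "downloading" once the writer has failed
            buf.put(chunk)  # blocks while the queue is full
            peak = max(peak, buf.qsize())
    finally:
        buf.put(done)
        thread.join()
    if errors:
        raise errors[0]
    return peak


# Usage: a deliberately slow "disk" never lets more than 4 chunks pile up.
written_chunks = []


def slow_write(chunk):
    time.sleep(0.001)  # simulate storage that can't keep up
    written_chunks.append(chunk)


demo_chunks = [bytes([i]) * 1024 for i in range(50)]
demo_peak = copy_with_backpressure(demo_chunks, slow_write)
```

The design choice here is to block the producer rather than drop or spill data; a real downloader would also need the network layer to stop reading from the socket while blocked, so TCP flow control can slow the sender.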

Example session mid-transfer:

# gsutil -v
gsutil version: 4.28
# ps afux
root       407  0.0  0.0 111188 66936 ?        S    12:53   0:00          \_ python2 /usr/bin/../lib/google-cloud-sdk/bin/bootstrapping/gsutil.py cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       412  4.3  0.0 305116 34780 ?        Sl   12:53   1:56              \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       416  0.0  0.0 570092 25928 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       417  9.2  0.0 12020364 31872 ?      Sl   12:53   4:08                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       527  201  0.0 452068 34824 ?        Sl   12:53  90:41                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       528  0.0  0.0 451812 29856 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       529  0.0  0.0 451812 29856 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       531  0.0  0.0 451812 29856 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       533  0.0  0.0 451812 29856 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       534  0.0  0.0 451812 29856 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       536  0.0  0.0 451812 29864 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       538  0.0  0.0 451812 29868 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       541  0.0  0.0 451812 29868 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       542  0.0  0.0 451812 29872 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       545  0.0  0.0 451812 29876 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       547  0.0  0.0 451812 29876 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       550  0.0  0.0 451812 29884 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       552  0.0  0.0 451812 29884 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       554  0.0  0.0 451944 29884 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       557  0.0  0.0 451944 29888 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       560  0.0  0.0 451944 29888 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       561  0.0  0.0 451944 29892 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       563  0.0  0.0 451944 29892 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       566  0.0  0.0 451944 29892 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       569  0.0  0.0 451944 29896 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       572  0.0  0.0 451944 29896 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       575  0.0  0.0 451944 29900 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       578  0.0  0.0 451944 29888 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       582  0.0  0.0 451944 29904 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       585  0.0  0.0 451944 29896 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       586  0.0  0.0 451944 29776 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       588  0.0  0.0 451944 29784 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       590  0.0  0.0 451944 29784 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       591  0.0  0.0 451944 29788 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       593  0.0  0.0 451948 29788 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
root       596  0.0  0.0 451948 29788 ?        Sl   12:53   0:00                  \_ /usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=shopify-tiers cp gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst
houglum commented 6 years ago

After trying a large parallel download myself and monitoring it with top, plus reviewing your output above, I don't think memory usage is your problem. The 4th column (which should be %MEM) shows 0.0 for every process.
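One way to check this reading is to add up the RSS column (the 6th; `ps` reports it in KiB) across the gsutil processes. The sketch below assumes the standard Linux `ps aux` column layout, since the header row is missing from the paste above, and reuses three rows from it with whitespace collapsed. `parse_ps_line` and `total_rss_kib` are hypothetical helpers.

```python
# Column layout of `ps aux` / `ps afux` on Linux (procps). VSZ and RSS are in
# KiB. The header row is missing from the paste above, so this is assumed.
PS_COLUMNS = ["USER", "PID", "%CPU", "%MEM", "VSZ", "RSS",
              "TTY", "STAT", "START", "TIME", "COMMAND"]


def parse_ps_line(line):
    """Split one ps row into named fields; COMMAND keeps its internal spaces."""
    fields = line.split(None, len(PS_COLUMNS) - 1)
    return dict(zip(PS_COLUMNS, fields))


def total_rss_kib(lines, match="gsutil"):
    """Sum RSS (KiB) over rows whose COMMAND contains `match`.

    Forked children share copy-on-write pages, so this is an upper bound.
    """
    total = 0
    for line in lines:
        row = parse_ps_line(line)
        if match in row.get("COMMAND", ""):
            total += int(row["RSS"])
    return total


# Three rows from the session above (whitespace collapsed).
CMD = ("/usr/bin/python2 /usr/lib/google-cloud-sdk/platform/gsutil/gsutil "
       "-o GSUtil:default_project_id=shopify-tiers cp "
       "gs://examplebucket/examplefile-230GiB.tar.zst /data/mysql/tmp/restore.tar.zst")
sample = [
    "root 412 4.3 0.0 305116 34780 ? Sl 12:53 1:56 \\_ " + CMD,
    "root 417 9.2 0.0 12020364 31872 ? Sl 12:53 4:08 \\_ " + CMD,
    "root 527 201 0.0 452068 34824 ? Sl 12:53 90:41 \\_ " + CMD,
]
print(total_rss_kib(sample), "KiB")  # 101476 KiB
```

Summed over all 36 rows of the session, RSS comes to 1,120,036 KiB (about 1.07 GiB), and that is an overestimate because shared pages are counted once per process. Either way it is far below the 24 GiB limit. Note that the large 12020364 KiB figure for PID 417 is VSZ (virtual size), not resident memory.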

I have heard a couple of coworkers comment that Docker doesn't like it when they run gsutil -m cp ... commands, especially with the default settings for -m, which tend to spawn lots of extra processes and threads. You can limit the number of processes and threads by setting the parallel_process_count and parallel_thread_count attributes (note that the latter is the number of threads per process) in your ~/.boto file to something much smaller, so that you use fewer resources. Alternatively, you could ensure only one process is used by omitting the -m flag from your gsutil command.
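The same two settings can also be overridden for a single invocation with gsutil's top-level `-o Section:option=value` flag, which the command lines in the `ps` output above already use for `default_project_id`. A small Python helper that builds such a command line; `gsutil_cp_argv` is a hypothetical name, and the values 1 and 2 are illustrative, not recommendations.

```python
import shlex

# Hypothetical helper. The values 1 and 2 are illustrative only. The
# equivalent ~/.boto settings would go in the [GSUtil] section:
#   parallel_process_count = 1
#   parallel_thread_count = 2
def gsutil_cp_argv(src, dst, process_count=1, thread_count=2, parallel=False):
    """Build a `gsutil cp` argv that caps processes and threads via -o."""
    argv = [
        "gsutil",
        "-o", f"GSUtil:parallel_process_count={process_count}",
        "-o", f"GSUtil:parallel_thread_count={thread_count}",  # per process
    ]
    if parallel:
        argv.append("-m")  # only add -m when parallelism is actually wanted
    argv += ["cp", src, dst]
    return argv


argv = gsutil_cp_argv("gs://examplebucket/examplefile-230GiB.tar.zst",
                      "/data/mysql/tmp/restore.tar.zst")
print(" ".join(shlex.quote(a) for a in argv))
# The list could then be run with subprocess.run(argv, check=True).
```

Passing the overrides on the command line keeps the limits scoped to this one job instead of changing ~/.boto for every gsutil call in the container.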

berlincount commented 6 years ago

Well, to the Kubernetes container it looks like memory use, so the container gets killed for using too much memory :(
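One possible explanation for the gap between `ps` and the container's view (an assumption, not established in this thread) is that the container's memory cgroup also charges page cache, including dirty pages of the file being written, which never shows up in a process's RSS. The sketch below reads the cgroup's `memory.stat` and separates anonymous memory from file cache. It assumes the usual mount points and key names for cgroup v2 (`/sys/fs/cgroup/memory.stat`: `anon`, `file`, `file_dirty`, `file_writeback`) and cgroup v1 (`/sys/fs/cgroup/memory/memory.stat`: `rss`, `cache`, `dirty`, `writeback`); the function names are hypothetical.

```python
# cgroup v2 and v1 locations of memory.stat inside a container (assumed paths).
CGROUP_MEMORY_STAT_PATHS = (
    "/sys/fs/cgroup/memory.stat",         # cgroup v2
    "/sys/fs/cgroup/memory/memory.stat",  # cgroup v1
)


def parse_memory_stat(text):
    """Parse 'key value' lines from memory.stat into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def summarize(stats):
    """Separate anonymous (process) memory from page cache, in bytes."""
    if "anon" in stats:  # cgroup v2 naming
        return {"anon": stats["anon"], "file": stats.get("file", 0),
                "dirty": stats.get("file_dirty", 0),
                "writeback": stats.get("file_writeback", 0)}
    return {"anon": stats.get("rss", 0), "file": stats.get("cache", 0),  # v1
            "dirty": stats.get("dirty", 0),
            "writeback": stats.get("writeback", 0)}


def read_cgroup_memory(paths=CGROUP_MEMORY_STAT_PATHS):
    """Return the summary for the first readable memory.stat, else None."""
    for path in paths:
        try:
            with open(path) as f:
                return summarize(parse_memory_stat(f.read()))
        except OSError:
            continue
    return None


print(read_cgroup_memory())  # None when no cgroup memory.stat is readable
```

Run inside the container during a download, a large `file` or `dirty` value next to a small `anon` would point at the page cache rather than at gsutil's own heap.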

There's only one file being downloaded there as well...

berlincount commented 6 years ago

Lowering parallel_process_count and parallel_thread_count seemed to help a bit, at least when combined with https://github.com/Feh/nocache.
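nocache works by preloading a library (via `LD_PRELOAD`) that intercepts `open`/`close` and calls `posix_fadvise` with `POSIX_FADV_DONTNEED`, asking the kernel to drop the file's pages from the page cache. For a writer you control, the same effect can be approximated directly. A minimal sketch, assuming Linux and Python 3.3+; `write_without_caching`, `_drop_cached`, and the 64 MiB flush interval are hypothetical choices, not part of gsutil or nocache.

```python
import os
import tempfile

FLUSH_EVERY = 64 * 1024 * 1024  # hypothetical interval: 64 MiB


def _drop_cached(fd, offset, length):
    """Flush the file to disk, then ask the kernel to evict [offset, offset+length)."""
    os.fsync(fd)  # DONTNEED cannot evict dirty pages, so write them back first
    # length 0 would mean "to end of file" for posix_fadvise, so skip it.
    if length and hasattr(os, "posix_fadvise"):  # not available on all OSes
        os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)


def write_without_caching(path, chunks, flush_every=FLUSH_EVERY):
    """Write chunks to path, dropping written ranges from the page cache."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0       # bytes written so far
        dropped_upto = 0  # everything before this offset has been dropped
        for chunk in chunks:
            view = memoryview(chunk)
            while view:  # os.write may write fewer bytes than requested
                n = os.write(fd, view)
                view = view[n:]
                written += n
            if written - dropped_upto >= flush_every:
                _drop_cached(fd, dropped_upto, written - dropped_upto)
                dropped_upto = written
        _drop_cached(fd, dropped_upto, written - dropped_upto)
        return written
    finally:
        os.close(fd)


# Usage: a small demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "restore.part")
    demo_written = write_without_caching(
        demo_path, [b"a" * 1000, b"b" * 3000], flush_every=2048)
    with open(demo_path, "rb") as f:
        demo_data = f.read()
```

The periodic `fsync` also acts as crude back pressure: the writer can never run more than one flush interval ahead of the disk, at the cost of some throughput.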