GoogleCloudPlatform / gsutil

A command line tool for interacting with cloud storage services.
Apache License 2.0
864 stars 331 forks source link

`copy_helper_opts.read_args_from_stdin` code means that `cat retrieve-files | gsutil cp -I . -L log` != `cat retrieve-files | gsutil cp -L log -I .` #1785

Open jsoref opened 1 month ago

jsoref commented 1 month ago
  -I             Use ``stdin`` to specify a list of files or objects to copy. You can use
                 gsutil in a pipeline to upload or download objects as generated by a program.
                 For example:

                   cat filelist | gsutil -m cp -I gs://my-bucket

                 where the output of ``cat filelist`` is a one-per-line list of
                 files, cloud URLs, and wildcards of files and cloud URLs.
  -L <file>      Outputs a manifest log file with detailed information about
                 each item that was copied. This manifest contains the following
                 information for each item:

                 - Source path.
                 - Destination path.
                 - Source size.
                 - Bytes transferred.
                 - MD5 hash.
                 - Transfer start time and date in UTC and ISO 8601 format.
                 - Transfer completion time and date in UTC and ISO 8601 format.
                 - Upload id, if a resumable upload was performed.
                 - Final result of the attempted transfer, either success or failure.
                 - Failure details, if any.

                 If the log file already exists, gsutil uses the file as an
                 input to the copy process, and appends log items to
                 the existing file. Objects that are marked in the
                 existing log file as having been successfully copied or
                 skipped are ignored. Objects without entries are
                 copied and ones previously marked as unsuccessful are
                 retried. This option can be used in conjunction with the ``-c`` option to
                 build a script that copies a large number of objects reliably,
                 using a bash script like the following:

                   until gsutil cp -c -L cp.log -r ./dir gs://bucket; do
                     sleep 1
                   done

                 The -c option enables copying to continue after failures
                 occur, and the -L option allows gsutil to pick up where it
                 left off without duplicating work. The loop continues
                 running as long as gsutil exits with a non-zero status. A non-zero
                 status indicates there was at least one failure during the copy
                 operation.

                 NOTE: If you are synchronizing the contents of a
                 directory and a bucket, or the contents of two buckets, see
                 "gsutil help rsync".

Nothing here appears to say you must place -L ... before -I ..., but the condition below forces it:

https://github.com/GoogleCloudPlatform/gsutil/blob/a32d8f5596c256c0de3ac60810278bd68738751f/gslib/commands/cp.py#L1066-L1068

When run in one order, things work fine, in the other, one gets:

CommandException: Source URLs cannot be specified with -I option