GoogleCloudPlatform / gsutil

A command line tool for interacting with cloud storage services.
Apache License 2.0

IOERROR: operation not permitted on rsync #495

Open delgadom opened 6 years ago

delgadom commented 6 years ago

I'm trying to use rsync to upload from a read-only filesystem on an external disk to a Google Cloud bucket. The process worked for a handful of folders, but after successfully uploading thousands of files it has begun consistently failing for one of my directories (total contents: 40 GB). I tried switching to another similarly sized directory, and it worked for a while but is now failing as well.

I've tried uninstalling the google cloud SDK but I get the same error. I've tried poking around in the files mentioned by the stack trace but haven't been able to find anything. This has been going on for a couple days, so it's not a rate limit issue.

I've upgraded my account to a full paid plan and have a card attached, so I don't think it's a payments issue. Is there some maximum bucket contents size setting somewhere that I've missed?

At the very least, a more descriptive error message would be really helpful. Thanks!

$ gsutil rsync -d -r /Volumes/ExtHD/my_directory/ gs://mike_personal_backups/my_directory
Building synchronization state...
Starting synchronization
Copying from named pipe...
Traceback (most recent call last):
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gsutil", line 22, in <module>
    gsutil.RunMain()
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gsutil.py", line 114, in RunMain
    sys.exit(gslib.__main__.main())
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 381, in main
    perf_trace_token=perf_trace_token, user_project=user_project)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 567, in _RunNamedCommandAndHandleExceptions
    user_project=user_project)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 319, in RunNamedCommand
    return_code = command_inst.RunCommand()
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/commands/rsync.py", line 1425, in RunCommand
    fail_on_error=True, seek_ahead_iterator=seek_ahead_iterator)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1372, in Apply
    arg_checker, should_return_results, fail_on_error)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1443, in _SequentialApply
    worker_thread.PerformTask(task, self)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/command.py", line 2109, in PerformTask
    results = task.func(cls, task.args, thread_state=self.thread_gsutil_api)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/commands/rsync.py", line 1215, in _RsyncFunc
    preserve_posix=cls.preserve_posix_attrs)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/copy_helper.py", line 3306, in PerformCopy
    allow_splitting=allow_splitting)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/copy_helper.py", line 1803, in _UploadFileToObject
    dst_obj_metadata, preconditions, gsutil_api)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/copy_helper.py", line 1496, in _UploadFileToObjectNonResumable
    fields=UPLOAD_RETURN_FIELDS)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/cloud_api_delegator.py", line 293, in UploadObjectStreaming
    encryption_tuple=encryption_tuple, fields=fields)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1309, in UploadObjectStreaming
    apitools_strategy=apitools_transfer.RESUMABLE_UPLOAD, total_size=None)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1158, in _UploadObject
    additional_headers, progress_callback)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1229, in _PerformResumableUpload
    additional_headers=addl_headers)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 930, in StreamInChunks
    additional_headers=additional_headers)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 883, in __StreamMedia
    additional_headers=additional_headers)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 983, in __SendChunk
    self.stream, start, self.chunksize)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/buffered_stream.py", line 34, in __init__
    self.__buffered_data = self.__stream.read(size)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/hashing_helper.py", line 412, in read
    data = self._orig_fp.read(size)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/resumable_streaming_upload.py", line 128, in read
    new_data = self._orig_fp.read(bytes_remaining)
IOError: [Errno 1] Operation not permitted

Now attempting the same operation on a different folder. It worked for thousands of files, but is now failing:

$ gsutil rsync -d -r /Volumes/ExtHD/my_other_directory/ gs://mike_personal_backups/my_other_directory
Building synchronization state...
At source listing 10000...
At source listing 20000...
At source listing 30000...
At source listing 40000...
At source listing 50000...
At source listing 60000...
Starting synchronization
Copying from named pipe...
Traceback (most recent call last):
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gsutil", line 22, in <module>
    gsutil.RunMain()
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gsutil.py", line 114, in RunMain
    sys.exit(gslib.__main__.main())
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 381, in main
    perf_trace_token=perf_trace_token, user_project=user_project)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 567, in _RunNamedCommandAndHandleExceptions
    user_project=user_project)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 319, in RunNamedCommand
    return_code = command_inst.RunCommand()
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/commands/rsync.py", line 1425, in RunCommand
    fail_on_error=True, seek_ahead_iterator=seek_ahead_iterator)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1372, in Apply
    arg_checker, should_return_results, fail_on_error)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1443, in _SequentialApply
    worker_thread.PerformTask(task, self)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/command.py", line 2109, in PerformTask
    results = task.func(cls, task.args, thread_state=self.thread_gsutil_api)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/commands/rsync.py", line 1215, in _RsyncFunc
    preserve_posix=cls.preserve_posix_attrs)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/copy_helper.py", line 3306, in PerformCopy
    allow_splitting=allow_splitting)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/copy_helper.py", line 1803, in _UploadFileToObject
    dst_obj_metadata, preconditions, gsutil_api)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/copy_helper.py", line 1496, in _UploadFileToObjectNonResumable
    fields=UPLOAD_RETURN_FIELDS)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/cloud_api_delegator.py", line 293, in UploadObjectStreaming
    encryption_tuple=encryption_tuple, fields=fields)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1309, in UploadObjectStreaming
    apitools_strategy=apitools_transfer.RESUMABLE_UPLOAD, total_size=None)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1158, in _UploadObject
    additional_headers, progress_callback)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 1229, in _PerformResumableUpload
    additional_headers=addl_headers)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 930, in StreamInChunks
    additional_headers=additional_headers)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 883, in __StreamMedia
    additional_headers=additional_headers)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/transfer.py", line 983, in __SendChunk
    self.stream, start, self.chunksize)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/buffered_stream.py", line 34, in __init__
    self.__buffered_data = self.__stream.read(size)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/hashing_helper.py", line 412, in read
    data = self._orig_fp.read(size)
  File "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/resumable_streaming_upload.py", line 128, in read
    new_data = self._orig_fp.read(bytes_remaining)
IOError: [Errno 1] Operation not permitted

houglum commented 6 years ago

I assume this is on a machine running Mac OS X? The only times I've seen this error are cases where the OS, via System Integrity Protection (SIP), is trying to prevent users from messing with the operating system's important files (as mentioned in this StackOverflow thread).

Could you try throwing in a debugging statement to print out the name of the problematic file? E.g. at the top of the _UploadFileToObject method in "/Users/delgadom/google-cloud-sdk/platform/gsutil/gslib/copy_helper.py" (line 1803 in your stack trace falls inside that method), you could add the statement print(src_url.object_name). From there, you should be able to investigate the file and see if it's secured via special permissions.
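If editing gsutil's source is not convenient, a similar check can be run outside gsutil. The following is only a minimal sketch, not part of gsutil: the function name `find_unreadable_files` and its parameters are hypothetical. It walks a directory, flags anything that is not a regular file (so it never blocks trying to open a named pipe), and tries to read one byte from each regular file.

```python
import os
import stat


def find_unreadable_files(root, probe_bytes=1):
    """Return (path, reason) pairs for entries under root that cannot be read.

    Hypothetical helper for illustration; it does not call into gsutil.
    """
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.stat(path).st_mode
                if not stat.S_ISREG(mode):
                    # Pipes, sockets and devices are reported, never opened.
                    failures.append((path, "not a regular file"))
                    continue
                with open(path, "rb") as fp:
                    fp.read(probe_bytes)
            except (IOError, OSError) as err:
                failures.append((path, str(err)))
    return failures


if __name__ == "__main__":
    import sys

    for bad_path, reason in find_unreadable_files(sys.argv[1]):
        # repr() makes control characters in file names visible.
        print("%r: %s" % (bad_path, reason))
```

Printing the path with `repr()` is deliberate: a file name that ends in a control character would otherwise look like an ordinary name in the terminal.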

delgadom commented 6 years ago

Thanks for the response @houglum! I am indeed on OS X (High Sierra), but the file shouldn't be a protected system file. After inserting that debug statement, it turned out to be:

/Volumes/Seagate Expansion Drive/Michael Delgado/Mike/Archive/Stanford/2009-2010/Earth Systems Research/Earth Systems Research/Icon_

Not sure what that file is, but it doesn't seem like it could be a core system file.

Anyway, I wrapped that block in a try: ... except IOError: return, and now it's humming along just fine. Probably not a great idea, but it's working!
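For anyone trying the same workaround, here is a standalone sketch of the skip-on-IOError idea. It is not the actual patch to copy_helper.py, and `read_or_skip` is a hypothetical name. Unlike a bare `return`, it logs each skipped path so it is clear afterwards which files were never uploaded.

```python
import logging


def read_or_skip(path, chunk_size=8192):
    """Return the file's bytes, or None if the OS refuses to open or read it.

    Hypothetical helper for illustration; gsutil does not provide it.
    """
    try:
        with open(path, "rb") as fp:
            chunks = []
            while True:
                chunk = fp.read(chunk_size)
                if not chunk:
                    break
                chunks.append(chunk)
            return b"".join(chunks)
    except IOError as err:
        # Record the skip instead of silently dropping the file.
        logging.warning("Skipping %r: %s", path, err)
        return None
```

A caller would treat a `None` result as "skip this file and continue", which mirrors the `except IOError: return` behaviour described above while leaving a record of what was skipped.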

System specs:

OSX High Sierra version 10.13.2 (17C88)
MacBook Pro (13-inch, 2017, Two Thunderbolt 3 ports)
Processor 2.3 GHz Intel Core i5
Memory 8 GB 2133 MHz LPDDR3

Python 2.7.14 |Anaconda, Inc.| (default, Oct  5 2017, 02:28:52)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin

gsutil version: 4.28

delgadom commented 6 years ago

Huh - the other folder was failing on a similar file:

/Volumes/Seagate Expansion Drive/Michael Delgado/Pictures/alaska/Mike's Phone/Icon_

srcc-chekh commented 6 years ago

Yes, we have to add things like "-x '(.gnupg|.cache|.*Icon\r$)'" to our gsutil rsync commands...
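The pattern above suggests that the `Icon_` files reported earlier are really named `Icon` followed by a carriage return. As a rough check of what such an exclude pattern covers, the sketch below applies it with Python's `re` module. It assumes the pattern is matched from the start of each path relative to the source directory; the helper `is_excluded` and the sample paths are made up for illustration.

```python
import re

# Same pattern as in the -x option above. In a regex, \r means carriage return.
EXCLUDE = re.compile(r"(.gnupg|.cache|.*Icon\r$)")


def is_excluded(relative_path):
    """Return True if the pattern matches at the start of relative_path."""
    return EXCLUDE.match(relative_path) is not None


if __name__ == "__main__":
    samples = [
        "Pictures/alaska/Icon\r",    # "Icon" plus a trailing carriage return
        "Pictures/alaska/Icon.png",  # ordinary file that should be kept
        ".gnupg/pubring.gpg",
    ]
    for sample in samples:
        print("%r excluded: %s" % (sample, is_excluded(sample)))
```

Note that the unescaped dots in `.gnupg` and `.cache` match any character, so a stricter variant would write them as `\.gnupg` and `\.cache`.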