GoogleCloudPlatform / gsutil

A command line tool for interacting with cloud storage services.
Apache License 2.0

gsutil rsync fails with `Caught non-retryable exception - aborting rsync` #1356

Open nerusnayleinad opened 2 years ago

nerusnayleinad commented 2 years ago

We have files in GCS that we need to keep synchronized with S3 on AWS, but when I run gsutil rsync, it fails with Caught non-retryable exception - aborting rsync.

We need to run rsync every day so that any file that has changed in GCS gets synchronized to AWS. The first time I run the command it copies the files correctly, but when I run it again it fails.
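The daily job can be sketched as a thin wrapper that shells out to gsutil and propagates its exit code (so cron or a scheduler sees failures); bucket names are placeholders, and the wrapper itself is just an illustration, not part of gsutil:

```python
import subprocess

def run_sync(cmd):
    """Run a sync command and return its exit code (non-zero on failure)."""
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    # Placeholder bucket names; -m parallelizes, -r recurses.
    rc = run_sync(["gsutil", "-m", "rsync", "-r",
                   "gs://SOURCE-BUCKET", "s3://DESTINATION-BUCKET"])
    raise SystemExit(rc)
```
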

This is the error I get when I run the command bucket to bucket. I then debugged it (with -D), and it looks like it fails on a specific file that sits directly in the bucket (not in a folder). The debug output is the following:

 crc32c: 'YW2ZwA=='
 generation: 1605913220800799
 md5Hash: 'uDEOva2LJsCKzE7U6vv0DQ=='
 name: 'file.jar'
 size: 10903598
 timeCreated: datetime.datetime(2020, 11, 20, 23, 0, 20, 800000, tzinfo=<apitools.base.protorpclite.util.TimeZoneOffset object at 0x7f7daf39df98>)>]
 prefixes: []>
DEBUG: Exception stack trace:
    Traceback (most recent call last):
      File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 639, in _RunNamedCommandAndHandleExceptions
        user_project=user_project)
      File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 410, in RunNamedCommand
        return_code = command_inst.RunCommand()
      File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/rsync.py", line 1674, in RunCommand
        diff_iterator = _DiffIterator(self, src_url, dst_url)
      File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/rsync.py", line 1007, in __init__
        raise CommandException('Caught non-retryable exception - aborting rsync')
    gslib.exception.CommandException: CommandException: Caught non-retryable exception - aborting rsync

CommandException: Caught non-retryable exception - aborting rsync

Next I ran the command against that specific file, and this is the error I get:

 items: [<Object
 acl: []
 name: 'file.jar'>]
 prefixes: []>
DEBUG: Exception stack trace:
    Traceback (most recent call last):
      File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 639, in _RunNamedCommandAndHandleExceptions
        user_project=user_project)
      File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 410, in RunNamedCommand
        return_code = command_inst.RunCommand()
      File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/rsync.py", line 1642, in RunCommand
        src_url = self._InsistContainer(self.args[0], False)
      File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/commands/rsync.py", line 1625, in _InsistContainer
        'slash to specify the directory.' % url_str)
    gslib.exception.CommandException: CommandException: arg (gs://BUCKET-NAME/FILE.jar) does not name a directory, bucket, or bucket subdir.
    If there is an object with the same path, please add a trailing
    slash to specify the directory.

CommandException: arg (gs://BUCKET-NAME/FILE.jar) does not name a directory, bucket, or bucket subdir.
If there is an object with the same path, please add a trailing
slash to specify the directory.

The file name is SOMETHING-SOMETHING-2.0.0-SNAPSHOT.jar, so there are no special characters.

gsutil version: 4.65, boto version: 2.49.0, python: 3.7.3

I am not able to get any further.

nerusnayleinad commented 2 years ago

Can anyone look into this?

UPDATE

I have upgraded gsutil to 5.3, and realized it fails with absolutely any file I try to rsync a second time.

If I copy the file (gsutil cp) it works, but we can't copy all the files every time. We need to move only the new files.

dilipped commented 2 years ago

Are you running the command on the file? rsync requires that both source and destination are a directory, bucket, or bucket subdir, just as the error says.
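The distinction can be illustrated with a small hypothetical helper (this is a rough sketch of the URL shapes involved, not gsutil's actual logic): a bare bucket or a path ending in `/` can be an rsync endpoint, while an object path cannot.

```python
def looks_like_container(url):
    """Rough illustration (not gsutil's actual logic): a bare bucket or a
    path ending in '/' can be an rsync endpoint; an object path cannot."""
    _, _, rest = url.partition("://")
    return "/" not in rest or rest.endswith("/")

# A single object like gs://my-bucket/file.jar fails this check,
# which matches the "does not name a directory, bucket, or bucket
# subdir" error above.
```
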

nerusnayleinad commented 2 years ago

Oh... I did run it on a specific file, and it fails all the time, but the original idea is to run it on a bucket, which still fails. Something like this:

 suren_danielyan@gcp-aws-migration-west3:~$ gsutil -m rsync -r gs://SOURCE-BUCKET s3://DESTINATION-BUCKET
 Building synchronization state...
 At source listing 10000...
 At source listing 20000...
 At source listing 30000...
 Caught non-retryable exception while listing s3://test-instore/: BadRequestException: 400 None
 At source listing 40000...
 [repeats through "At source listing 390000..."]
 CommandException: Caught non-retryable exception - aborting rsync
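The actual failure line is easy to miss in the progress chatter. A small sketch for pulling only the exception lines out of a saved log (the marker strings are taken from the output shown in this thread):

```python
def extract_errors(log_text):
    """Return lines that report exceptions, skipping progress chatter."""
    markers = ("Caught non-retryable exception", "CommandException")
    return [line for line in log_text.splitlines()
            if any(m in line for m in markers)]

sample = """Building synchronization state...
At source listing 10000...
Caught non-retryable exception while listing s3://test-instore/: BadRequestException: 400 None
At source listing 20000...
CommandException: Caught non-retryable exception - aborting rsync"""
print("\n".join(extract_errors(sample)))
```
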

vojkny commented 2 years ago

🙏 this has been pain :/

chetanamacherla1 commented 2 years ago

Any update on this?

antouanbg commented 2 years ago

Hi guys, any solution? I have the same problem today! Any help?

nerusnayleinad commented 2 years ago

My specific issue was that our client had object names that conflicted with gsutil rsync: https://cloud.google.com/storage/docs/naming-objects

In that scenario, if you have too many objects, there is no practical fix. Our client had over a million objects, and it was impossible to rename the conflicting ones.
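Following the naming guidelines linked above, here is a sketch for spotting object names likely to break a cross-cloud copy. The "risky" character set here is an assumption for illustration: GCS allows most Unicode in names, but the guidelines strongly advise against control characters such as carriage returns and line feeds.

```python
def has_risky_chars(name):
    """Flag object names containing control characters (U+0000-U+001F,
    U+007F), which the GCS naming guidelines advise against and which
    commonly break tooling. The exact set is an illustrative assumption."""
    return any(ord(c) < 0x20 or ord(c) == 0x7F for c in name)

# Feed this the names from a recursive bucket listing to find offenders.
```
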

I ended up using rclone, which by the way is an amazing tool for this task. It is written in Go (unlike gsutil, which is Python), and memory management was flawless.

JShollaj commented 1 year ago

Personally, a careless mistake led me to that error. First I ran:

gsutil -m rsync -r -d -n gs://old_bucket s3://new_bucket

Then I realized I needed to configure the AWS CLI alongside the gcloud CLI (both required setting up credentials and the security ID). Next, I re-ran the command above and confirmed the files would get copied.
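For reference, gsutil reads AWS credentials from the boto configuration file (typically ~/.boto); a minimal fragment, with placeholder values:

```
[Credentials]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```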

As a final step I ran:

gsutil -m rsync -r -d gs://old_bucket s3://new_bucket