GoogleCloudPlatform / gsutil

A command line tool for interacting with cloud storage services.
Apache License 2.0

rsync - release unlocked lock #568

Closed almirb closed 6 years ago

almirb commented 6 years ago

Hi! I'm trying to run gsutil to sync our local storage with a Google Cloud Storage bucket, but it always ends with a "release unlocked lock" exception. The process continues until it finishes the source listing operation, then it exits without copying any files.

$ gsutil -m rsync -C -r -n -x ".*\.bak$.*\.db$.*\.bip$.*~lock\.*$.acad\.lsp$" "/cygdrive/g/PMRS/Documentos/" "gs://my-bucket-xxxxxxxx/BACKUP STORAGE/Documentos/" |& tee log_rsync_planejamento.txt
Building synchronization state...
At destination listing 10000...
At source listing 10000...
At destination listing 20000...
At destination listing 30000...
At source listing 20000...
At destination listing 40000...
At source listing 30000...
At destination listing 50000...
At destination listing 60000...
At source listing 40000...
At destination listing 70000...
At destination listing 80000...
At source listing 50000...
Caught non-retryable exception while listing gs://my-bucket-xxxxxxxx/BACKUP STORAGE/Documentos/: release unlocked lock
At source listing 60000...
At source listing 70000...

How can I remove these locks? Thanks!

almirb commented 6 years ago

I noticed that adding -o "GSUtil:parallel_thread_count=1" to the command line avoids the problem, but the transfer is slow using only 1 thread.

I'm running gsutil in a Cygwin64 terminal on a Windows 2008 R2 machine. I need to use Cygwin because of Windows file/folder encoding problems with accented characters and other Brazilian special characters (ã, õ, ç, ...).

Is there another way to avoid the lock problem without reducing the number of threads to 1?

mfschwartz commented 6 years ago

Hi Almir. Can you try to collect some additional detail about where this problem happened? Could you try adding the following before the cls.logger.error statement at https://github.com/GoogleCloudPlatform/gsutil/blob/master/gslib/commands/rsync.py#L634 :

import traceback
traceback.print_exc()

This should cause gsutil to print a stack trace when the problem happens, which will hopefully provide a hint about where the lock problem originates.
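
For reference, here's a minimal, self-contained illustration of the pattern (placeholder code only, not gsutil's actual handler):

import traceback

try:
    {}['missing']  # placeholder failure standing in for the listing error
except Exception as e:
    traceback.print_exc()  # the added line: dumps the full stack trace to stderr
    print('Caught non-retryable exception: %s' % e)  # stands in for the cls.logger.error call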

Thanks,

Mike

almirb commented 6 years ago

Of course, @mfschwartz!

Here it is:

$ gsutil -m rsync -C -r -n "/cygdrive/g/PMRS/Documentos/SEC_PLANEJAMENTO" "gs://my-bucket-xxxxxxxx/BACKUP STORAGE/Documentos/SEC_PLANEJAMENTO/" |& tee log_rsync_planejamento.txt
Building synchronization state...
At destination listing 10000...
At source listing 10000...
At destination listing 20000...
At source listing 20000...
At destination listing 30000...
Traceback (most recent call last):
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/gslib/commands/rsync.py", line 588, in _ListUrlRootFunc
    out_file)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/gslib/commands/rsync.py", line 804, in _BatchSort
    current_chunk = sorted(islice(in_iter, buffer_size))
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/gslib/commands/rsync.py", line 655, in _FieldedListingIterator
    for blr in iterator:
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/gslib/wildcard_iterator.py", line 490, in IterObjects
    expand_top_level_buckets=True):
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/gslib/wildcard_iterator.py", line 215, in __iter__
    provider=self.wildcard_url.scheme, fields=listing_fields):
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 608, in ListObjects
    global_params=global_params)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/gslib/third_party/storage_apitools/storage_v1_client.py", line 1223, in List
    config, request, global_params=global_params)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/base_api.py", line 731, in _RunMethod
    return self.ProcessHttpResponse(method_config, http_response, request)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse
    self.__ProcessHttpResponse(method_config, http_response, request))
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/base_api.py", line 620, in __ProcessHttpResponse
    return self.__client.DeserializeMessage(response_type, content)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/base_api.py", line 446, in DeserializeMessage
    message = encoding.JsonToMessage(response_type, data)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/encoding_helper.py", line 123, in JsonToMessage
    return _ProtoJsonApiTools.Get().decode_message(message_type, message)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/encoding_helper.py", line 309, in decode_message
    message_type, result)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/protorpclite/protojson.py", line 213, in decode_message
    message = self.__decode_dictionary(message_type, dictionary)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/protorpclite/protojson.py", line 286, in __decode_dictionary
    for item in value]
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/encoding_helper.py", line 331, in decode_field
    field.message_type, json.dumps(value))
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/encoding_helper.py", line 309, in decode_message
    message_type, result)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/protorpclite/protojson.py", line 213, in decode_message
    message = self.__decode_dictionary(message_type, dictionary)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/protorpclite/protojson.py", line 293, in __decode_dictionary
    setattr(message, field.name, self.decode_field(field, value))
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/encoding_helper.py", line 325, in decode_field
    result = decoder(field, value)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/extra_types.py", line 261, in _DecodeDateTimeField
    message_types.DateTimeField(1), value)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/protorpclite/protojson.py", line 329, in decode_field
    return util.decode_datetime(value)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/protorpclite/util.py", line 267, in decode_datetime
    decoded_datetime = datetime.datetime.strptime(time_string, format_string)
  File "/usr/lib/python2.7/_strptime.py", line 328, in _strptime
    _regex_cache[format] = format_regex
error: release unlocked lock
Caught non-retryable exception while listing gs://my-bucket-xxxxxxxx/BACKUP STORAGE/Documentos/SEC_PLANEJAMENTO/: release unlocked lock

I hope that helps.

mfschwartz commented 6 years ago

Hi Almir,

Thanks for the stacktrace. The code that created that lock is part of the low-level C implementation of the standard Python thread module. I found this documentation that explains what's going wrong (https://docs.python.org/2/library/threading.html):

"The release() method should only be called in the locked state; it changes the state to unlocked and returns immediately. If an attempt is made to release an unlocked lock, a ThreadError will be raised."

So it appears that some part of the code is attempting to release a lock that was previously released. I'm not sure how that could happen, because the lock is acquired and released via a Python 'with' statement.
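
For what it's worth, the quoted behavior is easy to demonstrate with a tiny snippet that has nothing to do with gsutil:

import threading

lock = threading.Lock()
with lock:
    pass           # the 'with' block acquires and then releases the lock
lock.release()     # second release -> "release unlocked lock"
                   # (thread.error on Python 2, RuntimeError on Python 3)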

Is there any way you could provide me with a directory of files that reproduces this problem? I'm hoping you have some non-proprietary / public data you'd be able to share, so I can trace through the failure as it happens.

One other thing: Can you please send me the output of running:

gsutil ver -l

That will give me details about your OS, Python version, etc. I'm wondering if this might be some platform-specific problem (Windows, Alpine Linux, etc.).

If you'd prefer to interact about any of this privately (so it's not posted on the public gsutil issues) you can email gs-team@google.com, and mention my name in the email.

Thanks,

Mike Schwartz

almirb commented 6 years ago

Hello!

Here is the output of the gsutil ver -l command:

$ gsutil ver -l
gsutil version: 4.34
checksum: 4c5aca03f3a10471c0d6369a98804a5f (!= c1f79c206b3e0b79cbf957238d49851f)
boto version: 2.48.0
python version: 2.7.14 (default, Oct 31 2017, 21:12:13) [GCC 6.4.0]
OS: CYGWIN_NT-6.1 2.11.1(0.329/5/3)
multiprocessing available: True
using cloud sdk: True
pass cloud sdk credentials to gsutil: True
config path(s): /home/administrador/.boto, /home/administrador/.config/gcloud/legacy_credentials/some_email@provider.com/.boto
gsutil path: /home/Administrador/google-cloud-sdk/bin/gsutil
compiled crcmod: True
installed via package manager: False
editable install: False

I saw a PR fixing the "only unlock if it's locked" behavior in another Python package...
I'll search for it!

Thanks!

almirb commented 6 years ago

Found it: https://github.com/mostafaeweda/pythonVSCode/commit/48b4d843967da70cc60db7a385e961b028500b09

mfschwartz commented 6 years ago

I don't think that PR applies in this case, because the problem happens in the low-level thread implementation.

I found this one: https://bugs.python.org/issue34486. That one was reported under Python 3.6 and involved concurrently receiving a signal, so it probably isn't the same issue, but it at least sounds related.

I wonder if there's a problem with the thread implementation that ships with Cygwin's Python. Could you try running the same rsync command on Windows without using Cygwin? (I personally also prefer using Cygwin rather than the Windows shell, but trying this would help figure out whether it's an OS-specific problem.)

And one more thing worth trying: Can you try the command on Cygwin without the -m option? I'm wondering if this is somehow related to multiprocessing.

almirb commented 6 years ago

Here is the output outside Cygwin:

C:\Users\administrador.PMRS>gsutil -m rsync -C -n -r  "G:\PMRS\Documentos\" "gs://my-bucket-xxxxxxxx/BACKUP STORAGE/Documentos/"

Updates are available for some Cloud SDK components.  To install them,
please run:
  $ gcloud components update

Building synchronization state...
Caught non-retryable exception while listing file://G:\PMRS\Documentos: CommandException: Invalid Unicode path encountered
('G:\\PMRS\\Documentos\\ACOMPANHAMENTO
GR\xc1FICOS\\Arrecada\xe7\xe3o ISS RS 2.xls'). gsutil cannot proceed
with such files present. Please remove or rename this file and try
again. NOTE: the path printed above replaces the problematic
characters with a hex-encoded printable representation. For more
details (including how to convert to a gsutil-compatible encoding) see
`gsutil help encoding`.
At destination listing 10000...
At destination listing 20000...
At destination listing 30000...
At destination listing 40000...

I first tried running gsutil from cmd.exe, but I couldn't escape the Windows encoding problems. :(

Now the output of testing inside Cygwin, but without the -m option:

$ gsutil rsync -C -r -x '.*\.bak$|.*\.db$|.*\.bip$|^\.~lock.*$|.*\.lsp$|^~.*$|.DS_Store$' "/cygdrive/g/PMRS/Documentos/SEC_PLANEJAMENTO" "gs://my-bucket-xxxxxxxx/BACKUP STORAGE/Documentos/SEC_PLANEJAMENTO/" |& tee log_rsync_planejamento3.txt
Building synchronization state...
At destination listing 10000...
At source listing 10000...
Traceback (most recent call last):
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/gslib/commands/rsync.py", line 588, in _ListUrlRootFunc
    out_file)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/gslib/commands/rsync.py", line 804, in _BatchSort
    current_chunk = sorted(islice(in_iter, buffer_size))
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/gslib/commands/rsync.py", line 655, in _FieldedListingIterator
    for blr in iterator:
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/gslib/wildcard_iterator.py", line 490, in IterObjects
    expand_top_level_buckets=True):
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/gslib/wildcard_iterator.py", line 215, in __iter__
    provider=self.wildcard_url.scheme, fields=listing_fields):
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/gslib/gcs_json_api.py", line 608, in ListObjects
    global_params=global_params)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/gslib/third_party/storage_apitools/storage_v1_client.py", line 1223, in List
    config, request, global_params=global_params)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/base_api.py", line 731, in _RunMethod
    return self.ProcessHttpResponse(method_config, http_response, request)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse
    self.__ProcessHttpResponse(method_config, http_response, request))
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/base_api.py", line 620, in __ProcessHttpResponse
    return self.__client.DeserializeMessage(response_type, content)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/base_api.py", line 446, in DeserializeMessage
    message = encoding.JsonToMessage(response_type, data)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/encoding_helper.py", line 123, in JsonToMessage
    return _ProtoJsonApiTools.Get().decode_message(message_type, message)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/encoding_helper.py", line 309, in decode_message
    message_type, result)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/protorpclite/protojson.py", line 213, in decode_message
    message = self.__decode_dictionary(message_type, dictionary)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/protorpclite/protojson.py", line 286, in __decode_dictionary
    for item in value]
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/encoding_helper.py", line 331, in decode_field
    field.message_type, json.dumps(value))
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/encoding_helper.py", line 309, in decode_message
    message_type, result)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/protorpclite/protojson.py", line 213, in decode_message
    message = self.__decode_dictionary(message_type, dictionary)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/protorpclite/protojson.py", line 293, in __decode_dictionary
    setattr(message, field.name, self.decode_field(field, value))
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/encoding_helper.py", line 325, in decode_field
    result = decoder(field, value)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/py/extra_types.py", line 261, in _DecodeDateTimeField
    message_types.DateTimeField(1), value)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/protorpclite/protojson.py", line 329, in decode_field
    return util.decode_datetime(value)
  File "/home/Administrador/google-cloud-sdk/platform/gsutil/third_party/apitools/apitools/base/protorpclite/util.py", line 267, in decode_datetime
    decoded_datetime = datetime.datetime.strptime(time_string, format_string)
  File "/usr/lib/python2.7/_strptime.py", line 328, in _strptime
    _regex_cache[format] = format_regex
error: release unlocked lock
Caught non-retryable exception while listing gs://my-bucket-xxxxxxxx/BACKUP STORAGE/Documentos/SEC_PLANEJAMENTO/: release unlocked lock

mfschwartz commented 6 years ago

Thanks for the continued digging. So, we know this problem is unrelated to multiprocessing support.

Your comment about encoding problems is interesting. It's possible that the Windows implementation of some library is unable to handle the file encoding while the corresponding Cygwin library can handle it. However, another possibility is that you have files with broken encodings, and if it weren't for those files being present, your gsutil rsync operation would work on Cygwin. Would it be possible for you to make a copy of your source tree, remove all the files where Windows is complaining about the encoding, and then try running the rsync under Cygwin over those files? If there are a huge number of files, another possibility would be to copy just a subset of your directory to a temp directory, making sure to include at least 1 file with an encoding Windows didn't like; verify that running gsutil rsync over Cygwin in that directory exhibits the failure; and then remove the file with the encoding problem and try again.
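
If it helps to find those files, here's a rough, untested sketch (Python 2 syntax, to match the environment above; the root path is a placeholder, and it assumes the problem files are exactly the ones whose names aren't valid UTF-8, which is what the Windows-side error suggests):

import os

root = '/cygdrive/g/PMRS/Documentos'  # placeholder: point it at the copied tree

for dirpath, dirnames, filenames in os.walk(root):
    for name in dirnames + filenames:
        try:
            name.decode('utf-8')  # on Python 2 these names come back as byte strings
        except UnicodeDecodeError:
            print(os.path.join(dirpath, name))  # this name would trip the encoding check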

Thanks.

almirb commented 6 years ago

Results of tests using the -n switch (dry run) and another folder (70k files):

1 - Without any special characters (~ ^ ç), using the -m switch:

2 - With a big subfolder renamed with special characters (çã), using the -m switch:

I guess the Cygwin error is not related to encoding...

P.S.: I needed to edit my post because Cygwin worked the first time I tried. When I retested, I got the release unlocked lock error (with and without special characters in the folder names).

mfschwartz commented 6 years ago

Hmm, that makes me think of something else: I wonder if there's some filesystem state causing the problem -- the first time you try it, the rsync works but leaves one or more files in a bad state (e.g., a file is locked), and then the second time you run it, the file in that bad state triggers the thread lock problem.

Can you add debugging output so gsutil tells you the name of the file it's processing at the time the failure occurs? If you knew that, you could try another experiment: after the failure happens, copy that file to a temp file in the same directory, then mv it back over the original file. If you then run the gsutil rsync command again and the failure doesn't happen, that would suggest the file got left in a broken state, and we could focus on figuring out what happened and how.
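
For example (illustrative only; this paraphrases the listing loop that shows up in your traceback in _FieldedListingIterator in gslib/commands/rsync.py, it is not the exact source):

# Sketch of the idea, with a placeholder iterator so it runs standalone.
iterator = iter(['file://a.txt', 'file://b.txt'])  # stands in for the real listing iterator

for blr in iterator:
    print('Processing %r' % blr)  # added debug line: the last name printed before the failure is the suspect
    # ... the existing loop body would follow here unchanged ...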

almirb commented 6 years ago

@mfschwartz ,

I ran the tests and sent the output to your email, as you suggested (the files contain sensitive info).

Thanks!

almirb commented 6 years ago

I'm thinking about trying to run gsutil inside a Docker container...

mfschwartz commented 6 years ago

Hi Almir,

I tried to recreate the case you experienced by copying the object names from your bucket listing to my own bucket, but I haven't yet been able to reproduce the problem. Instead, while trying to create those objects (using gsutil cp), I get errors with the object names that contain wildcard characters (like '[').

If you copy all the objects from this bucket to another bucket, excluding the ones with wildcard characters, and then run the gsutil rsync command, does the "release unlocked lock" problem still happen?

It's a known limitation that gsutil doesn't work well with objects/files whose names contain wildcard characters.
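
If it would help to find those, a quick throwaway sketch like the following could flag them from a saved listing (e.g., the output of gsutil ls -r redirected to a file; the file name is a placeholder):

WILDCARD_CHARS = set('*?[]')  # characters gsutil treats as wildcards

with open('bucket_listing.txt') as listing:  # placeholder: a saved object listing
    for line in listing:
        if WILDCARD_CHARS & set(line):
            print(line.rstrip())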

Also, I'm not sure what running gsutil inside a Docker container would have to do with the current problem. Can you say more about why you mention that?

Thanks,

Mike

almirb commented 6 years ago

I ended up creating a virtual machine with Hyper-V running Ubuntu Server 18.04. I installed the Google Cloud SDK and did the rest of the configuration, and I also mounted the machine's backup folder locally with CIFS. Now I'm sending the same files without any problem, with encoded characters and at full speed (multithreading). =D

I think most people use gsutil on Linux... and the "release unlocked lock" error could be Cygwin-specific.

Thank you, @mfschwartz, for your effort in helping me with this issue, but for now I'll stick with the Ubuntu VM.

mfschwartz commented 6 years ago

Ah, I see, yes - gsutil works better on Linux in general. Glad you found a workaround.