Open qrtt1 opened 9 years ago
This should be working. Could you post the output of gsutil version -l and also re-run that gsutil command with gsutil -d to show the stack trace?
gsutil version: 4.7
checksum: 72839382f796ff3865e757959eed802f (OK)
boto version: 2.30.0
python version: 2.7.4 (default, Apr 18 2013, 00:07:37) [GCC 4.6.2 20111027 (Red Hat 4.6.2-2)]
OS: Linux 3.14.19-17.43.amzn1.x86_64
multiprocessing available: True
using cloud sdk: True
config path: /home/gcp/.config/gcloud/legacy_credentials/chingyichan.tw@gmail.com/.boto
gsutil path: /home/gcp/google-cloud-sdk/platform/gsutil/gsutil
compiled crcmod: False
installed via package manager: False
editable install: False
debug message:
***************************** WARNING *****************************
*** You are running gsutil with debug output enabled.
*** Be aware that debug output includes authentication credentials.
*** Make sure to remove the value of the Authorization header for
*** each HTTP request printed to the console prior to posting to
*** a public medium such as a forum post or Stack Overflow.
***************************** WARNING *****************************
gsutil version: 4.7
checksum: 72839382f796ff3865e757959eed802f (OK)
boto version: 2.30.0
python version: 2.7.4 (default, Apr 18 2013, 00:07:37) [GCC 4.6.2 20111027 (Red Hat 4.6.2-2)]
OS: Linux 3.14.19-17.43.amzn1.x86_64
multiprocessing available: True
using cloud sdk: True
config path: /home/gcp/.config/gcloud/legacy_credentials/chingyichan.tw@gmail.com/.boto
gsutil path: /home/gcp/google-cloud-sdk/platform/gsutil/gsutil
compiled crcmod: False
installed via package manager: False
editable install: False
Command being run: /home/gcp/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=iddsdkvip -d mv gs://foo-videos/最孤單的人 - 廖文強與壞神經樂團 (Official Music Video).mp4 gs://foo-videos/mv.mp4
config_file_list: ['/home/gcp/.config/gcloud/legacy_credentials/chingyichan.tw@gmail.com/.boto']
config: [('debug', '0'), ('working_dir', '/mnt/pyami'), ('https_validate_certificates', 'true'), ('debug', '0'), ('working_dir', '/mnt/pyami'), ('default_project_id', 'iddsdkvip')]
DEBUG: Exception stack trace:
Traceback (most recent call last):
File "/home/gcp/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 469, in _RunNamedCommandAndHandleExceptions
debug_level, parallel_operations)
File "/home/gcp/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 263, in RunNamedCommand
return_code = command_inst.RunCommand()
File "/home/gcp/google-cloud-sdk/platform/gsutil/gslib/commands/mv.py", line 149, in RunCommand
self.debug, self.parallel_operations)
File "/home/gcp/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 256, in RunNamedCommand
args = HandleArgCoding(args)
File "/home/gcp/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 77, in HandleArgCoding
decoded = arg.decode(UTF8)
File "/home/gcp/pyenv/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 16-20: ordinal not in range(128)
Hmm, so I can't reproduce your error. I copied a local file to your exact object name above, then copied it back to my local disk using your same command.
I wonder if the command line you're pasting above is being converted from a different character encoding to valid UTF-8 when you paste it into your browser. Could you provide the output of the locale command?
Hello, I tried the copy command and it works correctly. However, the move command fails:
(pyenv)[gcp@deploy ~]$ gsutil mv gs://muzee-vips/最孤單的人\ -\ 廖文強與壞神經樂團\ \(Official\ Music\ Video\).2.mp4 gs://muzee-vips/最孤單的人\ -\ 廖文強與壞神經樂團\ \(Official\ Music\ Video\).23.mp4
Failure: 'ascii' codec can't encode characters in position 16-20: ordinal not in range(128).
(pyenv)[gcp@deploy ~]$ gsutil cp gs://muzee-vips/最孤單的人\ -\ 廖文強與壞神經樂團\ \(Official\ Music\ Video\).2.mp4 gs://muzee-vips/最孤單的人\ -\ 廖文強與壞神經樂團\ \(Official\ Music\ Video\).23.mp4
Copying gs://muzee-vips/最孤單的人 - 廖文強與壞神經樂團 (Official Music Video).2.mp4 [Content-Type=video/mp4]...
I found that mv invokes cp, so the args get decoded to unicode more than once:
diff --git a/platform/gsutil/gslib/command_runner.py b/platform/gsutil/gslib/command_runner.py
index 5f62b1f..ae7f829 100755
--- a/platform/gsutil/gslib/command_runner.py
+++ b/platform/gsutil/gslib/command_runner.py
@@ -74,7 +74,13 @@ def HandleArgCoding(args):
   processing_header = False
   for i in range(len(args)):
     arg = args[i]
-    decoded = arg.decode(UTF8)
+
+    # Don't decode the unicode string twice
+    if not isinstance(arg, unicode):
+      decoded = arg.decode(UTF8)
+    else:
+      decoded = arg
+
     if processing_header:
       if arg.lower().startswith('x-goog-meta'):
         args[i] = decoded
diff --git a/platform/gsutil/gslib/command_run
I added the unicode check and it works :P
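For readers wondering why a call to decode produces an *encode* error: in Python 2, calling .decode() on an object that is already unicode first converts it back to bytes with the ASCII codec, and that implicit step is what fails. The following is a minimal Python 3 sketch of that behavior and of the guard from the patch. The helper names py2_style_decode and decode_once are made up for illustration and are not part of gsutil.

```python
UTF8 = "utf-8"


def py2_style_decode(arg):
    """Emulate Python 2's arg.decode(UTF8) for bytes or already-decoded text."""
    if isinstance(arg, str):
        # Python 2 implicitly re-encoded unicode objects with the ASCII
        # codec before decoding; this is the step that raised the error.
        arg = arg.encode("ascii")
    return arg.decode(UTF8)


def decode_once(arg):
    """Hypothetical guard: decode bytes, pass already-decoded text through."""
    return arg.decode(UTF8) if isinstance(arg, bytes) else arg


# The argument as it arrives from the command line (raw UTF-8 bytes).
raw = "gs://foo-videos/最孤單的人 - 廖文強與壞神經樂團 (Official Music Video).mp4".encode(UTF8)

first = py2_style_decode(raw)  # first pass (mv): bytes -> text, succeeds

try:
    py2_style_decode(first)    # second pass (cp): text -> ASCII bytes, fails
    second_error = None
except UnicodeEncodeError as err:
    second_error = err

print(second_error)

# With the guard, decoding twice is harmless.
assert decode_once(decode_once(raw)) == first
```

The failing span starts at index 16 because "gs://foo-videos/" is 16 ASCII characters long, and the five CJK characters that follow are the first ones the ASCII codec cannot represent.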
Ah, I missed that in the stack trace. Thanks for tracking it down! We'll get a fix out ASAP.
[~]$ gsutil --version
gsutil version: 4.13
Command: gsutil -d -m rsync -r -x (log_to_sync)/ /destination_folder gs://bucket_name/destination_folder
Debug log: http://pastebin.com/VziY121J
Error raised on folder named düsseldorf
Error itself:
Caught non-retryable exception while listing file:///destination_folder: 'ascii' codec can't encode character u'\xfc' in position 56: ordinal not in range(128)
DEBUG: Exception stack trace:
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/gslib/__main__.py", line 524, in _RunNamedCommandAndHandleExceptions
debug_level, parallel_operations)
File "/usr/lib/python2.7/dist-packages/gslib/command_runner.py", line 277, in RunNamedCommand
return_code = command_inst.RunCommand()
File "/usr/lib/python2.7/dist-packages/gslib/commands/rsync.py", line 971, in RunCommand
diff_iterator = _DiffIterator(self, src_url, dst_url)
File "/usr/lib/python2.7/dist-packages/gslib/commands/rsync.py", line 674, in __init__
raise CommandException('Caught non-retryable exception - aborting rsync')
CommandException: CommandException: Caught non-retryable exception - aborting rsync
Please tell me if any additional info is needed.
@paskal - I'm unable to repro the problem you reported. I ran the exact same command you did (but using my own bucket) and it succeeded. Can you provide a listing of the objects in the source dir and destination bucket from before you run the gsutil rsync command? If you'd rather not post the list on the public forum, please email it to me at gs-team@google.com.
I sent an email with additional info to gs-team@google.com, with the subject "gsutil bug #244 - can't rsync non-ascii folder". Thanks for the rapid response.
Also:
[~]# gsutil version -l
gsutil version: 4.13
checksum: PACKAGED_GSUTIL_INSTALLS_DO_NOT_HAVE_CHECKSUMS (!= 141a3e09b42e1b0b6033108aa24c2286)
boto version: 2.38.0
python version: 2.7.3 (default, Feb 27 2014, 19:58:35) [GCC 4.6.3]
OS: Linux 3.8.0-35-generic
multiprocessing available: True
using cloud sdk: False
config path: /root/.boto
gsutil path: /usr/bin/gsutil
compiled crcmod: True
installed via package manager: True
editable install: False
Thanks a lot for the help. It turned out to be the locale settings, which were left at C:
root@my_server:~# env|grep -E '(LC|LANG)'
LC_ALL=C
LANG=C
LANGUAGE=C
root@my_server:~# /usr/bin/gsutil -m rsync -r /hosted/aaa/images/ gs://bucket_name/hosted/aaa/images/
Building synchronization state...
Caught non-retryable exception while listing file:///hosted/aaa/images/: 'ascii' codec can't encode character u'\xfc' in position 56: ordinal not in range(128)
CommandException: Caught non-retryable exception - aborting rsync
Caught ^C - exiting
root@my_server:~# export LC_ALL=en_US.UTF-8
root@my_server:~# /usr/bin/gsutil -m rsync -r /hosted/aaa/images/ gs://bucket_name/hosted/aaa/images/
Building synchronization state...
Starting synchronization
Copying file:///hosted/aaa/images/athens/file_to_sync.jpg [Content-Type=image/jpeg]...
So, if you're getting "codec can't encode character * in position *: ordinal not in range(128)", check your locale settings, and if they're not set (or are set to C, as above), add export LC_ALL=en_US.UTF-8 to your .bashrc file.
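Another way to do that from within Python:
from os import environ
from subprocess import check_call

# {parameters}, {folder} and {bucket} are placeholders; fill them in with .format() before running.
command = 'gsutil -m rsync {parameters} {folder} {bucket}{folder}'
# Override LC_ALL for the child process only; the rest of the environment is inherited.
check_call(command.split(' '), env=dict(environ, LC_ALL="en_US.UTF-8"))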
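To check up front whether the override is needed, the sketch below resolves the locale a child process would see using the POSIX lookup order (LC_ALL, then LC_CTYPE, then LANG) and only adds LC_ALL when the result is not a UTF-8 locale. The function names are invented for this example, and the en_US.UTF-8 fallback assumes that locale is installed on the machine.

```python
import os


def effective_ctype_locale(env):
    """Return the locale that governs character handling, per POSIX precedence."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # empty values count as unset
            return value
    return "C"


def is_utf8_locale(value):
    """True if the codeset part of e.g. 'en_US.UTF-8@euro' is UTF-8."""
    codeset = value.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


def utf8_env(env, fallback="en_US.UTF-8"):
    """Copy env, forcing LC_ALL to the fallback only if no UTF-8 locale is active."""
    if is_utf8_locale(effective_ctype_locale(env)):
        return dict(env)
    return dict(env, LC_ALL=fallback)


# The environment reported above resolves to the plain C locale.
thread_env = {"LC_ALL": "C", "LANG": "C", "LANGUAGE": "C"}
print(effective_ctype_locale(thread_env), is_utf8_locale("C"))
print(utf8_env(thread_env)["LC_ALL"])

# Possible use with subprocess (not executed here):
# check_call(["gsutil", "version", "-l"], env=utf8_env(os.environ))
```

Passing the argument list directly, rather than splitting a command string on spaces, also avoids breaking paths that contain spaces.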
Feel free to close this one.
I'm still seeing this problem when using gsutil cp. I'm feeding gsutil its own output with -I:
gsutil ls gs://non-ascii-bug|gsutil cp -I gs://ywz-tmp
I'm getting
ValidationError: Field object encountered non-ASCII string 'File 3D_09 - MANOPOLE_09_06.1-PVZ_7317125_PVZ33x123_\xd0\xab20.stl': 'ascii' codec can't decode byte 0xd0 in position 52: ordinal not in range(128)
(You can try the above yourself.) I'm using gsutil version 4.15 and have run export LC_ALL=en_US.UTF-8.
I see this bug is still open. Is it supposed to be fixed?
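As a small illustration of what the ValidationError message is describing (it makes no claim about how gsutil handles -I input internally), the bytes \xd0\xab in the object name are the UTF-8 encoding of a single Cyrillic character. Treating the name as ASCII fails at that byte, while decoding it as UTF-8 succeeds. The variable names below are chosen for this sketch only.

```python
# The object name exactly as it appears in the error message (raw bytes).
raw_name = b"File 3D_09 - MANOPOLE_09_06.1-PVZ_7317125_PVZ33x123_\xd0\xab20.stl"

try:
    raw_name.decode("ascii")      # what the error message reports
    ascii_error = None
except UnicodeDecodeError as err:
    ascii_error = err

text_name = raw_name.decode("utf-8")  # the same bytes read as UTF-8

print(ascii_error)
print(text_name)
```

The ASCII failure lands at byte offset 52, matching the position in the reported error, which suggests the name reached that check as undecoded bytes rather than as text.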