Backblaze / B2_Command_Line_Tool

The command-line tool that gives easy access to all of the capabilities of B2 Cloud Storage
Other
540 stars 123 forks source link

"surrogates not allowed" error when using b2 to sync files with accented characters #215

Closed fortinmike closed 4 months ago

fortinmike commented 8 years ago

I'm trying to sync files with b2 sync from a Synology NAS but all files whose name contains accented characters (such as é, è, à) fail to sync with an error such as this:

'utf-8' codec can't encode character '\udcc3' in position 14: surrogates not allowed

Full error message:

b2_upload(/volume1/Backup/Files/@eaDir/Cégep@SynoResource, Files/@eaDir/Cégep@SynoResource, 1464922452765): UnicodeEncodeError('utf-8', 'Files/@eaDir/C\udcc3\udca9gep@SynoResource', 14, 15, 'surrogates not allowed') 'utf-8' codec can't encode character '\udcc3' in position 14: surrogates not allowed

I tried fiddling with Python encoding environment variables but I've had no luck making this work. I'm starting to realize this might be something that could only be handled in the b2 tool itself.

Am I right, or is there something I can do to make this work on my machine?

ppolewicz commented 8 years ago

Did you try to set it to utf-16?

fortinmike commented 8 years ago

I did not. I just tried it and it seems to work! Thanks for the suggestion.

export PYTHONIOENCODING=utf-16

Could this be automated or made clearer somehow in the b2 tool to prevent user confusion? Maybe giving an indication that an IO encoding must be specified when there is a surrogate error, or something similar?

Also, after changing the encoding to UTF-16, the tool's output does not display the accented characters properly. They seem to be uploaded successfully though:

upload C?gep/Px/images/menu.psd

This results in the following file on B2:

Cégep/Px/images/menu.psd

ppolewicz commented 8 years ago

@fortinmike could you please share a full stack trace? @bwbeach can you help with the terminal output?

bwbeach commented 8 years ago

I'm visiting family for the weekend. I'll take a look on Monday.

ppolewicz commented 8 years ago

@fortinmike does ls show the file names properly in a terminal? Could you try to export LC_ALL and see if it helps with the b2 CLI output?

fortinmike commented 8 years ago

@ppolewicz There is no stack trace printed, just tons of one-line errors like this:

b2_upload(/volume1/Backup/Files/@eaDir/Cégep@SynoResource, Files/@eaDir/Cégep@SynoResource, 1464922452765): UnicodeEncodeError('utf-8', 'Files/@eaDir/C\udcc3\udca9gep@SynoResource', 14, 15, 'surrogates not allowed') 'utf-8' codec can't encode character '\udcc3' in position 14: surrogates not allowed

ls displays the file names properly when I ssh to my NAS.

export LC_ALL=en_US.utf16 results in the following error: setlocale: LC_ALL: cannot change locale

I also tried LC_ALL=C LANG=en_US.utf16 which didn't result in this error but I got other path errors (file names must not contain '\', etc.), so I didn't venture further. I'm not familiar with LC_ALL and locale configuration on Linux.

ppolewicz commented 8 years ago

Please install the version from github and use the --debugLogs flag and share the log, so that we can see the full stacktrace and other diagnostic information.

no-response[bot] commented 2 years ago

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate or assist you further.

MarianoRD commented 5 months ago

I'm still seeing this error @ppolewicz would it be possible to reopen it and work on it? Accented names (which should fit in UTF-8) are not being uploaded due to encoding issues.

ppolewicz commented 5 months ago

@MarianoRD please use the --debugLogs flag and share the log, so that we can see the full stacktrace and other diagnostic information. Preferably sync a directory with just one file that fails.

MarianoRD commented 5 months ago

Removed some of the information that was not really relevant to the error but here it's:

2024-04-27 05:32:55,107 328964  140393021916992 __main__    INFO    // ========================================  3.10.0  ======================================== \\
2024-04-27 05:32:55,181 328964  140393021916992 __main__    DEBUG   platform is Linux-6.1.0-20-amd64-x86_64-with-glibc2.36
2024-04-27 05:32:55,181 328964  140393021916992 __main__    DEBUG   Python version is CPython 3.11.5 (main, Sep  7 2023, 12:36:05) [GCC 12.2.0]
2024-04-27 05:32:55,182 328964  140393021916992 __main__    DEBUG   b2sdk version is 1.24.0
2024-04-27 05:32:55,182 328964  140393021916992 __main__    DEBUG   locale is (None, None)
2024-04-27 05:32:55,182 328964  140393021916992 __main__    DEBUG   filesystem encoding is ascii
2024-04-27 05:32:55,182 328964  140393021916992 b2sdk.account_info.sqlite_account_info  DEBUG   calling SqliteAccountInfo._get_user_account_info_path(cls=<class 'b2sdk.account_info.sqlite_account_info.SqliteAccountInfo'>, file_name=None, profile=None)
2024-04-27 05:32:55,182 328964  140393021916992 b2sdk.account_info.sqlite_account_info  DEBUG   SqliteAccountInfo file path to use: /home/mariano/.b2_account_info
2024-04-27 05:32:55,183 328964  140393021916992 b2sdk.account_info.upload_url_pool  DEBUG   calling UploadUrlPool()
2024-04-27 05:32:55,183 328964  140393021916992 b2sdk.account_info.upload_url_pool  DEBUG   calling UploadUrlPool()
2024-04-27 05:32:55,183 328964  140393021916992 b2sdk.account_info.upload_url_pool  DEBUG   calling UploadUrlPool()
2024-04-27 05:32:55,183 328964  140393021916992 b2sdk.account_info.upload_url_pool  DEBUG   calling UploadUrlPool()
2024-04-27 05:32:55,183 328964  140393021916992 b2sdk.v2.api    DEBUG   calling B2Session(account_info=<b2sdk.account_info.sqlite_account_info.SqliteAccountInfo object at 0x7fafcab0da10>, cache=<b2sdk.cache.AuthInfoCache object at 0x7fafc913d5d0>, api_config=<b2sdk.api_config.B2HttpApiConfig object at 0x7fafc913d850>)
2024-04-27 05:32:55,183 328964  140393021916992 b2sdk.api   DEBUG   calling B2Session(account_info=<b2sdk.account_info.sqlite_account_info.SqliteAccountInfo object at 0x7fafcab0da10>, cache=<b2sdk.cache.AuthInfoCache object at 0x7fafc913d5d0>, api_config=<b2sdk.api_config.B2HttpApiConfig object at 0x7fafc913d850>)
2024-04-27 05:32:55,184 328964  140393021916992 b2sdk.api   DEBUG   calling FileVersionFactory(api=<b2sdk.v2.api.B2Api object at 0x7fafc913d750>)
2024-04-27 05:32:55,184 328964  140393021916992 b2sdk.api   DEBUG   calling DownloadVersionFactory(api=<b2sdk.v2.api.B2Api object at 0x7fafc913d750>)
2024-04-27 05:32:55,184 328964  140393021916992 b2sdk.v2.api    DEBUG   calling Services(api=<b2sdk.v2.api.B2Api object at 0x7fafc913d750>, max_upload_workers=10, max_copy_workers=10, max_download_workers=None, save_to_buffer_size=None, check_download_hash=True, max_download_streams_per_file=None)
2024-04-27 05:32:55,184 328964  140393021916992 b2sdk.api   DEBUG   calling Services(api=<b2sdk.v2.api.B2Api object at 0x7fafc913d750>, max_upload_workers=10, max_copy_workers=10, max_download_workers=None, save_to_buffer_size=None, check_download_hash=True, max_download_streams_per_file=None)
2024-04-27 05:32:55,184 328964  140393021916992 b2sdk.v2.transfer   DEBUG   calling LazyThreadPool(max_workers=10, kwargs=<class 'inspect._empty'>)
2024-04-27 05:32:55,184 328964  140393021916992 b2sdk.transfer.inbound.downloader.abstract  DEBUG   calling ThreadPoolExecutor(max_workers=10, thread_name_prefix='', initializer=None, initargs=())
2024-04-27 05:32:55,184 328964  140393021916992 b2sdk.utils.thread_pool DEBUG   calling ThreadPoolExecutor(max_workers=10, thread_name_prefix='', initializer=None, initargs=())
2024-04-27 05:32:55,184 328964  140393021916992 b2sdk.v2.transfer   DEBUG   calling LazyThreadPool(max_workers=None, kwargs=<class 'inspect._empty'>)
2024-04-27 05:32:55,184 328964  140393021916992 b2sdk.transfer.inbound.download_manager DEBUG   calling ParallelDownloader(min_part_size=104857600, max_streams=None, kwargs={'min_chunk_size': 8192, 'max_chunk_size': 1048576, 'align_factor': None, 'thread_pool': <b2sdk.v2.transfer.LazyThreadPool object at 0x7fafc912f010>, 'check_hash': True})
2024-04-27 05:32:55,184 328964  140393021916992 b2sdk.transfer.inbound.download_manager DEBUG   calling AbstractDownloader(thread_pool=<b2sdk.v2.transfer.LazyThreadPool object at 0x7fafc912f010>, force_chunk_size=None, min_chunk_size=8192, max_chunk_size=1048576, align_factor=None, check_hash=True, kwargs=<class 'inspect._empty'>)
2024-04-27 05:32:55,184 328964  140393021916992 __main__    INFO    starting command [__main__.Sync] with arguments: ['/tmp/staticx-lilGPg/b2', 'sync', '--debugLogs', '/home/nextcloud-data', 'b2://rda-backups/nextcloud/data/']
2024-04-27 05:32:55,185 328964  140393021916992 b2sdk.v2.transfer   DEBUG   calling ThreadPoolMixin.set_thread_pool_size(self=<b2sdk.v2.transfer.UploadManager object at 0x7fafc912fb90>, max_workers=10)
2024-04-27 05:32:55,185 328964  140393021916992 b2sdk.v2.transfer   DEBUG   calling ThreadPoolMixin.set_thread_pool_size(self=<b2sdk.v2.transfer.DownloadManager object at 0x7fafc912d550>, max_workers=10)
2024-04-27 05:32:55,185 328964  140393021916992 b2sdk.api   DEBUG   calling B2Api.check_bucket_name_restrictions(self=<b2sdk.v2.api.B2Api object at 0x7fafc913d750>, bucket_name='rda-backups')
2024-04-27 05:32:55,185 328964  140393021916992 b2sdk.v2.api    DEBUG   calling Bucket(api=<b2sdk.v2.api.B2Api object at 0x7fafc913d750>, id_='2d554c138090d9268aab011d', name='rda-backups', type_=None, bucket_info=None, cors_rules=None, lifecycle_rules=None, revision=None, bucket_dict=None, options_set=None, default_server_side_encryption=<EncryptionSetting(EncryptionMode.UNKNOWN, None, None)>, default_retention=BucketRetentionSetting('unknown', None), is_file_lock_enabled=None, replication=None)
2024-04-27 05:32:55,185 328964  140393021916992 b2sdk.api   DEBUG   calling Bucket(api=<b2sdk.v2.api.B2Api object at 0x7fafc913d750>, id_='2d554c138090d9268aab011d', name='rda-backups', type_=None, bucket_info=None, cors_rules=None, lifecycle_rules=None, revision=None, bucket_dict=None, options_set=None, default_server_side_encryption=<EncryptionSetting(EncryptionMode.UNKNOWN, None, None)>, default_retention=BucketRetentionSetting('unknown', None), is_file_lock_enabled=None, replication=None)
2024-04-27 05:32:55,207 328964  140393021916992 b2sdk.transfer.inbound.downloader.abstract  DEBUG   calling ThreadPoolExecutor(max_workers=10, thread_name_prefix='', initializer=None, initargs=())
2024-04-27 05:32:55,207 328964  140393021916992 b2sdk.utils.thread_pool DEBUG   calling ThreadPoolExecutor(max_workers=10, thread_name_prefix='', initializer=None, initargs=())
2024-04-27 05:32:55,253 328964  140393021916992 b2sdk.bucket    DEBUG   calling Bucket.ls(self=Bucket<2d554c138090d9268aab011d,rda-backups,None>, folder_to_list='nextcloud/data', latest_only=False, recursive=True, fetch_count=10000, with_wildcard=False)
2024-04-27 05:32:55,255 328964  140393021916992 urllib3.connectionpool  DEBUG   Starting new HTTPS connection (1): api004.backblazeb2.com:443
2024-04-27 05:32:56,320 328964  140393021916992 urllib3.connectionpool  DEBUG   https://api004.backblazeb2.com:443 "POST /b2api/v2/b2_list_file_versions HTTP/1.1" 200 10009434
<... removed for brevity ...>
2024-04-27 05:33:11,961 328964  140393021916992 b2sdk.sync.policy   DEBUG   File Vicky/files/M&M/11111.png: source time 1642550758000, dest time 1642550758000, diff 0, threshold 0, diff > threshold False
2024-04-27 05:33:11,961 328964  140393021916992 b2sdk.sync.sync DEBUG   determined that LocalPath('Vicky/files/M&M/CartadeParticipaci\udcc3\udcb3n.psd', 1615660676000, 199176324) is not present on destination
2024-04-27 05:33:11,963 328964  140393021916992 b2sdk.sync.policy   DEBUG   File Vicky/files/M&M/Clima Boda.png: source time 1615827413000, dest time 1615827413000, diff 0, threshold 0, diff > threshold False
2024-04-27 05:33:11,967 328964  140392884336320 b2sdk.sync.action   ERROR   an exception occurred in a sync action
Traceback (most recent call last):
  File "b2sdk/sync/action.py", line 53, in run
  File "b2sdk/sync/action.py", line 162, in do_action
  File "logfury/_logfury/trace_call.py", line 86, in wrapper
  File "b2sdk/bucket.py", line 1018, in concatenate
  File "logfury/_logfury/trace_call.py", line 86, in wrapper
  File "b2sdk/bucket.py", line 842, in create_file
  File "b2sdk/bucket.py", line 950, in _create_file
  File "b2sdk/utils/__init__.py", line 231, in validate_b2_file_name
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 49-50: surrogates not allowed
2024-04-27 05:33:11,967 328964  140393021916992 b2sdk.sync.sync DEBUG   determined that LocalPath('Vicky/files/M&M/Descargas/Declaraci\udcc3\udcb3n Septiembre 2023.pdf', 1696608213000, 151020) is not present on destination
<... removed for brevity ...>
2024-04-27 05:33:11,972 328964  140392884336320 b2sdk.sync.action   ERROR   an exception occurred in a sync action
Traceback (most recent call last):
  File "b2sdk/sync/action.py", line 53, in run
  File "b2sdk/sync/action.py", line 162, in do_action
  File "logfury/_logfury/trace_call.py", line 86, in wrapper
  File "b2sdk/bucket.py", line 1018, in concatenate
  File "logfury/_logfury/trace_call.py", line 86, in wrapper
  File "b2sdk/bucket.py", line 842, in create_file
  File "b2sdk/bucket.py", line 950, in _create_file
  File "b2sdk/utils/__init__.py", line 231, in validate_b2_file_name
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 50-51: surrogates not allowed
2024-04-27 05:33:11,974 328964  140393021916992 b2sdk.sync.policy   DEBUG   File Vicky/files/M&M/IMPRIMIR/bienvenidos imprimible.png: source time 1624989358000, dest time 1624989358000, diff 0, threshold 0, diff > threshold False
2024-04-27 05:33:11,975 328964  140393021916992 b2sdk.sync.policy   DEBUG   File Vicky/files/M&M/IMPRIMIR/entrada ceremonia.png: source time 1624989806000, dest time 1624989806000, diff 0, threshold 0, diff > threshold False
2024-04-27 05:33:11,975 328964  140393021916992 b2sdk.sync.policy   DEBUG   File Vicky/files/M&M/IMPRIMIR/identificadores.pdf: source time 1626194062000, dest time 1626194062000, diff 0, threshold 0, diff > threshold False
2024-04-27 05:33:11,975 328964  140393021916992 b2sdk.sync.sync DEBUG   determined that LocalPath('Vicky/files/M&M/Ilustraci\udcc3\udcb3n Vicky chalbaud.png', 1616632425000, 28134824) is not present on destination
2024-04-27 05:33:11,979 328964  140392884336320 b2sdk.sync.action   ERROR   an exception occurred in a sync action
Traceback (most recent call last):
  File "b2sdk/sync/action.py", line 53, in run
  File "b2sdk/sync/action.py", line 162, in do_action
  File "logfury/_logfury/trace_call.py", line 86, in wrapper
  File "b2sdk/bucket.py", line 1018, in concatenate
  File "logfury/_logfury/trace_call.py", line 86, in wrapper
  File "b2sdk/bucket.py", line 842, in create_file
  File "b2sdk/bucket.py", line 950, in _create_file
  File "b2sdk/utils/__init__.py", line 231, in validate_b2_file_name
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 40-41: surrogates not allowed
2024-04-27 05:33:11,995 328964  140392884336320 b2sdk.sync.action   ERROR   an exception occurred in a sync action
Traceback (most recent call last):
  File "b2sdk/sync/action.py", line 53, in run
  File "b2sdk/sync/action.py", line 162, in do_action
  File "logfury/_logfury/trace_call.py", line 86, in wrapper
  File "b2sdk/bucket.py", line 1018, in concatenate
  File "logfury/_logfury/trace_call.py", line 86, in wrapper
  File "b2sdk/bucket.py", line 842, in create_file
  File "b2sdk/bucket.py", line 950, in _create_file
  File "b2sdk/utils/__init__.py", line 231, in validate_b2_file_name
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 65-66: surrogates not allowed
2024-04-27 05:33:11,995 328964  140393021916992 b2sdk.sync.policy   DEBUG   <redacted>: source time 1615657750000, dest time 1615657750000, diff 0, threshold 0, diff > threshold False
2024-04-27 05:33:11,996 328964  140393021916992 b2sdk.sync.sync DEBUG   determined that LocalPath('Vicky/files/M&M/Invitaciones de Boda/Enviadas/CarlosGonz\udcc3\udca1lezMaza.png', 1615658531000, 34570452) is not present on destination
2024-04-27 05:33:11,997 328964  140393021916992 b2sdk.sync.policy   DEBUG   <redacted>: source time 1615658264000, dest time 1615658264000, diff 0, threshold 0, diff > threshold False
2024-04-27 05:33:11,998 328964  140393021916992 b2sdk.sync.policy   DEBUG   <redacted>: source time 1615658015000, dest time 1615658015000, diff 0, threshold 0, diff > threshold False
2024-04-27 05:33:11,998 328964  140393021916992 b2sdk.sync.policy   DEBUG   <redacted>: source time 1615658600000, dest time 1615658600000, diff 0, threshold 0, diff > threshold False
2024-04-27 05:33:12,003 328964  140392884336320 b2sdk.sync.action   ERROR   an exception occurred in a sync action
Traceback (most recent call last):
  File "b2sdk/sync/action.py", line 53, in run
  File "b2sdk/sync/action.py", line 162, in do_action
  File "logfury/_logfury/trace_call.py", line 86, in wrapper
  File "b2sdk/bucket.py", line 1018, in concatenate
  File "logfury/_logfury/trace_call.py", line 86, in wrapper
  File "b2sdk/bucket.py", line 842, in create_file
  File "b2sdk/bucket.py", line 950, in _create_file
  File "b2sdk/utils/__init__.py", line 231, in validate_b2_file_name
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 71-72: surrogates not allowed
2024-04-27 05:33:12,002 328964  140393021916992 b2sdk.sync.policy   DEBUG   File <redacted>.png: source time 1615658668000, dest time 1615658668000, diff 0, threshold 0, diff > threshold False
<... removed for brevity ...>
2024-04-27 05:33:12,005 328964  140393021916992 b2sdk.sync.sync DEBUG   determined that LocalPath('Vicky/files/M&M/Invitaciones de Boda/Enviadas/IgnacioRodr\udcc3\udcadguez.png', 1615657388000, 9078499) is not present on destination

As a note, I have updated b2 to latest (3.19.1) and the error is still there, so I think this dump of logs would be valid, let me know if you need an updated one.

mjurbanski-reef commented 4 months ago

@MarianoRD

2024-04-27 05:32:55,182 328964 140393021916992 main DEBUG locale is (None, None) 2024-04-27 05:32:55,182 328964 140393021916992 main DEBUG filesystem encoding is ascii

this looks like environment problem. Can you share how you are running b2 command? is it also on some NAS / in a docker container?

You should have more luck with b2>=4 which will skip files that have problematic filenames.

MarianoRD commented 4 months ago

@mjurbanski-reef I just retried with v4.0.0 and this seems to have fixed the issue.

Just for historical purposes, the command was running on a Linux machine (Debian 12) standalone. Probably the issue was that I connected through a Mac and Debian does not love the Mac locales.

mjurbanski-reef commented 4 months ago

Happy to hear that. From the logs it seemed like your locales weren't set at all, but if it works now then there is no point in further guessing.

To the next person that encounters issue like this, please provide full:

The working assumption is that with correctly configured system b2 sync should synchronize all files properly, and in case of incorrectly configured one - i.e. with mismatching file system encoding and locale - skip problematic filenames. Closing this ticket as done until we see any reoccurrences of this problem.