Closed fortinmike closed 4 months ago
Did you try to set it to utf-16?
I did not. I just tried it and it seems to work! Thanks for the suggestion.
export PYTHONIOENCODING=utf-16
Could this be automated or made clearer somehow in the b2 tool to prevent user confusion? Maybe giving an indication that an IO encoding must be specified when there is a surrogate error, or something similar?
Also, after changing the encoding to UTF-16, the tool's output does not display the accented characters properly. They seem to be uploaded successfully though:
upload C?gep/Px/images/menu.psd
This results in the following file on B2:
Cégep/Px/images/menu.psd
@fortinmike could you please share a full stack trace? @bwbeach can you help with the terminal output?
I'm visiting family for the weekend. I'll take a look on Monday.
@fortinmike does ls
show the file names properly in a terminal? Could you try to export LC_ALL
and see if it helps with the b2 CLI output?
@ppolewicz There is no stack trace printed, just tons of one-line errors like this:
b2_upload(/volume1/Backup/Files/@eaDir/Cégep@SynoResource, Files/@eaDir/Cégep@SynoResource, 1464922452765): UnicodeEncodeError('utf-8', 'Files/@eaDir/C\udcc3\udca9gep@SynoResource', 14, 15, 'surrogates not allowed') 'utf-8' codec can't encode character '\udcc3' in position 14: surrogates not allowed
ls
displays the file names properly when I ssh to my NAS.
export LC_ALL=en_US.utf16
results in the following error: setlocale: LC_ALL: cannot change locale
I also tried LC_ALL=C LANG=en_US.utf16
which didn't result in this error but I got other path errors (file names must not contain '\'
, etc.), so I didn't venture further. I'm not familiar with LC_ALL and locale configuration on Linux.
Please install the version from github and use the --debugLogs flag and share the log, so that we can see the full stacktrace and other diagnostic information.
This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate or assist you further.
I'm still seeing this error @ppolewicz would it be possible to reopen it and work on it? Accented names (which should fit in UTF-8) are not being uploaded due to encoding issues.
@MarianoRD please use the --debugLogs flag and share the log, so that we can see the full stacktrace and other diagnostic information. Preferably sync a directory with just one file that fails.
Removed some of the information that was not really relevant to the error but here it's:
2024-04-27 05:32:55,107 328964 140393021916992 __main__ INFO // ======================================== 3.10.0 ======================================== \\
2024-04-27 05:32:55,181 328964 140393021916992 __main__ DEBUG platform is Linux-6.1.0-20-amd64-x86_64-with-glibc2.36
2024-04-27 05:32:55,181 328964 140393021916992 __main__ DEBUG Python version is CPython 3.11.5 (main, Sep 7 2023, 12:36:05) [GCC 12.2.0]
2024-04-27 05:32:55,182 328964 140393021916992 __main__ DEBUG b2sdk version is 1.24.0
2024-04-27 05:32:55,182 328964 140393021916992 __main__ DEBUG locale is (None, None)
2024-04-27 05:32:55,182 328964 140393021916992 __main__ DEBUG filesystem encoding is ascii
2024-04-27 05:32:55,182 328964 140393021916992 b2sdk.account_info.sqlite_account_info DEBUG calling SqliteAccountInfo._get_user_account_info_path(cls=<class 'b2sdk.account_info.sqlite_account_info.SqliteAccountInfo'>, file_name=None, profile=None)
2024-04-27 05:32:55,182 328964 140393021916992 b2sdk.account_info.sqlite_account_info DEBUG SqliteAccountInfo file path to use: /home/mariano/.b2_account_info
2024-04-27 05:32:55,183 328964 140393021916992 b2sdk.account_info.upload_url_pool DEBUG calling UploadUrlPool()
2024-04-27 05:32:55,183 328964 140393021916992 b2sdk.account_info.upload_url_pool DEBUG calling UploadUrlPool()
2024-04-27 05:32:55,183 328964 140393021916992 b2sdk.account_info.upload_url_pool DEBUG calling UploadUrlPool()
2024-04-27 05:32:55,183 328964 140393021916992 b2sdk.account_info.upload_url_pool DEBUG calling UploadUrlPool()
2024-04-27 05:32:55,183 328964 140393021916992 b2sdk.v2.api DEBUG calling B2Session(account_info=<b2sdk.account_info.sqlite_account_info.SqliteAccountInfo object at 0x7fafcab0da10>, cache=<b2sdk.cache.AuthInfoCache object at 0x7fafc913d5d0>, api_config=<b2sdk.api_config.B2HttpApiConfig object at 0x7fafc913d850>)
2024-04-27 05:32:55,183 328964 140393021916992 b2sdk.api DEBUG calling B2Session(account_info=<b2sdk.account_info.sqlite_account_info.SqliteAccountInfo object at 0x7fafcab0da10>, cache=<b2sdk.cache.AuthInfoCache object at 0x7fafc913d5d0>, api_config=<b2sdk.api_config.B2HttpApiConfig object at 0x7fafc913d850>)
2024-04-27 05:32:55,184 328964 140393021916992 b2sdk.api DEBUG calling FileVersionFactory(api=<b2sdk.v2.api.B2Api object at 0x7fafc913d750>)
2024-04-27 05:32:55,184 328964 140393021916992 b2sdk.api DEBUG calling DownloadVersionFactory(api=<b2sdk.v2.api.B2Api object at 0x7fafc913d750>)
2024-04-27 05:32:55,184 328964 140393021916992 b2sdk.v2.api DEBUG calling Services(api=<b2sdk.v2.api.B2Api object at 0x7fafc913d750>, max_upload_workers=10, max_copy_workers=10, max_download_workers=None, save_to_buffer_size=None, check_download_hash=True, max_download_streams_per_file=None)
2024-04-27 05:32:55,184 328964 140393021916992 b2sdk.api DEBUG calling Services(api=<b2sdk.v2.api.B2Api object at 0x7fafc913d750>, max_upload_workers=10, max_copy_workers=10, max_download_workers=None, save_to_buffer_size=None, check_download_hash=True, max_download_streams_per_file=None)
2024-04-27 05:32:55,184 328964 140393021916992 b2sdk.v2.transfer DEBUG calling LazyThreadPool(max_workers=10, kwargs=<class 'inspect._empty'>)
2024-04-27 05:32:55,184 328964 140393021916992 b2sdk.transfer.inbound.downloader.abstract DEBUG calling ThreadPoolExecutor(max_workers=10, thread_name_prefix='', initializer=None, initargs=())
2024-04-27 05:32:55,184 328964 140393021916992 b2sdk.utils.thread_pool DEBUG calling ThreadPoolExecutor(max_workers=10, thread_name_prefix='', initializer=None, initargs=())
2024-04-27 05:32:55,184 328964 140393021916992 b2sdk.v2.transfer DEBUG calling LazyThreadPool(max_workers=None, kwargs=<class 'inspect._empty'>)
2024-04-27 05:32:55,184 328964 140393021916992 b2sdk.transfer.inbound.download_manager DEBUG calling ParallelDownloader(min_part_size=104857600, max_streams=None, kwargs={'min_chunk_size': 8192, 'max_chunk_size': 1048576, 'align_factor': None, 'thread_pool': <b2sdk.v2.transfer.LazyThreadPool object at 0x7fafc912f010>, 'check_hash': True})
2024-04-27 05:32:55,184 328964 140393021916992 b2sdk.transfer.inbound.download_manager DEBUG calling AbstractDownloader(thread_pool=<b2sdk.v2.transfer.LazyThreadPool object at 0x7fafc912f010>, force_chunk_size=None, min_chunk_size=8192, max_chunk_size=1048576, align_factor=None, check_hash=True, kwargs=<class 'inspect._empty'>)
2024-04-27 05:32:55,184 328964 140393021916992 __main__ INFO starting command [__main__.Sync] with arguments: ['/tmp/staticx-lilGPg/b2', 'sync', '--debugLogs', '/home/nextcloud-data', 'b2://rda-backups/nextcloud/data/']
2024-04-27 05:32:55,185 328964 140393021916992 b2sdk.v2.transfer DEBUG calling ThreadPoolMixin.set_thread_pool_size(self=<b2sdk.v2.transfer.UploadManager object at 0x7fafc912fb90>, max_workers=10)
2024-04-27 05:32:55,185 328964 140393021916992 b2sdk.v2.transfer DEBUG calling ThreadPoolMixin.set_thread_pool_size(self=<b2sdk.v2.transfer.DownloadManager object at 0x7fafc912d550>, max_workers=10)
2024-04-27 05:32:55,185 328964 140393021916992 b2sdk.api DEBUG calling B2Api.check_bucket_name_restrictions(self=<b2sdk.v2.api.B2Api object at 0x7fafc913d750>, bucket_name='rda-backups')
2024-04-27 05:32:55,185 328964 140393021916992 b2sdk.v2.api DEBUG calling Bucket(api=<b2sdk.v2.api.B2Api object at 0x7fafc913d750>, id_='2d554c138090d9268aab011d', name='rda-backups', type_=None, bucket_info=None, cors_rules=None, lifecycle_rules=None, revision=None, bucket_dict=None, options_set=None, default_server_side_encryption=<EncryptionSetting(EncryptionMode.UNKNOWN, None, None)>, default_retention=BucketRetentionSetting('unknown', None), is_file_lock_enabled=None, replication=None)
2024-04-27 05:32:55,185 328964 140393021916992 b2sdk.api DEBUG calling Bucket(api=<b2sdk.v2.api.B2Api object at 0x7fafc913d750>, id_='2d554c138090d9268aab011d', name='rda-backups', type_=None, bucket_info=None, cors_rules=None, lifecycle_rules=None, revision=None, bucket_dict=None, options_set=None, default_server_side_encryption=<EncryptionSetting(EncryptionMode.UNKNOWN, None, None)>, default_retention=BucketRetentionSetting('unknown', None), is_file_lock_enabled=None, replication=None)
2024-04-27 05:32:55,207 328964 140393021916992 b2sdk.transfer.inbound.downloader.abstract DEBUG calling ThreadPoolExecutor(max_workers=10, thread_name_prefix='', initializer=None, initargs=())
2024-04-27 05:32:55,207 328964 140393021916992 b2sdk.utils.thread_pool DEBUG calling ThreadPoolExecutor(max_workers=10, thread_name_prefix='', initializer=None, initargs=())
2024-04-27 05:32:55,253 328964 140393021916992 b2sdk.bucket DEBUG calling Bucket.ls(self=Bucket<2d554c138090d9268aab011d,rda-backups,None>, folder_to_list='nextcloud/data', latest_only=False, recursive=True, fetch_count=10000, with_wildcard=False)
2024-04-27 05:32:55,255 328964 140393021916992 urllib3.connectionpool DEBUG Starting new HTTPS connection (1): api004.backblazeb2.com:443
2024-04-27 05:32:56,320 328964 140393021916992 urllib3.connectionpool DEBUG https://api004.backblazeb2.com:443 "POST /b2api/v2/b2_list_file_versions HTTP/1.1" 200 10009434
<... removed for brevity ...>
2024-04-27 05:33:11,961 328964 140393021916992 b2sdk.sync.policy DEBUG File Vicky/files/M&M/11111.png: source time 1642550758000, dest time 1642550758000, diff 0, threshold 0, diff > threshold False
2024-04-27 05:33:11,961 328964 140393021916992 b2sdk.sync.sync DEBUG determined that LocalPath('Vicky/files/M&M/CartadeParticipaci\udcc3\udcb3n.psd', 1615660676000, 199176324) is not present on destination
2024-04-27 05:33:11,963 328964 140393021916992 b2sdk.sync.policy DEBUG File Vicky/files/M&M/Clima Boda.png: source time 1615827413000, dest time 1615827413000, diff 0, threshold 0, diff > threshold False
2024-04-27 05:33:11,967 328964 140392884336320 b2sdk.sync.action ERROR an exception occurred in a sync action
Traceback (most recent call last):
File "b2sdk/sync/action.py", line 53, in run
File "b2sdk/sync/action.py", line 162, in do_action
File "logfury/_logfury/trace_call.py", line 86, in wrapper
File "b2sdk/bucket.py", line 1018, in concatenate
File "logfury/_logfury/trace_call.py", line 86, in wrapper
File "b2sdk/bucket.py", line 842, in create_file
File "b2sdk/bucket.py", line 950, in _create_file
File "b2sdk/utils/__init__.py", line 231, in validate_b2_file_name
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 49-50: surrogates not allowed
2024-04-27 05:33:11,967 328964 140393021916992 b2sdk.sync.sync DEBUG determined that LocalPath('Vicky/files/M&M/Descargas/Declaraci\udcc3\udcb3n Septiembre 2023.pdf', 1696608213000, 151020) is not present on destination
<... removed for brevity ...>
2024-04-27 05:33:11,972 328964 140392884336320 b2sdk.sync.action ERROR an exception occurred in a sync action
Traceback (most recent call last):
File "b2sdk/sync/action.py", line 53, in run
File "b2sdk/sync/action.py", line 162, in do_action
File "logfury/_logfury/trace_call.py", line 86, in wrapper
File "b2sdk/bucket.py", line 1018, in concatenate
File "logfury/_logfury/trace_call.py", line 86, in wrapper
File "b2sdk/bucket.py", line 842, in create_file
File "b2sdk/bucket.py", line 950, in _create_file
File "b2sdk/utils/__init__.py", line 231, in validate_b2_file_name
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 50-51: surrogates not allowed
2024-04-27 05:33:11,974 328964 140393021916992 b2sdk.sync.policy DEBUG File Vicky/files/M&M/IMPRIMIR/bienvenidos imprimible.png: source time 1624989358000, dest time 1624989358000, diff 0, threshold 0, diff > threshold False
2024-04-27 05:33:11,975 328964 140393021916992 b2sdk.sync.policy DEBUG File Vicky/files/M&M/IMPRIMIR/entrada ceremonia.png: source time 1624989806000, dest time 1624989806000, diff 0, threshold 0, diff > threshold False
2024-04-27 05:33:11,975 328964 140393021916992 b2sdk.sync.policy DEBUG File Vicky/files/M&M/IMPRIMIR/identificadores.pdf: source time 1626194062000, dest time 1626194062000, diff 0, threshold 0, diff > threshold False
2024-04-27 05:33:11,975 328964 140393021916992 b2sdk.sync.sync DEBUG determined that LocalPath('Vicky/files/M&M/Ilustraci\udcc3\udcb3n Vicky chalbaud.png', 1616632425000, 28134824) is not present on destination
2024-04-27 05:33:11,979 328964 140392884336320 b2sdk.sync.action ERROR an exception occurred in a sync action
Traceback (most recent call last):
File "b2sdk/sync/action.py", line 53, in run
File "b2sdk/sync/action.py", line 162, in do_action
File "logfury/_logfury/trace_call.py", line 86, in wrapper
File "b2sdk/bucket.py", line 1018, in concatenate
File "logfury/_logfury/trace_call.py", line 86, in wrapper
File "b2sdk/bucket.py", line 842, in create_file
File "b2sdk/bucket.py", line 950, in _create_file
File "b2sdk/utils/__init__.py", line 231, in validate_b2_file_name
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 40-41: surrogates not allowed
2024-04-27 05:33:11,995 328964 140392884336320 b2sdk.sync.action ERROR an exception occurred in a sync action
Traceback (most recent call last):
File "b2sdk/sync/action.py", line 53, in run
File "b2sdk/sync/action.py", line 162, in do_action
File "logfury/_logfury/trace_call.py", line 86, in wrapper
File "b2sdk/bucket.py", line 1018, in concatenate
File "logfury/_logfury/trace_call.py", line 86, in wrapper
File "b2sdk/bucket.py", line 842, in create_file
File "b2sdk/bucket.py", line 950, in _create_file
File "b2sdk/utils/__init__.py", line 231, in validate_b2_file_name
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 65-66: surrogates not allowed
2024-04-27 05:33:11,995 328964 140393021916992 b2sdk.sync.policy DEBUG <redacted>: source time 1615657750000, dest time 1615657750000, diff 0, threshold 0, diff > threshold False
2024-04-27 05:33:11,996 328964 140393021916992 b2sdk.sync.sync DEBUG determined that LocalPath('Vicky/files/M&M/Invitaciones de Boda/Enviadas/CarlosGonz\udcc3\udca1lezMaza.png', 1615658531000, 34570452) is not present on destination
2024-04-27 05:33:11,997 328964 140393021916992 b2sdk.sync.policy DEBUG <redacted>: source time 1615658264000, dest time 1615658264000, diff 0, threshold 0, diff > threshold False
2024-04-27 05:33:11,998 328964 140393021916992 b2sdk.sync.policy DEBUG <redacted>: source time 1615658015000, dest time 1615658015000, diff 0, threshold 0, diff > threshold False
2024-04-27 05:33:11,998 328964 140393021916992 b2sdk.sync.policy DEBUG <redacted>: source time 1615658600000, dest time 1615658600000, diff 0, threshold 0, diff > threshold False
2024-04-27 05:33:12,003 328964 140392884336320 b2sdk.sync.action ERROR an exception occurred in a sync action
Traceback (most recent call last):
File "b2sdk/sync/action.py", line 53, in run
File "b2sdk/sync/action.py", line 162, in do_action
File "logfury/_logfury/trace_call.py", line 86, in wrapper
File "b2sdk/bucket.py", line 1018, in concatenate
File "logfury/_logfury/trace_call.py", line 86, in wrapper
File "b2sdk/bucket.py", line 842, in create_file
File "b2sdk/bucket.py", line 950, in _create_file
File "b2sdk/utils/__init__.py", line 231, in validate_b2_file_name
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 71-72: surrogates not allowed
2024-04-27 05:33:12,002 328964 140393021916992 b2sdk.sync.policy DEBUG File <redacted>.png: source time 1615658668000, dest time 1615658668000, diff 0, threshold 0, diff > threshold False
<... removed for brevity ...>
2024-04-27 05:33:12,005 328964 140393021916992 b2sdk.sync.sync DEBUG determined that LocalPath('Vicky/files/M&M/Invitaciones de Boda/Enviadas/IgnacioRodr\udcc3\udcadguez.png', 1615657388000, 9078499) is not present on destination
As a note, I have updated b2
to latest (3.19.1
) and the error is still there, so I think this dump of logs would be valid, let me know if you need an updated one.
@MarianoRD
2024-04-27 05:32:55,182 328964 140393021916992 main DEBUG locale is (None, None) 2024-04-27 05:32:55,182 328964 140393021916992 main DEBUG filesystem encoding is ascii
this looks like environment problem. Can you share how you are running b2 command? is it also on some NAS / in a docker container?
You should have more luck with b2>=4
which will skip files that have problematic filenames.
@mjurbanski-reef I just retried with v4.0.0
and this seems to have fixed the issue.
Just for historical purposes, the command was running on a Linux machine (Debian 12) standalone. Probably the issue was that I connected through a Mac and Debian does not love the Mac locales.
Happy to hear that. From the logs it seemed like your locales weren't set at all, but if it works now then there is no point in further guessing.
To the next person that encounters issue like this, please provide full:
locale
command outputThe working assumption is that with correctly configured system b2 sync
should synchronize all files properly, and in case of incorrectly configured one - i.e. with mismatching file system encoding and locale - skip problematic filenames.
Closing this ticket as done until we see any reoccurrences of this problem.
I'm trying to sync files with
b2 sync
from a Synology NAS but all files whose name contains accented characters (such as é, è, à) fail to sync with an error such as this:'utf-8' codec can't encode character '\udcc3' in position 14: surrogates not allowed
Full error message:
I tried fiddling with Python encoding environment variables but I've had no luck making this work. I'm starting to realize this might be something that could only be handled in the b2 tool itself.
Am I right, or is there something I can do to make this work on my machine?