abraunegg / onedrive

OneDrive Client for Linux
https://abraunegg.github.io
GNU General Public License v3.0

Bug: Initial sync crashes with std.utf.UTFException: Invalid UTF-8 sequence #2829

Open phlibi opened 1 day ago

phlibi commented 1 day ago

Describe the bug

This crash has already been mentioned in https://github.com/abraunegg/onedrive/issues/2813 and might be related. It happened at the end of an initial sync of a SharePoint folder. Re-running the exact same command (again with --resync --resync-auth) then completed normally.

Operating System Details

Debian Bookworm (12) with backports enabled
Linux phiptp 6.10.6+bpo-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.6-1~bpo12+1 (2024-08-26) x86_64 GNU/Linux

Client Installation Method

From Source

OneDrive Account Type

SharePoint

What is your OneDrive Application Version

v2.5.0-6-g280d369 (with PR 2816)

What is your OneDrive Application Configuration

$ ./onedrive --sync --confdir=/home/phip/.config/onedrive/swxxxod --verbose --download-only --resync --resync-auth --display-config
Reading configuration file: /home/phip/.config/onedrive/swxxxod/config
Configuration file successfully loaded
Using 'user' configuration path for application config and state data: /home/phip/.config/onedrive/swxxxod
Application version                          = onedrive v2.5.0-6-g280d369
Compiled with                                = DMD 2109
User Application Config path                 = /home/phip/.config/onedrive/swxxxod
System Application Config path               = /etc/onedrive
Applicable Application 'config' location     = /home/phip/.config/onedrive/swxxxod/config
Configuration file found in config location  = true - using 'config' file values to override application defaults
Applicable 'sync_list' location              = /home/phip/.config/onedrive/swxxxod/sync_list
Applicable 'items.sqlite3' location          = /home/phip/.config/onedrive/swxxxod/items.sqlite3
Config option 'drive_id'                     = b!ozVsZqWFpU.........b5nb-SaXtsp
Config option 'sync_dir'                     = ~/phipsfiles/swxxx/swxxxod
Config option 'enable_logging'               = false
Config option 'log_dir'                      = /var/log/onedrive
Config option 'disable_notifications'        = false
Config option 'skip_dir'                     = 
Config option 'skip_dir_strict_match'        = false
Config option 'skip_file'                    = ~*|.~*|*.tmp|*.swp|*.partial
Config option 'skip_dotfiles'                = false
Config option 'skip_symlinks'                = true
Config option 'monitor_interval'             = 300
Config option 'monitor_log_frequency'        = 12
Config option 'monitor_fullscan_frequency'   = 12
Config option 'read_only_auth_scope'         = false
Config option 'dry_run'                      = false
Config option 'upload_only'                  = false
Config option 'download_only'                = true
Config option 'local_first'                  = false
Config option 'check_nosync'                 = false
Config option 'check_nomount'                = false
Config option 'resync'                       = true
Config option 'resync_auth'                  = true
Config option 'cleanup_local_files'          = false
Config option 'classify_as_big_delete'       = 1000
Config option 'disable_upload_validation'    = false
Config option 'disable_download_validation'  = false
Config option 'bypass_data_preservation'     = false
Config option 'no_remote_delete'             = false
Config option 'remove_source_files'          = false
Config option 'sync_dir_permissions'         = 700
Config option 'sync_file_permissions'        = 600
Config option 'space_reservation'            = 52428800
Config option 'application_id'               = d50ca740-c83f-4d1b-b616-12c519384f0c
Config option 'azure_ad_endpoint'            = 
Config option 'azure_tenant_id'              = 
Config option 'user_agent'                   = ISV|abraunegg|OneDrive Client for Linux/v2.5.0-6-g280d369
Config option 'force_http_11'                = false
Config option 'debug_https'                  = false
Config option 'rate_limit'                   = 0
Config option 'operation_timeout'            = 3600
Config option 'dns_timeout'                  = 60
Config option 'connect_timeout'              = 10
Config option 'data_timeout'                 = 60
Config option 'ip_protocol_version'          = 0
Config option 'threads'                      = 8
Compile time option --enable-notifications   = false

Selective sync 'sync_list' configured        = false

Config option 'sync_business_shared_items'   = false

Config option 'webhook_enabled'              = false

What is your 'curl' version

curl 8.9.1 (x86_64-pc-linux-gnu) libcurl/8.9.1 GnuTLS/3.7.9 zlib/1.2.13 brotli/1.0.9 zstd/1.5.4 libidn2/2.3.3 libpsl/0.21.2 libssh2/1.10.0 nghttp2/1.52.0 ngtcp2/1.6.0 nghttp3/1.4.0 librtmp/2.3 OpenLDAP/2.5.13
Release-Date: 2024-07-31, security patched: 8.9.1-2~bpo12+1
Protocols: dict file ftp ftps gopher gophers http https imap imaps ipfs ipns ldap ldaps mqtt pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp ws wss
Features: alt-svc AsynchDNS brotli GSS-API HSTS HTTP2 HTTP3 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM PSL SPNEGO SSL threadsafe TLS-SRP UnixSockets zstd

Where is your 'sync_dir' located

Local

What are all your system 'mount points'

sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=15715820k,nr_inodes=3928955,mode=755,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=3152504k,mode=755,inode64)
zroot/ROOT/debian on / type zfs (rw,relatime,xattr,noacl,casesensitive)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,inode64)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k,inode64)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11726)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,nosuid,nodev,relatime,pagesize=2M)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
none on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
zroot on /zroot type zfs (rw,relatime,xattr,noacl,casesensitive)
zroot/data/home on /home type zfs (rw,relatime,xattr,noacl,casesensitive)
zroot/data/scratch on /scratch type zfs (rw,relatime,xattr,noacl,casesensitive)
zroot/data/home/phipsfiles on /home/phip/phipsfiles type zfs (rw,relatime,xattr,noacl,casesensitive)
zroot/data/home/phipsfiles/developing on /home/phip/phipsfiles/developing type zfs (rw,relatime,xattr,noacl,casesensitive)
zroot/data/home/phipsfiles/documents on /home/phip/phipsfiles/documents type zfs (rw,relatime,xattr,noacl,casesensitive)
zroot/data/home/phipsfiles/swxxx on /home/phip/phipsfiles/swxxx type zfs (rw,relatime,xattr,noacl,casesensitive)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime)

What are all your local file system partition types

All local data is stored on ZFS

How do you use 'onedrive'

The folder is a company share that about 10 other people can access. Changes are rare, though, so it is rather unlikely that a file was changed online during the run.

Steps to reproduce the behaviour

I have not tried to reproduce this yet, but I could remove all local files and state to try to trigger the same failure again. I can do this if requested, but since this might be closely related to #2813, it will probably not provide much more insight.

Complete Verbose Log Output

NOTE: Stripped log, as all this is already being handled by abraunegg.

$ ./onedrive --sync --confdir=/home/phip/.config/onedrive/swxxxod --verbose --download-only --resync --resync-auth
...
Processing: heliumv/bestellungen/2020
The directory has not changed
Attempting to perform a database vacuum to optimise database
Database vacuum is complete
std.utf.UTFException@/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/utf.d(1556): Invalid UTF-8 sequence (at index 1)
----------------
/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/utf.d:1594 pure dchar std.utf.decodeImpl!(true, 0, const(char)[]).decodeImpl(ref const(char)[], ref ulong) [0x5627578b3090]
/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/utf.d:1186 pure @trusted dchar std.utf.decode!(0, const(char)[]).decode(scope ref const(char)[], ref ulong) [0x5627578b3003]
/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/regex/internal/ir.d:827 pure @safe bool std.regex.internal.ir.Input!(char).Input.nextChar(ref dchar, ref ulong) [0x562757888cb2]
/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/regex/internal/thompson.d:789 pure @trusted bool std.regex.internal.thompson.ThompsonMatcher!(char, std.regex.internal.ir.Input!(char).Input).ThompsonMatcher.next() [0x56275788ccc4]
/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/regex/internal/thompson.d:943 pure @trusted int std.regex.internal.thompson.ThompsonMatcher!(char, std.regex.internal.ir.Input!(char).Input).ThompsonMatcher.match(std.regex.internal.ir.Group!(ulong).Group[]) [0x5627578910d1]
/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/regex/package.d:775 pure void std.regex.RegexMatch!(immutable(char)[]).RegexMatch.__ctor!(std.regex.internal.ir.Regex!(char).Regex).__ctor(immutable(char)[], std.regex.internal.ir.Regex!(char).Regex).__lambda4!(std.regex.internal.ir.Group!(ulong).Group[]).__lambda4(std.regex.internal.ir.Group!(ulong).Group[]) [0x5627578a7050]
/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/regex/internal/ir.d:1122 pure void std.regex.internal.ir.SmallFixedArray!(std.regex.internal.ir.Group!(ulong).Group, 3u).SmallFixedArray.mutate(scope void delegate(std.regex.internal.ir.Group!(ulong).Group[]) pure) [0x56275789819a]
/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/regex/package.d:775 ref @trusted std.regex.RegexMatch!(immutable(char)[]).RegexMatch std.regex.RegexMatch!(immutable(char)[]).RegexMatch.__ctor!(std.regex.internal.ir.Regex!(char).Regex).__ctor(immutable(char)[], std.regex.internal.ir.Regex!(char).Regex) [0x5627578a6fb6]
/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/regex/package.d:1013 @safe std.regex.RegexMatch!(immutable(char)[]).RegexMatch std.regex.match!(immutable(char)[], std.regex.internal.ir.Regex!(char).Regex).match(immutable(char)[], std.regex.internal.ir.Regex!(char).Regex) [0x5627578a6e1d]
src/util.d:521 bool util.isValidUTCDateTime(immutable(char)[]) [0x5627578ad9ee]
src/itemdb.d:701 itemdb.Item itemdb.ItemDatabase.buildItem(sqlite.Statement.Result) [0x56275790b920]
src/itemdb.d:505 itemdb.Item[] itemdb.ItemDatabase.selectChildren(const(char)[], const(char)[]) [0x562757909db7]
src/sync.d:3371 void syncEngine.SyncEngine.checkDirectoryDatabaseItemForConsistency(itemdb.Item, immutable(char)[]) [0x5627578e1f4b]
src/sync.d:3217 void syncEngine.SyncEngine.checkDatabaseItemForConsistency(itemdb.Item) [0x5627578e0eb1]
src/sync.d:3373 void syncEngine.SyncEngine.checkDirectoryDatabaseItemForConsistency(itemdb.Item, immutable(char)[]) [0x5627578e1fce]
src/sync.d:3217 void syncEngine.SyncEngine.checkDatabaseItemForConsistency(itemdb.Item) [0x5627578e0eb1]
src/sync.d:3373 void syncEngine.SyncEngine.checkDirectoryDatabaseItemForConsistency(itemdb.Item, immutable(char)[]) [0x5627578e1fce]
src/sync.d:3217 void syncEngine.SyncEngine.checkDatabaseItemForConsistency(itemdb.Item) [0x5627578e0eb1]
src/sync.d:3373 void syncEngine.SyncEngine.checkDirectoryDatabaseItemForConsistency(itemdb.Item, immutable(char)[]) [0x5627578e1fce]
src/sync.d:3217 void syncEngine.SyncEngine.checkDatabaseItemForConsistency(itemdb.Item) [0x5627578e0eb1]
src/sync.d:3132 void syncEngine.SyncEngine.performDatabaseConsistencyAndIntegrityCheck() [0x5627578e09c2]
src/main.d:763 _Dmain [0x562757788393]
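
For readers following the trace: the exception surfaces in util.isValidUTCDateTime, where the regex matcher decodes the input string and hits an invalid UTF-8 sequence. One possible defensive pattern is to confirm the input decodes as UTF-8 before any pattern matching. The following is a minimal sketch in Python, used here only for illustration; it is not the client's D code and not the fix in the PR. The function name, the pattern, and the choice to return False on bad input are all assumptions.

```python
import re

# Hypothetical pattern for a UTC timestamp such as 2024-08-26T10:15:30Z.
# The client's real pattern in src/util.d is not shown in this issue.
UTC_TIMESTAMP = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?Z"
)


def is_valid_utc_datetime(raw: bytes) -> bool:
    """Return True only if raw is valid UTF-8 and looks like a UTC timestamp."""
    try:
        # Strict decoding raises UnicodeDecodeError on any invalid sequence,
        # so malformed input is rejected before the regex engine sees it.
        text = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return UTC_TIMESTAMP.fullmatch(text) is not None
```

With this shape, a value containing a stray byte is reported as an invalid timestamp instead of raising, and the caller can decide how to handle that record.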

Screenshots

No response

Other Log Information or Details

No response

Additional context

#2816

Client compiled from source with --enable-debug

Total synchronized data is about 2.4GB in 5300 files

abraunegg commented 1 day ago

@phlibi I have updated the #2816 PR with a number of changes this morning.

Could you please rebuild your client from this PR to validate the fix for this issue?