Open robertschulze opened 1 month ago
HTML encoding requires to add filter entries twice
This is not a bug.
For whatever reason, you are the only person who seems to have paths stored online that use HTML entities. I have yet to find a way, locally or remotely, to create folders that contain things like '%20'. How are these being created in the first place, or is this some sort of 'legacy' naming issue?
The folder entity that is returned from Microsoft OneDrive includes the HTML entity itself in the JSON response - thus, to filter against that specific folder you need to add the folder name that contains that entity.
Your solution here really is to rename your folders. Failing that, it is double entries, sorry.
Very strange, I don't think I did anything special to create these. Maybe the HTML entities are an artifact of me "mounting" other OneDrives within my main OneDrive?
No, that does not create that effect, and neither can you create a folder or link with %20. When I tested this the last time you brought this up, OneDrive prevented the creation, so this has to be something with how you are doing things online.
You are probably right: even if I create a folder "BücherTest" within my main OneDrive (not a linked one) in the main root dir the special letter is HTML encoded even in the browser:
Maybe it is due to the fact that I am using OneDrive in German?
For reference I also tested on a corporate SharePoint and also here folders with special characters are HTML encoded:
For pure English locale this is not the case? Or maybe I am checking in the wrong place?
@robertschulze What should be happening is that UTF-16 encoding should be used - for example:
drwx------. 1 alex alex 0 Feb 8 06:57 α
drwx------. 1 alex alex 0 Feb 8 06:57 เอกสาร
These are stored online correctly as UTF-16, for example:
You also have some sort of DecodePath item being 'included', which is not occurring for any item I am using:
So I created a new folder online called Bücher using the text from this issue ticket. When I look at the verbose debug logging (I have to use a test personal account here to demonstrate, as I am doing long-running memory tests using Business|SharePoint), I see:
DEBUG: Processing OneDrive Item 2 of 2 from API Response Bundle 1
DEBUG: Raw JSON OneDrive Item: {"cTag":"adDo2NkQ1M0JFOEE1MDU2RUNBITQzMjE1OC42Mzg1NDE4MjMwMjc1MDAwMDA","eTag":"aNjZENTNCRThBNTA1NkVDQSE0MzIxNTguMg","fileSystemInfo":{"createdDateTime":"2024-06-17T00:40:15.18Z","lastModifiedDateTime":"2024-06-17T00:40:15.18Z"},"folder":{"childCount":0,"view":{"sortBy":"takenOrCreatedDateTime","sortOrder":"ascending","viewType":"thumbnails"}},"id":"66D53BE8A5056ECA!432158","name":"Bücher","parentReference":{"driveId":"66d53be8a5056eca","driveType":"personal","id":"66D53BE8A5056ECA!101","name":"root:","path":"\/drive\/root:"},"size":0}
The 'name' as provided by Microsoft OneDrive API includes the umlaut ... not some HTML entity version of the text. The JSON is processed 100% normally:
DEBUG: ------------------------------------------------------------------
DEBUG: Processing OneDrive JSON item 1 of 1 as part of JSON Item Batch 1 of 1
DEBUG: Raw JSON OneDrive Item (Batched Item): {"cTag":"adDo2NkQ1M0JFOEE1MDU2RUNBITQzMjE1OC42Mzg1NDE4MjMwMjc1MDAwMDA","eTag":"aNjZENTNCRThBNTA1NkVDQSE0MzIxNTguMg","fileSystemInfo":{"createdDateTime":"2024-06-17T00:40:15.18Z","lastModifiedDateTime":"2024-06-17T00:40:15.18Z"},"folder":{"childCount":0,"view":{"sortBy":"takenOrCreatedDateTime","sortOrder":"ascending","viewType":"thumbnails"}},"id":"66D53BE8A5056ECA!432158","name":"Bücher","parentReference":{"driveId":"66d53be8a5056eca","driveType":"personal","id":"66D53BE8A5056ECA!101","name":"root:","path":"\/drive\/root:"},"size":0}
DEBUG: Attempting to calculate local filesystem path for 66d53be8a5056eca and 66D53BE8A5056ECA!101
DEBUG: JSON Item calculated full path is: ./Bücher
DEBUG: The item we are syncing is a folder
DEBUG: Flagging object as a directory
DEBUG: OneDrive change is potentially a new local item
DEBUG: Creating local directory: ./Bücher
DEBUG: Requested path does not exist, creating directory structure: ./Bücher
DEBUG: Setting directory permissions for: ./Bücher
DEBUG: Setting directory lastModifiedDateTime for: ./Bücher to 2024-Jun-17 00:40:15.18Z
DEBUG: Calling setTimes() for this directory: ./Bücher
DEBUG: saveItem - creating DB item from this JSON: {"cTag":"adDo2NkQ1M0JFOEE1MDU2RUNBITQzMjE1OC42Mzg1NDE4MjMwMjc1MDAwMDA","eTag":"aNjZENTNCRThBNTA1NkVDQSE0MzIxNTguMg","fileSystemInfo":{"createdDateTime":"2024-06-17T00:40:15.18Z","lastModifiedDateTime":"2024-06-17T00:40:15.18Z"},"folder":{"childCount":0,"view":{"sortBy":"takenOrCreatedDateTime","sortOrder":"ascending","viewType":"thumbnails"}},"id":"66D53BE8A5056ECA!432158","name":"Bücher","parentReference":{"driveId":"66d53be8a5056eca","driveType":"personal","id":"66D53BE8A5056ECA!101","name":"root:","path":"\/drive\/root:"},"size":0}
DEBUG: Flagging object as a directory
DEBUG: Batched JSON item processing time: 1 ms, 768 μs, and 8 hnsecs
DEBUG: ------------------------------------------------------------------
So I think that there is 100% something odd with your OneDrive account that I literally cannot assist with.
At this point I would be raising a case with Microsoft to understand why this is being used and how this is being triggered:
In the Elements view in the browser it is indeed correctly encoded also for my OneDrives:
So I guess here the browser has already applied some logic to convert from URL encoding to UTF-16.
However, originally I am seeing the encoded entries in the children JSON in the Network view in the browser.
E.g. here, to check, I just created a new free OneDrive (i.e. outside of my paid MS365 accounts) and created a folder Bücher:
Does it look the same for you in the Network view?
If not, then it must be something about how MS handles the locale of the browser, because I see no other difference.
@robertschulze
Using one of my 'test' free Personal OneDrive accounts, from the OneDrive 'root' the folder Bücher shows the following:
Subfolders within Bücher show the parent 'name' without any encoding. This is the correct response. The 'path' and 'pathFromRoot' are URL encoded, as these are expected to be used with other queries:
Everything is being presented correctly for the 'name' JSON element, which the 'onedrive' client uses for all operations.
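To illustrate the difference between the plain 'name' and the URL-encoded 'path', here is a minimal Python sketch. It only shows standard percent-encoding of a UTF-8 string, and it is an assumption that the API applies the same scheme when it builds 'path'. The variable names are illustrative and are not taken from the 'onedrive' client.

```python
from urllib.parse import quote, unquote

# Folder name as it should appear in the 'name' JSON element.
name = "Bücher"

# Percent-encode the UTF-8 bytes, as one would for a URL path segment.
encoded = quote(name)
print(encoded)                   # B%C3%BCcher

# Decoding gives the original name back.
print(unquote(encoded) == name)  # True
```

The encoded form matches the 'B%C3%BCcher' segment reported in this issue, which is consistent with 'path' being meant for use in further requests rather than for display.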
So right now, I suspect that what you are seeing is the result of one of the following:
What I have in 'My Profile' for this test account is the following:
What is yours set to? If I change my account to match yours - do I get the same experience? If you change yours to match mine does this change your experience?
In my view this is potentially a OneDrive API Bug as the 'name' element in the API responses should never be URL encoded. The OneDrive API does not specify at all that the 'name' element may contain URL encoded characters - thus this is why I think this is an API bug.
However, no-one else has reported this (only yourself, and this started off with %20 characters in folder names), so I cannot rule out that something you have configured at your end is a contributing factor that triggers this.
@abraunegg I did a number of tests. Disabling all plugins in Chrome did not help, changing my country to Australia also did not help (language was already English (United States)).
Now, switching to Firefox, I actually see "Bücher" as well, but checking the actual response (collapsing the formatted JSON / scrolling all the way down) I see that it actually is "B\u00fccher", so apparently, as opposed to Chrome, Firefox performs some internal conversion.
Can you maybe check on your side whether this is also the case for you?
If so, could it be that the OneDrive browser GUI is using a different API than your onedrive?
@robertschulze As I have explained and demonstrated before, nothing is being mangled or presented as URL encoded anywhere.
@robertschulze
If so, could it be that the OneDrive browser GUI is using a different API than your onedrive?
They both use the same Microsoft Graph API
@ificator Do you have any idea what the contributing factors are here? I am at a loss as to what is going on.
Any input you have would be greatly appreciated.
@abraunegg You are right, I was mistaking B\u00fccher for URL encoding of Bücher, but it is just UTF. But then I think the response from the API is actually fine:
{"@odata.context":"https://graph.microsoft.com/v1.0/$metadata#drives('e7247016c7826fbc')/items('E7247016C7826FBC%21103')/children","@odata.count":8,"value":[........,{"id":"E7247016C7826FBC!2406","name":"B\u00fccher","eTag":"aRTcyNDcwMTZDNzgyNkZCQyEyNDA2LjA","cTag":"adDpFNzI0NzAxNkM3ODI2RkJDITI0MDYuNjM4NTM5ODU4NjM1NTAwMDAw","size":2816411537,"folder":{"childCount":25,"view":{"viewType":"thumbnails","sortBy":"name","sortOrder":"ascending"}},"fileSystemInfo":{"createdDateTime":"2024-04-24T18:58:17.283Z","lastModifiedDateTime":"2024-04-24T18:58:13Z"},"parentReference":{"driveId":"e7247016c7826fbc","driveType":"personal","id":"E7247016C7826FBC!103","name":"Dokumente","path":"/drives/e7247016c7826fbc/items/E7247016C7826FBC!103:"}},.........]}
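A quick way to check that "B\u00fccher" is only a JSON escape for the same string is to parse it. The sketch below uses Python's standard json module on an abbreviated stand-in for the response above (only the 'name' field is kept, which is a simplification for illustration):

```python
import json

# Abbreviated stand-in for the API response; raw string keeps the backslash.
raw = r'{"name": "B\u00fccher"}'
item = json.loads(raw)

# After parsing, the escape is gone and the name is plain text.
print(item["name"])              # Bücher
print(item["name"] == "Bücher")  # True

# Serialisers may choose either form; both describe the same string.
print(json.dumps(item))                      # {"name": "B\u00fccher"}
print(json.dumps(item, ensure_ascii=False))  # {"name": "Bücher"}
```

Any JSON parser should yield the same decoded name, so whether a viewer shows the escaped or unescaped form is a display choice and does not change the data.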
With sync_list:
/OneDrive - Elias/*
nothing is actually synced, because items within "OneDrive - Elias" are considered root items (which I think is because they are linked, even though they are linked from the subfolder Documents within the other OneDrive):
DEBUG: Adding 8 OneDrive items for processing from the OneDrive 'root' folder
[...]
DEBUG: sync_list item to check: Bücher
DEBUG: Evaluation against 'sync_list' for this path: Bücher
DEBUG: [S]exclude = false
DEBUG: [S]exludeDirectMatch = false
DEBUG: [S]excludeMatched = false
DEBUG: Evaluation against 'sync_list' entry: /OneDrive - Elias/*
DEBUG: [F]exclude = false
DEBUG: [F]exludeDirectMatch = false
DEBUG: [F]excludeMatched = false
DEBUG: Evaluation against 'sync_list' final result: EXCLUDED
DEBUG: Skipping item - excluded by sync_list config: Bücher
With sync_list:
/OneDrive - Elias/*
/Bücher/*
the folder Bücher is matched per se:
DEBUG: Adding 8 OneDrive items for processing from the OneDrive 'root' folder
[...]
DEBUG: Evaluation against 'sync_list' for this path: Bücher
DEBUG: [S]exclude = false
DEBUG: [S]exludeDirectMatch = false
DEBUG: [S]excludeMatched = false
DEBUG: Evaluation against 'sync_list' entry: /OneDrive - Elias/*
DEBUG: Evaluation against 'sync_list' entry: /Bücher/*
DEBUG: Exact path match with 'sync_list' entry
DEBUG: Evaluation against 'sync_list' result: direct match
DEBUG: [F]exclude = false
DEBUG: [F]exludeDirectMatch = false
DEBUG: [F]excludeMatched = false
DEBUG: Evaluation against 'sync_list' final result: included for sync
Then the list of files within the folder is retrieved:
{"@odata.context":"https://graph.microsoft.com/v1.0/$metadata#drives('e7247016c7826fbc')/items('E7247016C7826FBC%212406')/children","@odata.count":25,"value":[.............,{"id":"E7247016C7826FBC!1492","name":"20221008 - Pythagoras 5.pdf","eTag":"aRTcyNDcwMTZDNzgyNkZCQyExNDkyLjQ","cTag":"aYzpFNzI0NzAxNkM3ODI2RkJDITE0OTIuMjU3","size":186489673,"file":{"mimeType":"application/pdf","hashes":{"quickXorHash":"8sxKNhJpXoxSqeXNUYyGXFTKfbg=","sha1Hash":"B801A340E3D338DEFA1E101C0E28260184372702","sha256Hash":"1F6A3DFAE430BB380AAF71B642B7FD3899361988930F22CBFD3DA639265B25CF"}},"fileSystemInfo":{"createdDateTime":"2022-12-18T13:29:19.28Z","lastModifiedDateTime":"2022-10-08T09:03:09Z"},"parentReference":{"driveId":"e7247016c7826fbc","driveType":"personal","id":"E7247016C7826FBC!2406","name":"B\u00fccher","path":"/drives/e7247016c7826fbc/items/E7247016C7826FBC!103:/B%C3%BCcher"}},.............]}
where the name of the parentReference is fine (B\u00fccher). However, the path of the parentReference (/drives/e7247016c7826fbc/items/E7247016C7826FBC!103:/B%C3%BCcher) contains URL encoded letters.
This is why the file is not being synced
DEBUG: skip_file item to check (file name only - parent path not in database): /20221008 - Pythagoras 5.pdf
DEBUG: skip_file evaluation for: /20221008 - Pythagoras 5.pdf
DEBUG: skip_file evaluation for: /20221008 - Pythagoras 5.pdf
DEBUG: Result: false
DEBUG: CAUTION: The JSON element transmitted by the Microsoft OneDrive API includes HTML URL encoded items, which may complicate pattern matching and potentially lead to synchronisation problems for this item.
DEBUG: WORKAROUND: An alternative solution could be to change the name of this item through the online platform: /B%C3%BCcher/20221008 - Pythagoras 5.pdf
DEBUG: See: https://github.com/OneDrive/onedrive-api-docs/issues/1765 for further details
DEBUG: sync_list item to check: B%C3%BCcher/20221008 - Pythagoras 5.pdf
DEBUG: Evaluation against 'sync_list' for this path: B%C3%BCcher/20221008 - Pythagoras 5.pdf
DEBUG: [S]exclude = false
DEBUG: [S]exludeDirectMatch = false
DEBUG: [S]excludeMatched = false
DEBUG: Evaluation against 'sync_list' entry: /OneDrive - Elias/*
DEBUG: Evaluation against 'sync_list' entry: /Bücher/*
DEBUG: [F]exclude = false
DEBUG: [F]exludeDirectMatch = false
DEBUG: [F]excludeMatched = false
DEBUG: Evaluation against 'sync_list' final result: EXCLUDED
DEBUG: Skipping item - excluded by sync_list config: B%C3%BCcher/20221008 - Pythagoras 5.pdf
(except if I would add B%C3%BCcher to the sync_list)
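The mismatch in the log above can be reproduced with a small Python sketch. Note that `matches_sync_list` is a hypothetical helper written for this illustration, using simple shell-style wildcard matching. It is not the 'onedrive' client's real sync_list logic, which may differ in detail.

```python
from fnmatch import fnmatchcase
from urllib.parse import unquote

def matches_sync_list(path, patterns):
    """Hypothetical stand-in for sync_list evaluation (wildcard match only)."""
    candidate = "/" + path.lstrip("/")
    return any(fnmatchcase(candidate, pattern) for pattern in patterns)

sync_list = ["/OneDrive - Elias/*", "/Bücher/*"]

# Path as it appears in the debug log, with the URL-encoded parent folder.
encoded = "B%C3%BCcher/20221008 - Pythagoras 5.pdf"
# Percent-decoding (UTF-8 by default) restores the umlaut.
decoded = unquote(encoded)

print(matches_sync_list(encoded, sync_list))  # False
print(matches_sync_list(decoded, sync_list))  # True
```

Under these assumptions, the encoded path only matches if an encoded entry such as /B%C3%BCcher/* is also present in the sync_list, which is the double-entry workaround described in this issue.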
At the same time, your screenshot above also shows the path of the parentReference containing URL encoded letters:
So I would expect the same issue to appear in your case as well, would it not?
What I meant w.r.t. Firefox: it appears to convert UTF, i.e. if you look at the RAW JSON response you see B\u00fccher, while in the "formatted" JSON view you see Bücher. Chrome shows B\u00fccher in the formatted JSON view as well. In any case, as you correctly state, there is no mangling by URL encoding; this is just a question of how the browsers handle UTF.
@robertschulze The 'onedrive' code does not use the 'path' element
The whole JSON response item you need to reference is 'name' - this is the only item which is being HTML encoded in your responses. This is why:
@robertschulze
You are right, I was mistaking B\u00fccher for URL encoding of Bücher, but it is just UTF. But then I think the response from the API is actually fine
Not correct. Even the UTF encoding is wrong: in ALL of my examples the 'name' JSON element contains the correct text, unmangled by either UTF encoding or HTML encoding.
The only thing I can think of is you have configured something somewhere for Microsoft to auto-translate or auto do something.
For my reference, could you please share the full RAW json of your "Bücher" example (from the onedrive client with --debug-https or in Firefox, selecting children on the left and on the right side scrolling all the way down to the actual json string)?
Details have been sent offline.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
@robertschulze I am unable to communicate with you via email due to some change with your setup. Messages cannot be delivered.
@robertschulze Your inbound email rules are still broken
Describe the bug
In my sync_list I have (among others):
To sync subdirs "Backup" and "Bücher" of folder "OneDrive - Elias". However, directory "Bücher" is not entered, rather the seek continues in "Dokumente"
while for the parallel directory "Backup", the subdirectory "Backup/20221026 - Kindle" is considered
Further down in the log file it turns out the subdir "20181006 - Fredo 1 Mathematik Arbeitsheft" was excluded
Only after adding "B%C3%BCcher" to the sync_list as well are the files and folders within "Bücher" properly synchronized:
Operating System Details
Client Installation Method
From Source
OneDrive Account Type
Personal
What is your OneDrive Application Version
onedrive v2.5.0-rc2-37-g3f7fb5a
What is your OneDrive Application Configuration
What is your 'curl' version
Where is your 'sync_dir' located
Network
What are all your system 'mount points'
What are all your local file system partition types
How do you use 'onedrive'
./onedrive --confdir='/home/robert/.config/onedrive/accounts/robert@guitaronline.de' --sync --verbose --verbose --resync --resync-auth > debug_output.log 2>&1
Steps to reproduce the behaviour
see description above
Complete Verbose Log Output
Screenshots
No response
Other Log Information or Details
No response
Additional context
No response