OneDrive / onedrive-api-docs

Official documentation for the OneDrive API

ODP | ODB: Live photos (.heic files) from the iPhone have a different size in listings compared to when downloaded (API CAUSES DATA LOSS) #1723

Open abraunegg opened 1 year ago

abraunegg commented 1 year ago

Category

Expected or Desired Behavior

Observed Behavior

When the OneDrive API is queried for .heic files, the API presents one value for the file type (correct size) but then delivers a file significantly smaller.

Further details can also be referenced here: https://github.com/OneDrive/onedrive-api-docs/issues/1532 . That issue was overzealously closed by @k-tsoi as 'completed', but this data loss bug has never been fixed by Microsoft.

Detailed 'onedrive' application logs from the OneDrive Linux client, illustrating the OneDrive API problem, are below.

Steps to Reproduce

  1. Take photo on iOS device, ensuring a Live Preview is stored in the image
  2. Upload the resulting .heic file via the iOS device to OneDrive via the iOS OneDrive Application
  3. Download the files to Linux using the OneDrive API via the 'onedrive' application (https://github.com/abraunegg/onedrive)

Additionally refer to https://github.com/OneDrive/onedrive-api-docs/issues/1532

abraunegg commented 1 year ago

@ificator This is a long standing bug, originally opened on Sep 16, 2021.

Can this be looked at with some priority please?

iOS .heic files should be downloaded in their original form, as reported by file size, and not truncated so that the downloaded file loses all of the 'live' data, which really constitutes client-side data loss.

abraunegg commented 1 year ago

@ificator Some log analysis from what the OneDrive API is sending. There is data loss occurring.

Evidence: When the application is receiving the JSON data about a file to download, the application receives the following:

{
    "@odata.type": "#microsoft.graph.driveItem",
    "cTag": "aYzpEMTNBM0VEMzRDQzZENDgyITIxMTA0LjI1OA",
    "eTag": "aRDEzQTNFRDM0Q0M2RDQ4MiEyMTEwNC4y",
    "file": {
        "hashes": {
            "quickXorHash": "9r/UaOPdoW68czpEsknBOlP3xAI=",
            "sha1Hash": "5056DEEC39DF0AE1DE67875282359F3A79C74057",
            "sha256Hash": "FC39014A2A3BF1E3CD823BAE18CE757C2F501F4877B7C4807348F622EACD9762"
        },
        "mimeType": "image/heic"
    },
    "fileSystemInfo": {
        "createdDateTime": "2022-05-31T13:56:53.22Z",
        "lastModifiedDateTime": "2022-05-31T13:56:53.22Z"
    },
    "id": "D13A3ED34CC6D482!21104",
    "name": "XXXX.heic",
    "parentReference": {
        "driveId": "redacted",
        "driveType": "personal",
        "id": "redacted",
        "name": "05",
        "path": "redacted"
    },
    "size": 3039814
}

The reported size online is 3039814 bytes.

When the application is processing this JSON we get the following:

[DEBUG] Local Disk Space Actual: 631629778944
[DEBUG] Free Space Reservation:  52428800
[DEBUG] File Size to Download:   3039814
[DEBUG] Setting file permissions for: redacted/path/to/file/XXXX.heic
[DEBUG] File size on disk:          682474
[DEBUG] OneDrive API reported size: 3039814
ERROR: File download size mis-match. Increase logging verbosity to determine why.
[DEBUG] Actual file hash:           FC39014A2A3BF1E3CD823BAE18CE757C2F501F4877B7C4807348F622EACD9762
[DEBUG] OneDrive API reported hash: 9r/UaOPdoW68czpEsknBOlP3xAI=
ERROR: File download hash mis-match. Increase logging verbosity to determine why.

The file downloaded by the application via the API is 682474 bytes, a dramatic difference, which is why the file size and hash mismatch checks are triggered.

Now one could ask where the evidence is that the application is not at fault here. Without looking deeper into the application by using --debug-https, the application debug logs won't show what is being delivered by the OneDrive API at the HTTP transport layer.

However, in the debug log there are some .heic files that are larger than 4 MB, and, when doing a session download, the application writes out all of the chunked bytes that it is receiving. When the log is analysed for these types of files we get the following JSON:
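The size check that trips here can be sketched in a few lines of Python. This is a minimal illustration, not the 'onedrive' client's actual code: the function name `check_download` and its return shape are hypothetical. It only compares the byte count on disk with the `size` field of a driveItem JSON like the one shown above.

```python
import os
import tempfile


def check_download(path, item):
    """Compare a downloaded file's size with the driveItem 'size' field.

    'item' is a dict shaped like the driveItem JSON above; only its
    'size' key is read. Returns (ok, on_disk, reported).
    """
    on_disk = os.path.getsize(path)
    reported = item["size"]
    return on_disk == reported, on_disk, reported
```

Fed the numbers from the log above (682474 bytes on disk, 3039814 reported), such a check fails with a gap of 2357340 bytes.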

{
    "@odata.type": "#microsoft.graph.driveItem",
    "cTag": "aYzpEMTNBM0VEMzRDQzZENDgyITIxMTQ4LjI1OA",
    "eTag": "aRDEzQTNFRDM0Q0M2RDQ4MiEyMTE0OC4y",
    "file": {
        "hashes": {
            "quickXorHash": "TrNg2ZqtJPADBVEt7WhzVta4GlA=",
            "sha1Hash": "F97A129F915F23C5E23A2258260B2EA96CFD0C55",
            "sha256Hash": "22E0226F71C7C3D375030C61586350284AA7B3B9E31DD1F0B6A966D34F14C53A"
        },
        "mimeType": "image/heic"
    },
    "fileSystemInfo": {
        "createdDateTime": "2022-06-01T05:56:29.67Z",
        "lastModifiedDateTime": "2022-06-01T05:56:29.67Z"
    },
    "id": "D13A3ED34CC6D482!21148",
    "name": "YYYYYYY.heic",
    "parentReference": {
        "driveId": "redacted",
        "driveType": "personal",
        "id": "redacted",
        "name": "06",
        "path": "redacted"
    },
    "size": 6887427
}

When this is processed:

Downloading file Pictures/Camera Roll/2022/06/20220601_043439324_iOS.heic ... 
[DEBUG] Local Disk Space Actual: 631628136448
[DEBUG] Free Space Reservation:  52428800
[DEBUG] File Size to Download:   6887427

Downloading   0% |                                        |   ETA   --:--:--:
[DEBUG] Data Received    = 50697
[DEBUG] Expected Total   = 4583414
[DEBUG] Percent Complete = 1

[DEBUG] Data Received    = 50697
[DEBUG] Expected Total   = 4583414
[DEBUG] Percent Complete = 1

We can see here that the application is expecting, based on the JSON, to download a file of 6887427 bytes. However, when the OneDrive API session is initiated, the expected total reported by the OneDrive API changes from 6887427 to 4583414.

When the session download chunks have completed, the application detects the size difference, as expected:

[DEBUG] Data Received    = 4583414
[DEBUG] Expected Total   = 4583414
[DEBUG] Percent Complete = 100
[DEBUG] Incrementing Progress Bar using fmod match

Downloading  95% |oooooooooooooooooooooooooooooooooooooo  |   ETA   00:00:02 
[DEBUG] Data Received    = 4583414
[DEBUG] Expected Total   = 4583414
[DEBUG] Percent Complete = 100

[DEBUG] Data Received    = 4583414
[DEBUG] Expected Total   = 4583414
[DEBUG] Percent Complete = 100

[DEBUG] Data Received    = 4583414
[DEBUG] Expected Total   = 4583414
[DEBUG] Percent Complete = 100

[DEBUG] Setting file permissions for: redacted/path/to/file/YYYYYYY.heic
[DEBUG] File size on disk:          4583414
[DEBUG] OneDrive API reported size: 6887427
ERROR: File download size mis-match. Increase logging verbosity to determine why.
[DEBUG] Actual file hash:           22E0226F71C7C3D375030C61586350284AA7B3B9E31DD1F0B6A966D34F14C53A
[DEBUG] OneDrive API reported hash: TrNg2ZqtJPADBVEt7WhzVta4GlA=
ERROR: File download hash mis-match. Increase logging verbosity to determine why.
INFO: Potentially add --disable-download-validation to work around this issue but downloaded data integrity cannot be guaranteed.
[DEBUG] Download or creation of local directory failed
[DEBUG] ------------------------------------------------------------------

The OneDrive API is modifying the .heic files: despite the JSON advising what the size online is, it sends a smaller file over HTTP, one that has had the live preview stripped out. This is essentially data loss caused by the OneDrive API.

This needs to be fixed and is a OneDrive API bug, and causes serious DATA LOSS for .heic files.

abraunegg commented 1 year ago

@ificator Any opportunity to look at this? This appears to be a data loss scenario.

andrewdolphin commented 11 months ago

Just to add - this bug exists even when using the OneDrive web page and when syncing using the Windows File Explorer/OneDrive app.

What that means is that, unless I'm missing something, it is currently impossible to retrieve a live photo backed up from an iOS device in its full format. Whilst the full file appears to be stored on OneDrive, there is in fact no way to access it!

abraunegg commented 9 months ago

@ificator , @microsoft

Please can this issue be looked at? It causes data loss when using the OneDrive API

thomascobb commented 9 months ago

I am affected by this bug too. Interestingly the full file must be in OneDrive somewhere, as the iOS app shows the video portion of the .heic file when you long press the photo. The web client is also meant to have a button to show the live photo but this appears to be broken for me, at least in Firefox and Chrome: [image]

with web console output:

Uncaught SyntaxError: Unable to process binding "if: function(){return showBurstPhotoBadge }"
Message: Unable to parse bindings.
Bindings value: class:burstPhotoButtonClass,attr:{id:burstPhotoBadgeId,'aria-label':burstPhotoBadgeAriaLabel,title:burstPhotoBadgeAriaLabel},click:onPlayBurstPhotoVideoClicked,event:{mouseover:onPlayBurstPhotoVideoMouseover},style:{'background-image':'url(\\''+burstPhotoBadgeImageUrl+'\\')'}
Message: missing } after property list
    parseBindingsString https://res-1.cdn.office.net/files/odsp-web-prod_2023-12-01.019/odclightspeedwebpack.manifest/odclightspeed.js:2
    getBindingAccessors https://res-1.cdn.office.net/files/odsp-web-prod_2023-12-01.019/odclightspeedwebpack.manifest/odclightspeed.js:2
    getBindingAccessors https://res-1.cdn.office.net/files/odsp-web-prod_2023-12-01.019/odclightspeedwebpack.manifest/plt.odsp-common.js:1
    f https://res-1.cdn.office.net/files/odsp-web-prod_2023-12-01.019/odclightspeedwebpack.manifest/odclightspeed.js:2
    Qc https://res-1.cdn.office.net/files/odsp-web-prod_2023-12-01.019/odclightspeedwebpack.manifest/odclightspeed.js:2
    Rc https://res-1.cdn.office.net/files/odsp-web-prod_2023-12-01.019/odclightspeedwebpack.manifest/odclightspeed.js:2
    aa https://res-1.cdn.office.net/files/odsp-web-prod_2023-12-01.019/odclightspeedwebpack.manifest/odclightspeed.js:2
odclightspeed.js:3:390 https://res-1.cdn.office.net/files/odsp-web-prod_2023-12-01.019/odclightspeedwebpack.manifest/odclightspeed.js
note: { opened at line 3, column 362

calcium90 commented 8 months ago

@thomascobb Yes you seem to be correct, I cannot access the live photo from OneDrive in a web browser either. But it is accessible in the iOS OneDrive app by pressing and holding on the photo, which triggers a short download dialog followed by the live photo being played.

So it seems as though this bug has arisen from changes made to accommodate the mobile app to download the actual photo and the live photo separately, despite residing in the same file - unfortunately as a result the API is serving a modified and incomplete file.

As @abraunegg says this represents unexpected data loss to the client and should be investigated ASAP.

szofar commented 8 months ago

"unfortunately as a result the API is serving a modified and incomplete file"—YIKES.

That is completely unacceptable in the following situation:

  1. I upload all my files to OneDrive
  2. I delete the files from all my devices (since they are safe in OneDrive)
  3. I download the files to a non-iOS device (with no visible error for the messed up .heic files)
  4. I delete the files from OneDrive because I no longer want to use OneDrive for backing up my files (theoretically)
  5. I attempt to open the .heic files years later and find them corrupted. In fact, I find thousands of these corrupt files including some very precious photos.
  6. I google for answers and find this ticket...

I think it would be fair to consider Microsoft liable in this situation for damaging the data, and I'm amazed this bug does not have higher priority.

per-oestergaard commented 8 months ago

(Cross-posting https://github.com/abraunegg/onedrive/discussions/2589#discussion-6062151)

Info: I have a support case open with Microsoft to see if the HEIC problem can be found/solved. So far my conclusion is -

/Per

denzel-farmer commented 8 months ago

Has there been any response to this bug? For clients storing iOS live photos, this seems like it breaks the fundamental use case for OneDrive -- to faithfully store and provide access to client files.

If I understand correctly, there is no way for me to retrieve unmodified live photo data uploaded via the OneDrive iOS app, except for viewing them in that first-party app? Not only that, but the bizarre size response from the API causes issues for widely-used utilities like rclone even if the user doesn't care about live photo data?

Would love either a resolution to this or at least some clarification about whether or not this is expected behavior.

lucyc166 commented 8 months ago

Also running into this issue. It seems like live photo support and the file size reporting are really two separate bugs. Even if the iOS app can't properly back up live photos (although they seem accessible in the web app, so this might not be the root problem), it shouldn't seemingly back them up, then report a different size than will actually be downloaded. I think this is causing the issue with rclone; it uses size as a method for detecting file changes, so it ends up re-syncing every live photo every time a clone is performed.
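To make the re-sync loop concrete, here is a minimal Python sketch of size-only change detection. It is not rclone's implementation; `needs_transfer` and `sync_passes` are hypothetical names, and the model assumes the listing always reports one size while the download always delivers another.

```python
def needs_transfer(remote_size, local_size):
    """Size-only change detection: transfer whenever the sizes differ."""
    return remote_size != local_size


def sync_passes(reported_size, delivered_size, passes=3):
    """Count how many sync passes end up transferring the file.

    Each transfer leaves 'delivered_size' bytes on disk; the next pass
    compares that local size with the listing's 'reported_size' again.
    """
    local_size = None  # nothing on disk before the first pass
    transfers = 0
    for _ in range(passes):
        if local_size is None or needs_transfer(reported_size, local_size):
            local_size = delivered_size  # what the API actually sends
            transfers += 1
    return transfers
```

With the sizes from the first log excerpt in this thread (3039814 reported, 682474 delivered), every pass transfers the file again, whereas a file whose listing matches its content is transferred once.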

per-oestergaard commented 8 months ago

> Also running into this issue. It seems like live photo support and the file size reporting are really two separate bugs. Even if the iOS app can't properly back up live photos (although they seem accessible in the web app, so this might not be the root problem), it shouldn't seemingly back them up, then report a different size than will actually be downloaded. I think this is causing the issue with rclone; it uses size as a method for detecting file changes, so it ends up re-syncing every live photo every time a clone is performed.

My tests say that backup works fine. The bug is related to live photos stored as HEIC files. There is just no way of getting the live photos back out of the system. The live view feature in the web interface seems to render them by getting an application/dash+xml manifest kind of file and then fetching the video as video/mp4 and the audio as audio/mp4. Dash is probably this ISO standard https://webconcepts.info/specs/ISO/IEC/23009-1 - details behind paywall.

I could not find any details of the HEIC file. But in some way that also does not matter as I would like OneDrive to provide me with a way of getting my files out again in the original format.

The OneDrive API GET request uses some undocumented querystring to do this trick in the web interface. An example: GET https://api.onedrive.com/v1.0/drives/<redacted>/items/<redacted>!<redacted>/content?format=dash&pretranscode=0&transcodeahead=0&part=index&ccat=2&psi=<redacted>&prefer=Migration%3DEnableRedirect%3BFailOnMigratedFiles&<redacted>

I'll ask for an update on my support case.

PetrVys commented 1 month ago

I've managed to download the full live photo from OneDrive. Maybe it'll help...

A live photo consists of two files, one .heic (or .jpg) and one .mov. There is no issue getting the image part (heic/jpg).

To get the video part, you need to use the API endpoint for personal OneDrive (SharePoint/corporate OneDrive does not support Live Photos to the best of my knowledge, and the Graph endpoint does not seem to support live photos either).

The first step is to identify whether a photo is a live photo at all. When POSTing to get folder data (POST to https://api.onedrive.com/v1.0/drives/*****/items...), include "select=photo" to get photo metadata. The returned object will include an empty livePhoto object if it is a live photo:

"photo": {
    "cameraMake": "Apple",
    "cameraModel": "iPhone 15 Pro",
    "exposureDenominator": 900.0,
    "exposureNumerator": 1.0,
    "focalLength": 6.764999866,
    "fNumber": 1.779999971,
    "iso": 80,
    "livePhoto": {},
    "orientation": 1,
    "takenDateTime": "***"
},
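A small Python sketch of that detection step, assuming the item has already been fetched and parsed into a dict. The helper name `is_live_photo` is hypothetical; the only thing taken from the response above is the presence of a `livePhoto` key inside the `photo` object.

```python
def is_live_photo(item):
    """Return True if the item's 'photo' facet carries a 'livePhoto' key.

    The livePhoto object is empty ({}), so the check must test for the
    key's presence rather than the truthiness of its value.
    """
    photo = item.get("photo")
    return isinstance(photo, dict) and "livePhoto" in photo
```

Testing `item["photo"].get("livePhoto")` for truthiness would miss every live photo, since an empty dict is falsy in Python.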

The next step is to get the content with the undocumented parameter format=video (found by dumb luck...):

POST https://api.onedrive.com/v1.0/drives/##DRIVEID##/items/##DRIVEID##!##ITEMID##/content?format=video&ump=1

This will even return the correct filename (i.e. the same as the image, but with a .mov extension) in the response headers:

Content-Disposition: attachment; filename="YYYYMMDD_HHMMSSSSS_iOS.mov"

The returned data is exactly the same as the .mov file copied from the iPhone locally.
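The request URL and the filename header can be handled with the Python standard library alone. This is a hedged sketch: `live_video_url` and `filename_from_disposition` are hypothetical helper names, the `format=video` and `ump=1` parameters are undocumented and copied as-is from the request above (the meaning of `ump` is not known), and authentication and the HTTP call itself are left out.

```python
from email.message import Message
from urllib.parse import quote, urlencode

API_ROOT = "https://api.onedrive.com/v1.0"


def live_video_url(drive_id, item_id):
    """Build the content URL for the video part of a live photo.

    'item_id' is the full item identifier, e.g. "<DRIVEID>!<ITEMID>".
    """
    query = urlencode({"format": "video", "ump": "1"})
    drive = quote(drive_id, safe="")
    item = quote(item_id, safe="!")  # keep the '!' separator literal
    return f"{API_ROOT}/drives/{drive}/items/{item}/content?{query}"


def filename_from_disposition(value):
    """Extract the filename parameter from a Content-Disposition value."""
    msg = Message()
    msg["Content-Disposition"] = value
    return msg.get_filename()
```

Saving the response body under the name returned by `filename_from_disposition` would give the .mov a name matching its still image.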

abraunegg commented 1 month ago

@PetrVys

> The returned data is exactly the same as the .mov file copied from iPhone locally

Great find. However, this then means tracking 2 files for the 1 file, and even if both are downloaded, this technically still constitutes data loss, as the original .heic file has now been split into 2 files.

The resulting action with the 'onedrive' client: the client will now upload the smaller .heic file (the JPG, thus data loss) and a new separate file, which is the movie component.

@Microsoft Please fix the OneDrive API to stop causing data loss.

PetrVys commented 1 month ago

@abraunegg : No, a live photo is composed of two individual files (heic/jpg and mov) that are linked via exif information, see e.g. https://www.whexy.com/dyn/ec968903-2fab-44ac-8003-62d14cacc2f5 (or many others, this was a first search hit). So by downloading the heic/jpg and mov file, you do have a complete live photo. You get the same pair when you connect iphone to a computer directly and download the live photos via cable.

What you're talking about is a thing, but it's called Google Motion Photo (and obviously lives in the land of Android). It works just fine with OneDrive, because it is a single file containing both still image and video.

abraunegg commented 1 month ago

@PetrVys

> No, a live photo is composed of two individual files (heic/jpg and mov) that are linked via exif information, see e.g. https://www.whexy.com/dyn/ec968903-2fab-44ac-8003-62d14cacc2f5 (or many others, this was a first search hit).

Unfortunately I do not agree with this statement. Let me explain.

A .heic file is a single file, like a container, and within that single file are the JPG and MOV file elements.

You can download sample .heic files from here: https://filesamples.com/formats/heic

Now .. from an iPhone, when you upload the .heic file, it has the complete full size.

When the Microsoft Graph API presents JSON data about the .heic file, it provides the following (as an example):

{
    "@odata.type": "#microsoft.graph.driveItem",
    "cTag": "aYzpEMTNBM0VEMzRDQzZENDgyITIxMTA0LjI1OA",
    "eTag": "aRDEzQTNFRDM0Q0M2RDQ4MiEyMTEwNC4y",
    "file": {
        "hashes": {
            "quickXorHash": "9r/UaOPdoW68czpEsknBOlP3xAI=",
            "sha1Hash": "5056DEEC39DF0AE1DE67875282359F3A79C74057",
            "sha256Hash": "FC39014A2A3BF1E3CD823BAE18CE757C2F501F4877B7C4807348F622EACD9762"
        },
        "mimeType": "image/heic"
    },
    "fileSystemInfo": {
        "createdDateTime": "2022-05-31T13:56:53.22Z",
        "lastModifiedDateTime": "2022-05-31T13:56:53.22Z"
    },
    "id": "D13A3ED34CC6D482!21104",
    "name": "XXXX.heic",
    "parentReference": {
        "driveId": "redacted",
        "driveType": "personal",
        "id": "redacted",
        "name": "05",
        "path": "redacted"
    },
    "size": 3039814
}

Note the 'size' as reported by the API.

However when the API sends the file data, the API changes the file size to just that of the JPG file:

[DEBUG] Local Disk Space Actual: 631629778944
[DEBUG] Free Space Reservation:  52428800
[DEBUG] File Size to Download:   3039814
[DEBUG] Setting file permissions for: redacted/path/to/file/XXXX.heic
[DEBUG] File size on disk:          682474
[DEBUG] OneDrive API reported size: 3039814
ERROR: File download size mis-match. Increase logging verbosity to determine why.
[DEBUG] Actual file hash:           FC39014A2A3BF1E3CD823BAE18CE757C2F501F4877B7C4807348F622EACD9762
[DEBUG] OneDrive API reported hash: 9r/UaOPdoW68czpEsknBOlP3xAI=
ERROR: File download hash mis-match. Increase logging verbosity to determine why.

Looking deeper into the HTTP stream gives further evidence that the API is changing the file as it is sent to the client. For example, this JSON:

{
    "@odata.type": "#microsoft.graph.driveItem",
    "cTag": "aYzpEMTNBM0VEMzRDQzZENDgyITIxMTQ4LjI1OA",
    "eTag": "aRDEzQTNFRDM0Q0M2RDQ4MiEyMTE0OC4y",
    "file": {
        "hashes": {
            "quickXorHash": "TrNg2ZqtJPADBVEt7WhzVta4GlA=",
            "sha1Hash": "F97A129F915F23C5E23A2258260B2EA96CFD0C55",
            "sha256Hash": "22E0226F71C7C3D375030C61586350284AA7B3B9E31DD1F0B6A966D34F14C53A"
        },
        "mimeType": "image/heic"
    },
    "fileSystemInfo": {
        "createdDateTime": "2022-06-01T05:56:29.67Z",
        "lastModifiedDateTime": "2022-06-01T05:56:29.67Z"
    },
    "id": "D13A3ED34CC6D482!21148",
    "name": "YYYYYYY.heic",
    "parentReference": {
        "driveId": "redacted",
        "driveType": "personal",
        "id": "redacted",
        "name": "06",
        "path": "redacted"
    },
    "size": 6887427
}

provides the following deeper level inspection:

Downloading file Pictures/Camera Roll/2022/06/20220601_043439324_iOS.heic ... 
[DEBUG] Local Disk Space Actual: 631628136448
[DEBUG] Free Space Reservation:  52428800
[DEBUG] File Size to Download:   6887427

Downloading   0% |                                        |   ETA   --:--:--:
[DEBUG] Data Received    = 50697
[DEBUG] Expected Total   = 4583414
[DEBUG] Percent Complete = 1

[DEBUG] Data Received    = 50697
[DEBUG] Expected Total   = 4583414
[DEBUG] Percent Complete = 1

We can see here that the application is expecting, based on the JSON, to download a file of 6887427 bytes. However, when the OneDrive API session is initiated, the expected total reported by the OneDrive API changes from 6887427 to 4583414.

> So by downloading the heic/jpg and mov file, you do have a complete live photo.

You have the elements of the .heic file, yes, but the .heic file itself is now the smaller size, which is just the JPG element. That constitutes data loss.

PetrVys commented 1 month ago

The HEIC format is essentially an H.265 I-frame behind the curtains (not JPG!), but it is still a picture without video. If you download the picture you referenced above, it is not a live photo and does not contain movement. HEIC does not support motion at all (unless we're talking about Google Motion Photo HEICs, but those can be identified by the name MVIMG_*.heic). Just take any iPhone, shoot a single live photo and connect it to a Windows/Linux computer (Macs hide the two files in many places, similarly to how OneDrive hides it). You'll see two files of the same name, *.heic and *.mov, and both are needed for the live photo to work.

Microsoft just decided to hide the second file inside a single OneDrive item, so in some places you see the size as the sum of the two individual files and in other places you see just the static picture with its smaller size. From your dump above, you basically have two separate files within the single identifier D13A3ED34CC6D482!21148.

I'm not sure how the hashes are computed, as we're deep into undocumented territory... But most likely the two files are somehow glued together and the hash is computed from this concatenated file.

On a side note - I'm actually interested in this for the purpose of converting Live Photos into Google Motion Photo HEICs, since having both parts in a single file is much preferred for many reasons, unless you have everything in the Apple ecosystem.
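If that reading is right, the size of the hidden video follows from the two numbers already in the logs. The Python sketch below only does that subtraction; `implied_video_size` is a hypothetical helper, and the result rests on the unverified assumption that the listed size is exactly the still image plus the .mov, with no extra overhead.

```python
def implied_video_size(reported_size, still_size):
    """Bytes left over once the still image is subtracted from the listing.

    Assumes (unverified) that reported_size = still image + video.
    """
    if still_size > reported_size:
        raise ValueError("still image is larger than the reported size")
    return reported_size - still_size
```

For the two items in this thread that gives 3039814 - 682474 = 2357340 bytes and 6887427 - 4583414 = 2304013 bytes, which could be compared against the size of the .mov returned by the format=video request.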

abraunegg commented 1 month ago

@PetrVys

> Microsoft just decided to hide the second file into a single OneDrive item, thus in some places you see the size as a sum of the two individual files and in other places you see it as just the static picture with its smaller size. From your dump above, you basically have two separate files within the single identifier D13A3ED34CC6D482!21148.

Regardless of this - as a user, if I store a file of X size (.heic format or other) via the Microsoft Graph API, I have a reasonable expectation that I should be able to retrieve that exact same file with that exact same size.

For all other file types this is what happens (yes, Microsoft adds metadata to some files post upload when using Business Accounts, which also breaks data integrity) - for the most part a user can retrieve the uploaded file without issue.

For .heic files - you always get a much smaller file than what you uploaded, as internal content has been stripped. That is, the file that was uploaded is not the file that was downloaded = data loss.

PetrVys commented 1 month ago

True, but it's not internal content, it's external content you're missing - the additional *.MOV file containing the Live Photo's video. It never resides in the image file, but OneDrive stores it under the same ID and it needs some dark magic to download.

The approach chosen is IMO quite confusing - especially if you want to archive the full live photo for some reason. But at the same time I can kind of understand why it was taken - Live Photos are a mess and people would complain that they have a separate video for every photo they've taken. So the choice was made to hide it in order not to inconvenience the average user.