element-hq / element-web

A glossy Matrix collaboration client for the web.
https://element.io
GNU Affero General Public License v3.0

Options to strip EXIF metadata from media uploads. #4426

Open ara4n opened 7 years ago

ara4n commented 7 years ago

A really useful privacy feature would be to be able to strip off any metadata from images (or audio or video) before uploading them to the server. This could be done at the server itself, but then the server would be able to sniff it and it wouldn't work with E2E uploads which are probably the most important ones anyway.

So instead we should probably implement it as an option for the client to mangle the image before uploading to strip it out.

ara4n commented 7 years ago

(Given that we already mangle file uploads client-side when we E2E-encrypt them before sending, this isn't impossible, although it could be quite fiddly to get right without causing performance problems.)

ara4n commented 7 years ago

http://jsfiddle.net/mowglisanu/frhwm2xe/3/ has a relatively plausible looking example of stripping exif metadata from JPEGs. The right place to insert this is around https://github.com/matrix-org/matrix-react-sdk/blob/443ab1add73390176478fc7ecd1a334aa157e833/src/ContentMessages.js#L293 - either replacing the file with a new file object pointing to the updated data stream, or changing all the downstream code to work on data streams rather than file objects.
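For illustration only, here is a minimal sketch of the segment-level approach, written independently of the linked fiddle and not tested against real uploads. The function name `stripExifSegments` is hypothetical. It assumes a well-formed JPEG with no padding bytes between segments, walks the segments up to the start-of-scan marker, and drops every APP1 segment whose payload begins with `Exif\0\0`. Everything else, including XMP in APP1 and the compressed image data, is copied through unchanged. Note that this throws away the whole Exif block, orientation included.

```typescript
// "Exif\0\0": the identifier that opens an Exif APP1 payload.
const EXIF_ID = [0x45, 0x78, 0x69, 0x66, 0x00, 0x00];

// Hypothetical helper: returns a copy of `input` without Exif APP1 segments.
function stripExifSegments(input: Uint8Array): Uint8Array {
    const view = new DataView(input.buffer, input.byteOffset, input.byteLength);
    if (view.byteLength < 4 || view.getUint16(0) !== 0xffd8) {
        throw new Error("not a JPEG: missing SOI marker");
    }
    const kept: Uint8Array[] = [input.subarray(0, 2)]; // keep SOI
    let pos = 2;
    while (pos + 4 <= view.byteLength) {
        if (view.getUint8(pos) !== 0xff) {
            throw new Error(`expected a marker at offset ${pos}`);
        }
        const marker = view.getUint8(pos + 1);
        if (marker === 0xda) break; // SOS: compressed image data follows
        const length = view.getUint16(pos + 2); // big-endian, counts its own 2 bytes
        const end = pos + 2 + length;
        if (length < 2 || end > view.byteLength) {
            throw new Error(`bad segment length at offset ${pos}`);
        }
        const isExif =
            marker === 0xe1 &&
            length >= 2 + EXIF_ID.length &&
            EXIF_ID.every((byte, i) => view.getUint8(pos + 4 + i) === byte);
        if (!isExif) kept.push(input.subarray(pos, end));
        pos = end;
    }
    kept.push(input.subarray(pos)); // SOS segment, scan data and EOI, untouched

    // Concatenate the kept pieces into one fresh buffer.
    const out = new Uint8Array(kept.reduce((n, part) => n + part.length, 0));
    let offset = 0;
    for (const part of kept) {
        out.set(part, offset);
        offset += part.length;
    }
    return out;
}
```

In a browser the result could be wrapped back up as `new Blob([stripped], { type: "image/jpeg" })`, which is one way of "replacing the file with a new file object" without touching the downstream code.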

ara4n commented 7 years ago

actually, https://github.com/matrix-org/matrix-react-sdk/blob/443ab1add73390176478fc7ecd1a334aa157e833/src/ContentMessages.js#L237 is another option: just before we do the upload, we could check the mime type here for image/jpeg etc and then load the file into RAM and mangle it before sending (and then pass the same buffer to the e2e code as needed).
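A rough sketch of what that hook might look like, assuming the bytes have already been read into memory (in a browser, for instance, via `new Uint8Array(await file.arrayBuffer())`). None of these names come from matrix-react-sdk: `Stripper`, `PreparedUpload`, `baseMimeType` and `prepareUpload` are all made up for the example, and the per-format strippers are passed in rather than implemented here.

```typescript
// A stripper takes the raw bytes of a file and returns cleaned bytes.
type Stripper = (bytes: Uint8Array) => Uint8Array;

interface PreparedUpload {
    bytes: Uint8Array; // the buffer to hand on to the e2e code and the upload
    stripped: boolean; // whether a stripper actually ran
}

// Hypothetical helper: normalise "Image/JPEG; q=1" to "image/jpeg".
function baseMimeType(mimeType: string): string {
    const semi = mimeType.indexOf(";");
    const base = semi === -1 ? mimeType : mimeType.slice(0, semi);
    return base.trim().toLowerCase();
}

// Hypothetical hook to call just before the upload. Types without a
// registered stripper pass through untouched; errors thrown by a stripper
// propagate so the caller can decide whether to abort or send as-is.
function prepareUpload(
    mimeType: string,
    bytes: Uint8Array,
    strippers: ReadonlyMap<string, Stripper>,
): PreparedUpload {
    const stripper = strippers.get(baseMimeType(mimeType));
    if (!stripper) {
        return { bytes, stripped: false };
    }
    return { bytes: stripper(bytes), stripped: true };
}
```

Returning a single buffer is what lets the same bytes go to the encryption step and the plain upload path alike, so the metadata never leaves the client in either case.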

ara4n commented 7 years ago

another complication is that if we strip off EXIF colour profile metadata we should be re-compressing the image to 'bake in' the correct profile first. the right solution here is probably to leave colour profile meta intact.

turt2live commented 7 years ago

One concern with loading it into RAM would be issues like https://github.com/vector-im/riot-web/issues/4264

ara4n commented 7 years ago

For some reason i couldn't get this out of my head, so i've pushed a totally untested and almost certainly broken proof-of-concept to https://github.com/matrix-org/matrix-react-sdk/commit/a0eea2a2713a6f32f40ad0cf8f6d56b404a60f62. If someone felt like picking it up and testing/finishing it it'd be hugely appreciated, as I should be focusing on organisational stuff atm rather than writing code, sadly :(

ara4n commented 7 years ago

(see also https://github.com/matrix-org/matrix-doc/issues/558)

ara4n commented 5 years ago

https://github.com/matrix-org/matrix-react-sdk/pull/1307 ended up being the PR for this, but we couldn't get it to work, and it bitrotted and got closed :( It should still be resurrectable by some kind soul in future though.

hpdeifel commented 5 years ago

Just some 2ct: I want to use matrix as a kind of shitty dropbox for sharing files including photos with perfectly crafted exif metadata. I wouldn't want matrix to mangle the files in any way, they should be bit for bit identical when I download them again.

➡️ so IMHO stripping exif should be configurable, although probably enabled by default.

ara4n commented 4 years ago

after a brief 3 year hiatus, https://github.com/matrix-org/matrix-react-sdk/pull/1307 now implements this.

ptman commented 4 years ago

Slack is now offering something similar: https://yro.slashdot.org/story/20/05/11/2051209/slack-now-strips-location-data-from-images . Does it make any sense at all to also allow this in synapse, so that every client wouldn't have to implement it separately? There are toolkits designed for this: https://0xacab.org/jvoisin/mat2

t3chguy commented 4 years ago

This is to prevent even your synapse knowing the metadata

ptman commented 4 years ago

@t3chguy I get that it's better to remove it on client than server. But as with many things in matrix, there can be different levels of doing stuff (not only e2e, but unencrypted rooms with ssl encryption between servers) and I think it could help many clients if there was a synapse fallback for clients that don't support removing metadata from all different media formats.

t3chguy commented 4 years ago

Oh and of course the fact that if you use synapse to do it for media in an encrypted room then you'll leak that media.

ptman commented 4 years ago

Sure, but aren't url previews also disabled for encrypted rooms? I think public/unencrypted rooms may actually benefit more from having the publicly available media scrubbed of metadata.

aaronraimist commented 4 years ago

Huh? This is for media uploads, not URL previews.

ptman commented 4 years ago

it was a comparison

ghost commented 3 years ago

I thought Element was supposed to be private and secure, but not removing metadata goes completely against that.

RealDolos commented 3 years ago

> I thought Element was supposed to be private and secure, but not removing metadata goes completely against that.

Dear fellow user, private and secure is not the same as anonymous 😉. If I used element e.g. to send photos to family abroad, I probably wouldn't mind meta data. They already know who I am anyway and the meta data may help them organize their photos. All we want in this case is to have a private and secure channel to communicate.

My general user expectation would be that element does not "mess" with the content of the files I send, unless I specifically tell it to do so. Your expectation differs, which is fine. Compromises, like an on/off toggle for such a feature, are possible.

Now if anything gets implemented, then it has to be clear how to enable/disable that, and when it is active. Something like a "strip metadata" checkbox on the upload dialog thing (or a warning that the file type is not supported for stripping), and maybe a global setting to specify if stripping is enabled by default or not.
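To make that concrete, the toggle logic could be as small as the sketch below. It is only an illustration of the idea in this comment, not anything that exists in Element: `decideStripping`, `StripDecision` and the warning text are invented names, and the set of supported types is whatever the client happens to have strippers for.

```typescript
interface StripDecision {
    strip: boolean;   // should the client strip metadata for this upload?
    warning?: string; // shown in the upload dialog when stripping was wanted but is unavailable
}

// Hypothetical decision helper.
//   globalDefault: the user's global "strip by default" setting
//   perUpload:     the checkbox state in the upload dialog, or undefined if untouched
//   mimeType:      the type of the file being sent
//   supported:     types for which a stripper exists
function decideStripping(
    globalDefault: boolean,
    perUpload: boolean | undefined,
    mimeType: string,
    supported: ReadonlySet<string>,
): StripDecision {
    // An explicit per-upload choice always wins over the global default.
    const wanted = perUpload ?? globalDefault;
    if (!wanted) {
        return { strip: false };
    }
    if (!supported.has(mimeType.toLowerCase())) {
        return {
            strip: false,
            warning: `Metadata cannot be stripped from ${mimeType} files; the file would be sent unchanged.`,
        };
    }
    return { strip: true };
}
```

With `supported = new Set(["image/jpeg"])`, a user whose global setting is on gets `{ strip: true }` for a JPEG, a warning for a PDF, and `{ strip: false }` with no warning if they untick the box for a particular upload.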

Regardless, stripping won't ever be perfect. There are many, many file types in the wild that can carry metadata, and you'd need a custom stripper for each format. Moreover, what counts as metadata at all, and which metadata is sensitive, is not well specified and may differ between people. E.g. is the camera model that took a pic too sensitive? How about the embedded ICC color profile that is vital for displaying the image correctly, given that it can sometimes be fingerprinted to specific camera makes and models, and sometimes even to specific cameras?

Anyway, a good place to start I'd expect would be with

t3chguy commented 3 years ago

https://github.com/matrix-org/matrix-react-sdk/pull/1307 is a partial solution for JPEGs, partial because it also strips some EXIF metadata which is kinda important, like orientation and colour space.
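One way to avoid losing orientation would be to read it out before the Exif block is discarded. The sketch below is not taken from the PR; `readExifOrientation` is a hypothetical helper. It assumes it is handed the TIFF block that follows the `Exif\0\0` identifier inside the APP1 segment, and it only looks in the first IFD, where the Orientation tag (0x0112) normally lives.

```typescript
// Hypothetical helper: returns the Exif orientation (1 to 8), or undefined
// if the block is not valid TIFF or carries no usable Orientation tag.
function readExifOrientation(tiff: Uint8Array): number | undefined {
    if (tiff.byteLength < 8) return undefined;
    const view = new DataView(tiff.buffer, tiff.byteOffset, tiff.byteLength);

    // Byte order mark: "II" means little-endian, "MM" means big-endian.
    const order = view.getUint16(0, false);
    let little: boolean;
    if (order === 0x4949) little = true;
    else if (order === 0x4d4d) little = false;
    else return undefined;

    if (view.getUint16(2, little) !== 42) return undefined; // TIFF magic number

    const ifd = view.getUint32(4, little); // offset of the first IFD
    if (ifd + 2 > view.byteLength) return undefined;
    const entries = view.getUint16(ifd, little);

    for (let i = 0; i < entries; i++) {
        const entry = ifd + 2 + i * 12; // each IFD entry is 12 bytes
        if (entry + 12 > view.byteLength) return undefined;
        if (view.getUint16(entry, little) !== 0x0112) continue;
        const type = view.getUint16(entry + 2, little);
        const count = view.getUint32(entry + 4, little);
        if (type !== 3 || count !== 1) return undefined; // expect a single SHORT
        const value = view.getUint16(entry + 8, little);
        return value >= 1 && value <= 8 ? value : undefined;
    }
    return undefined;
}
```

What to do with the value is a separate design choice: a client could rotate the pixels to match, or write back a minimal Exif block containing only that tag. Either is untested here.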

olmari commented 3 years ago

I do have to agree with @RealDolos here, matrix (umbrella) shouldn't mangle files at all, and even if something like that is ever done, it needs to be very clear and opt-in.

Salamandar commented 3 years ago

> I do have to agree with @RealDolos here, matrix (umbrella) shouldn't mangle files at all, and even if something like that is ever done, it needs to be very clear and opt-in.

100% agree. There's already a check box "send the real size image" on Android, meaning that when you forget to check it, it's NOT the real file. A check box "Remove image metadata" would be a really good addition to the UI.