brave / brave-browser

Brave browser for Android, iOS, Linux, macOS, Windows.
https://brave.com
Mozilla Public License 2.0
17.57k stars 2.28k forks

IPTC data in images #5238

Open Alex-CodeLab opened 5 years ago

Alex-CodeLab commented 5 years ago

Description

[feature request] Facebook, Instagram and others add IPTC Metadata in images to track user behaviour.

Since Brave-browser aims to prevent tracking, it would be a nice privacy-enhancing feature if there was an option to automatically remove this data on download ("Save Image as...").

Some info about FB IPTC: https://stackoverflow.com/questions/31120222/iptc-metadata-automatically-added-to-uploaded-images-on-facebook

fmarier commented 5 years ago

Apparently Twitter strips it out and tags the photo with "FB".

Removal instructions: exiftool -all= image.jpg

fmarier commented 5 years ago

jpegoptim --strip-iptc image.jpg should also work while preserving things like EXIF, XMP and ICC.

fmarier commented 5 years ago

If we want to parse JPEG files and alter their metadata without introducing an extra C++ parser into the codebase, we should look at a Rust library like https://github.com/kamadak/exif-rs.

jdkuki commented 5 years ago

Found this issue interesting and took a brief look into it. @fmarier it looks like exif-rs is insufficient for this because it only looks for the APP1 marker rather than APPn segments in general, and in particular it skips the APP13 marker where IPTC is stored. I was able to implement a basic parser that ignores all APPn metadata segments in about 16 lines of C++, so I'm not sure an external dependency is needed.
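For illustration, the segment-skipping approach described above can be sketched in a few lines. This is not the C++ prototype mentioned in the comment (that code is not shown in this thread); it is a minimal Python sketch assuming a well-formed baseline JPEG whose header segments each start with a single 0xFF byte. The names `strip_segments`, `ALL_APPN`, and `_segment` are made up for the example.

```python
import struct

SOI = b"\xff\xd8"                       # start-of-image marker
SOS = 0xDA                              # start-of-scan: compressed data follows
APP13 = 0xED                            # segment that usually carries IPTC
ALL_APPN = frozenset(range(0xE0, 0xF0)) # APP0 through APP15


def strip_segments(jpeg: bytes, markers=frozenset({APP13})) -> bytes:
    """Return a copy of `jpeg` without header segments whose marker is in `markers`."""
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == SOS:
            # Everything from SOS onward is image data; copy it untouched.
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2 : pos + 4])
        end = pos + 2 + length
        if length < 2 or end > len(jpeg):
            raise ValueError("truncated or malformed segment")
        if marker not in markers:
            out += jpeg[pos:end]
        pos = end
    raise ValueError("no SOS marker found")


def _segment(marker: int, payload: bytes) -> bytes:
    """Build one header segment (helper for the synthetic sample below)."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload


# Synthetic, non-decodable sample: just enough structure to exercise the parser.
sample = (
    SOI
    + _segment(0xE0, b"JFIF\x00")
    + _segment(0xE1, b"Exif\x00\x00")
    + _segment(APP13, b"Photoshop 3.0\x00")
    + _segment(0xDB, bytes(65))
    + _segment(SOS, bytes(10))
    + b"\x12\x34\xff\xd9"
)
print(len(sample), len(strip_segments(sample)))
```

The default removes only APP13, which is closer to what `jpegoptim --strip-iptc` does. Passing `ALL_APPN` drops every APPn segment, but that also discards ICC profiles (APP2) and the Adobe APP14 segment, which can change how colors are rendered, so a real implementation would probably want an allowlist.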

lukemulks commented 10 months ago

Realizing that this is quite an old issue, but I found it when I came to open a feature request for an option to remove EXIF data, similar to https://exiftool.org/.

Offering this in the browser would be a marketable advantage imo: some of the more popular services for removing this data require users to upload their images to the service's server, and then rely on the service to delete the images every 24 hours as it claims to do (without clear proof beyond policy statements).

IMO, it would complement the clean link sharing and other features we ship very well.

Neat to see that the team had already been researching. Not sure where this would land in priority scale of things, but seems like an area where there's a heavy amount of personal data processed that's not really well addressed with feature support among competitors.

In the time since this issue was opened, more attention and research have been devoted to the topic:

https://er.educause.edu/articles/2021/6/privacy-implications-of-exif-data

https://www.comparitech.com/blog/vpn-privacy/exif-metadata-privacy/

If there are some cases where retaining some metadata is defensible, perhaps the lowest-hanging data to remove would be location data, where detected.
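As a rough idea of what "where detected" could mean in practice: EXIF location data lives in a GPS sub-directory that IFD0 points to via tag 0x8825 (GPSInfo). The sketch below only checks whether that pointer is present. It is a minimal Python illustration, not anything from the Brave codebase; the function names are made up, and it assumes the Exif APP1 segment appears before the image data and is well formed.

```python
import struct

GPS_INFO_TAG = 0x8825  # IFD0 entry that points to the GPS sub-directory


def exif_has_gps(jpeg: bytes) -> bool:
    """True if the first Exif APP1 segment's IFD0 contains a GPSInfo pointer."""
    if jpeg[:2] != b"\xff\xd8":
        return False
    pos = 2
    # Walk header segments until start-of-scan (0xDA) or malformed data.
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF and jpeg[pos + 1] != 0xDA:
        marker = jpeg[pos + 1]
        (length,) = struct.unpack(">H", jpeg[pos + 2 : pos + 4])
        payload = jpeg[pos + 4 : pos + 2 + length]
        if marker == 0xE1 and payload[:6] == b"Exif\x00\x00":
            return _ifd0_has_tag(payload[6:], GPS_INFO_TAG)
        pos += 2 + length
    return False


def _ifd0_has_tag(tiff: bytes, tag: int) -> bool:
    """Scan the first image file directory of a TIFF block for `tag`."""
    if len(tiff) < 8:
        return False
    if tiff[:2] == b"II":
        endian = "<"   # little-endian ("Intel") byte order
    elif tiff[:2] == b"MM":
        endian = ">"   # big-endian ("Motorola") byte order
    else:
        return False
    magic, ifd_offset = struct.unpack(endian + "HI", tiff[2:8])
    if magic != 42 or ifd_offset + 2 > len(tiff):
        return False
    (count,) = struct.unpack(endian + "H", tiff[ifd_offset : ifd_offset + 2])
    for i in range(count):
        start = ifd_offset + 2 + 12 * i   # each IFD entry is 12 bytes
        entry = tiff[start : start + 12]
        if len(entry) < 12:
            return False
        if struct.unpack(endian + "H", entry[:2])[0] == tag:
            return True
    return False
```

Detecting the pointer is the easy half; removing only the GPS directory while keeping the rest of the EXIF block means rewriting offsets inside the TIFF structure, which is where a maintained library would likely earn its keep.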

It might be worth considering with the emergence of a broad array of generative AI image and art tooling. Not totally sure how good of a job they're doing around removing the data from uploads, or if that data is possibly cross purposed in some way, but it may be worth thinking about since that is becoming another wild west sector. Cc: @pes10k

pes10k commented 10 months ago

I think this is a fantastic idea (I believe @bsclifton even prototyped it at one point). I honestly don't remember why it didn't get merged (whether other priorities overtook it, or whether someone objected).

But I think this is a great idea. Even more, there is some very neat research out of the University of Chicago on how to slightly alter images to evade face recognition systems [1] that I think we could also include in this kind of feature. I remember pushing for it a while back, but I think the previous chief scientist was opposed (IIRC, I might be misremembering).

Either way, I think either or both of these would be terrific features.

1: https://www.usenix.org/conference/usenixsecurity20/presentation/shan