cryptee / web-client

Cryptee's web client source code for all platforms.
https://crypt.ee

[Feature Request] Display photo metadata and make metadata searchable #121

Closed frejaya closed 9 months ago

frejaya commented 3 years ago

Is your feature request related to a problem? Please describe.
Although Crypt.ee allows me to set the exact date of an album, and shows the month and year in the scroll bar on the right, I can't see the exact date a specific photo was taken anywhere, such as by clicking on the photo itself or opening an info panel of some kind. The date is one useful part of the metadata, but so is the information about the camera that took the photo. I always want to see which camera was used to take a photo so I can compare camera photo quality.

It would also be really great if this metadata were searchable. (The date is already searchable, so that isn't an issue.) Right now, if I want to search by the camera I used, because I know I took a certain picture on one camera vs. another, I have to tag the photos I took with that camera, because I can't search the metadata.

Describe the solution you'd like
It would be really great if any photo file metadata could be visible somewhere and searchable. For example, if the metadata says the photo was taken with an OLYMPUS IMAGING CORP. E-M5, I would like to be able to search for "Olympus" and find the photo. And when I click on the photo to view it full screen, I'd like the option to open a side panel showing the exact date and time it was taken, camera details, etc.

As far as I know, the metadata is still in the file; it doesn't get lost or stripped away when uploaded to Crypt.ee. So it would be great if we could see that data within Crypt.ee, either when we open the large view of the photo or in a side panel of some kind.

Additional context
The info panel would fit well off to the right of the photo, for example.

[Screenshot: Screen Shot 2021-04-27 at 3 44 39 PM]
johnozbay commented 3 years ago

Hi there! 👋🏻 Thanks a lot for filing this!

There are a few very specific technical reasons why we don't have an info panel for other EXIF metadata like camera make, lens info, GPS data, etc. alongside photos. They have to do with both the lack of detailed EXIF metadata standardization among camera makers and the complexities of on-device encryption.

I wouldn't go so far as to say it's impossible, but it's near-impossible to do this in a privacy-preserving way without taking a huge performance & usability hit. I'll write down all the reasons why I think it's not possible [yet] here for posterity, so that if any of these change in the future, we can revisit this idea.


1) About EXIF

EXIF data from DSLR cameras can be up to 64kb per photo. And while there are some standards / guidelines for camera makers, every year camera makers add new tags / different meta info and deviate from the standards. EXIF is a 26+ year old standard, so it didn't account for things like phone cameras with AI that can tag people's faces and add this into the EXIF metadata. Or some phones have dual/triple-sensor cameras and use several lenses at once to take computational photos, so their EXIF data for lens details gets all weird and interesting. Consequently, EXIF standards get extended every year and things do improve, but the pace of improvement is still far behind reality. Depending on your camera make/model/device etc., you still find lots of surprises.
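For context on the 64kb figure: in a JPEG file, EXIF normally lives in an APP1 marker segment, and that segment's 16-bit length field (which counts its own two bytes) caps a single segment at 65,535 bytes. The sketch below is a minimal stdlib-only illustration, not Cryptee's code. It walks the marker segments of a JPEG byte string and reports the EXIF payload size, assuming a well-formed file and ignoring edge cases such as fill bytes between markers.

```python
import struct
from typing import Optional

EXIF_HEADER = b"Exif\x00\x00"
MAX_SEGMENT_PAYLOAD = 0xFFFF - 2  # 16-bit length field minus its own 2 bytes


def exif_payload_size(jpeg: bytes) -> Optional[int]:
    """Return the size in bytes of the EXIF APP1 payload, or None if there is none.

    JPEG marker segments look like: 0xFF, marker byte, 2-byte big-endian length,
    payload. The length counts itself but not the 0xFF/marker bytes.
    """
    if jpeg[:2] != b"\xff\xd8":  # SOI (start of image)
        raise ValueError("not a JPEG")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # SOS (start of scan): compressed image data follows
            return None
        (length,) = struct.unpack(">H", jpeg[pos + 2 : pos + 4])
        payload = jpeg[pos + 4 : pos + 2 + length]
        if marker == 0xE1 and payload.startswith(EXIF_HEADER):  # APP1 carrying EXIF
            return len(payload)
        pos += 2 + length  # skip the 2 marker bytes plus the segment itself
    return None


if __name__ == "__main__":
    body = EXIF_HEADER + bytes(100)
    fake = b"\xff\xd8\xff\xe1" + struct.pack(">H", len(body) + 2) + body + b"\xff\xda\x00\x02"
    print(exif_payload_size(fake))  # 106
```

Whatever a camera writes into that block, one APP1 segment can never exceed `MAX_SEGMENT_PAYLOAD` bytes of payload.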

Why is all this important?


2) EXIF & Privacy

All these extra pieces of meta-information have to be stored encrypted for each photo individually. Otherwise, things like location data from photos could reveal a lot about a user (their home address, their workplace), copyright details could reveal users' real names, and some new phones' AI cameras even tag people in photos automatically into the EXIF metadata nowadays, so there are tons of privacy issues. And according to documents leaked by Edward Snowden, the NSA is targeting Exif information under the XKeyscore program.

So due to all this, as a privacy-first service, we can't store EXIF unencrypted. And if we do encrypt EXIF metadata, it would need to be performant enough that this won't slow the app down. More on this below.

Un-encrypted services do all the fancy metadata extraction / search operations on the server side, since their servers can see the contents of your photos & their metadata, and can selectively store only the relevant parts. With Cryptee, because your data is encrypted and the servers don't know your encryption key, extraction / search can only be done on your device. So it's incredibly critical, and difficult, to decide what to encrypt, what to extract and make searchable, and what not to extract. Each of these tiny decisions could have either massive privacy ramifications or performance hits on your experience. (Since all this computation has to take place on your device, we have to account for the fact that you may be using a 5-year-old, slow-ish Android phone, etc.)


3) About on-device encryption

Since Cryptee encrypts everything client-side, on your device, all photos' thumbnails & preview sizes are generated in your browser / on your device before being encrypted & uploaded. (Unlike un-encrypted services: since they can see the contents of your photos, they can scale down / crop your photos on the server side.)

So when you upload a new photo, Cryptee first reads your original-sized (O) photo, then creates, in your browser:
– a cropped, small square thumbnail for the gallery (SM) (the small square ones in the gallery)
– a medium-sized preview image for the lightbox (M) (the one you screenshotted, for example)
Then it encrypts SM, M, and the original image (O), and uploads all three.
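A minimal sketch of the geometry behind that step, assuming hypothetical target sizes (256px square thumbnails, 1920px previews) since the thread doesn't give Cryptee's real dimensions. It only computes what gets produced; the pixel work would be left to an imaging library (with Pillow, for instance, the crop box is the 4-tuple that `Image.crop` accepts), and encryption and upload are out of scope.

```python
from typing import Dict, Tuple

# Hypothetical target sizes: the thread doesn't state Cryptee's real pixel dimensions.
SM_SIDE = 256        # square gallery thumbnail (SM)
M_MAX_SIDE = 1920    # longest side of the lightbox preview (M)

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def center_square_crop(width: int, height: int) -> Box:
    """Largest centered square inside the image, used for the SM thumbnail."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def fit_within(width: int, height: int, max_side: int) -> Tuple[int, int]:
    """Scale down (never up) so the longer side is at most max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return (width, height)
    scale = max_side / longest
    return (max(1, round(width * scale)), max(1, round(height * scale)))


def plan_variants(width: int, height: int) -> Dict[str, dict]:
    """Describe the three variants that get encrypted and uploaded: SM, M and O."""
    crop = center_square_crop(width, height)
    side = crop[2] - crop[0]
    thumb = min(side, SM_SIDE)
    return {
        "SM": {"crop": crop, "size": (thumb, thumb)},
        "M": {"size": fit_within(width, height, M_MAX_SIDE)},
        "O": {"size": (width, height)},  # original bytes, untouched
    }


if __name__ == "__main__":
    print(plan_variants(6000, 4000))
```

For a 6000×4000 photo this yields a (1000, 0, 5000, 4000) crop for SM, a 1920×1280 preview for M, and the untouched original for O.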

Our converter & optimizer tries its best to keep the thumbnail file-sizes as low as possible, so that when you scroll through a gallery of 10,000 photos, you don't have to download & decrypt hundreds of megabytes of thumbnails and take a huge performance hit. So the converter tries its best to make the thumbnails as small as 50 - 100kb if possible.

Same for preview images. We try to keep their footprint as small as possible, so even if you're rage-swiping left/right on your phone quickly, you can cruise through photos in the lightbox / previewer quickly without waiting for things to download & decrypt. So we try to keep their file sizes ~350kb or less.

It would be impractical and silly of us to show the originals in the previewer, since originals can average ~25mb each, and to swipe through 10 photos you'd have to download and decrypt 250mb of photos.

Here comes the problem. Your EXIF metadata exists only in the original-sized photo, not in the optimized / shrunk thumbnail and preview sizes. And it makes sense if you think about it: the preview images are ~300kb, and adding another 64kb of EXIF onto that would increase the file size by ~20% (and for 50–100kb gallery thumbnails, by roughly 64–128%). And since we don't load the original size, the EXIF meta basically isn't available in the lightbox.
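The arithmetic behind those percentages, using the sizes quoted in this comment (64kb of EXIF, ~300kb previews, 50–100kb thumbnails):

```python
EXIF_KB = 64  # worst-case EXIF size from the comment above


def exif_overhead_percent(image_kb: float, exif_kb: float = EXIF_KB) -> float:
    """How much larger an image file gets if its EXIF block is kept, in percent."""
    return 100.0 * exif_kb / image_kb


for label, size_kb in [
    ("preview (M), 300kb", 300),
    ("thumbnail (SM), 100kb", 100),
    ("thumbnail (SM), 50kb", 50),
]:
    print(f"{label}: +{exif_overhead_percent(size_kb):.0f}%")
# preview (M), 300kb: +21%
# thumbnail (SM), 100kb: +64%
# thumbnail (SM), 50kb: +128%
```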

And since our servers can't see your photos, they can't extract the EXIF metadata either. This leaves us two choices:

a) We could extract the EXIF metadata during photo uploads, encrypt and upload it separately, then decrypt and display it while you're looking at the photos.

The problem with this approach is that at 64kb/photo, if you have 10,000 photos this means encrypting & uploading another 640mb worth of metadata alongside the photos. And if you want to search it all, you'd have to download and decrypt 640mb of metadata, which simply wouldn't work.

b) We could download & decrypt a 40mb original photo when you press a hypothetical "show exif" button, extract its EXIF, and display it on the fly. But this means that to show you a few lines of camera / lens info from 64kb of EXIF, we'd have to download a 40mb photo. Which is impractical, yet ironically, still might be the best option.

And since there's no clear standardization (or even clear documentation online – most EXIF documentation online is crowd-sourced like this), we can't even confidently strip unnecessary pieces of EXIF data and keep only the necessary pieces. (i.e. we can't even confidently say "oh yeah, we strip GPS data so don't worry", since some manufacturers put duplicate GPS data into another EXIF slot, or use a whole different slot for GPS coordinates, etc.)

Otherwise, you're spot on, our photos uploader already extracts EXIF date data during uploads to be able to securely sort & search your photos based on date.
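The thread doesn't show how the uploader reads the date, so this is an illustration only. EXIF stores DateTimeOriginal as text in the form `YYYY:MM:DD HH:MM:SS`, which Python's `datetime.strptime` can parse. Sorting undated or placeholder-dated photos last is an assumed policy here, and time-zone offset tags are ignored.

```python
from datetime import datetime
from typing import List, Optional, Tuple

EXIF_DATETIME_FORMAT = "%Y:%m:%d %H:%M:%S"  # e.g. "2021:04:27 15:44:39"


def parse_exif_datetime(raw: str) -> Optional[datetime]:
    """Parse an EXIF DateTimeOriginal value; None for blank or placeholder values.

    Some cameras write all zeros or spaces when the clock isn't set, and values
    may carry a trailing NUL byte.
    """
    value = raw.replace("\x00", "").strip()
    try:
        return datetime.strptime(value, EXIF_DATETIME_FORMAT)
    except ValueError:
        return None


def sort_by_capture_date(photos: List[Tuple[str, str]]) -> List[str]:
    """Sort (name, raw EXIF date) pairs oldest first; undated photos go last."""
    def key(item: Tuple[str, str]):
        taken = parse_exif_datetime(item[1])
        return (taken is None, taken or datetime.min, item[0])
    return [name for name, _ in sorted(photos, key=key)]


if __name__ == "__main__":
    photos = [("b.jpg", "2021:04:27 15:44:39"), ("c.jpg", ""), ("a.jpg", "2019:12:31 23:59:59")]
    print(sort_by_capture_date(photos))  # ['a.jpg', 'b.jpg', 'c.jpg']
```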


So the long-term, ultimate solution to all this is either:

a) extract EXIF during upload on your device, encrypt and store it separately, then while viewing your photos, you could press a "show exif" button, and it would display some basic EXIF data.

b) when you press the "show exif" button, we download and decrypt the original, extract & display the EXIF whenever you want to look at EXIF data.


Finally, to summarize all this, and answer some hypothetical questions about the topic directly:

Can we store metadata and display it? Yes, but it will impact performance & storage significantly.

Can we at least store SOME metadata and display it, like date, camera make/model etc.? Yes, but who gets to decide which piece of meta is more important? Let's say you and I need date and camera model, John Doe needs lens information, Jane Doe needs GPS, etc. So we shouldn't be the ones picking what matters and what doesn't.

Can we make EXIF metadata searchable on Cryptee? No. Because at 64kb/photo, even with 10,000 photos, that would require downloading & decrypting 640mb of metadata to be able to search for it. Let's say a JPG photo is ~5mb per shot. A mid-level 400gb Cryptee account can theoretically fit ~80,000 photos. So that would be 5gb of EXIF data alone that would need to be downloaded & decrypted to enable search.
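The numbers in that answer, reproduced as a quick calculation (decimal units, 64kb of EXIF per photo, ~5mb per JPG):

```python
KB, MB, GB = 10**3, 10**6, 10**9  # decimal units, matching the round numbers above

EXIF_BYTES_PER_PHOTO = 64 * KB
AVG_JPG_BYTES = 5 * MB


def metadata_bytes(photo_count: int) -> int:
    """Bytes of EXIF a client would download and decrypt to search everything."""
    return photo_count * EXIF_BYTES_PER_PHOTO


def photos_per_plan(plan_bytes: int) -> int:
    """How many average-sized JPGs fit into a storage plan."""
    return plan_bytes // AVG_JPG_BYTES


print(metadata_bytes(10_000) / MB)     # 640.0  (MB for 10,000 photos)
full_plan = photos_per_plan(400 * GB)
print(full_plan)                       # 80000  (photos in a 400gb plan)
print(metadata_bytes(full_plan) / GB)  # 5.12   (GB of EXIF to search them)
```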

But tags? What kind of black magic sorcery did you guys do to make tags searchable? The cool thing about tags is that you can give them really short unique IDs. If you choose 10,000 photos and tag all 10,000 of them, we don't need to copy-paste the whole tag info into all of them. We just need to store something like "tag-id-123" on the photos, and that's it. Then, when you type "#paris" into the search box, all Cryptee has to do is search the list of your encrypted tags, check if there's a matching tag ID, then find all photos with that tag ID. This way we never need to know your tags, and it's super fast and takes up little or no space 🎉 But with EXIF, we'd have to store 64kb for each photo separately; since every photo's EXIF meta is unique, we can't assign IDs to them like we do with tags.
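A toy model of the tag approach described above, assuming the user's tag list has already been decrypted on the device. The IDs, names, and data layout are hypothetical; the point is that each photo only carries a short shared ID, whereas unique per-photo EXIF blobs can't be deduplicated this way.

```python
from typing import Dict, List, Set

# Decrypted on the device from the user's encrypted tag list (hypothetical layout).
TAG_INDEX: Dict[str, str] = {"paris": "tag-id-123", "family": "tag-id-456"}

# What each photo record carries: opaque tag IDs, never the tag text itself.
PHOTO_TAGS: Dict[str, Set[str]] = {
    "photo-1": {"tag-id-123"},
    "photo-2": {"tag-id-123", "tag-id-456"},
    "photo-3": {"tag-id-456"},
}


def search_by_tag(query: str) -> List[str]:
    """Resolve '#name' to a tag ID locally, then filter photos by that ID."""
    tag_id = TAG_INDEX.get(query.lstrip("#").lower())
    if tag_id is None:
        return []
    return sorted(pid for pid, ids in PHOTO_TAGS.items() if tag_id in ids)


if __name__ == "__main__":
    print(search_by_tag("#paris"))  # ['photo-1', 'photo-2']
```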


For now, we can add a little info button next to the download button in the screenshot for example. Once pressed, it can show the name & date info (since we already have the name & date) – for all other things like camera make / model / lens / gps etc, we could add a "show more" button, and it could download the original photo, decrypt, extract EXIF and display more info. But I think we'll still need to do a lot of thinking for this, I'm still not convinced this is the best way, and don't want to commit to saying yes to this yet without proper thinking.
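A rough sketch of that two-step info panel, with the download, decrypt, and EXIF-parse steps passed in as hypothetical callbacks (none of these names come from Cryptee's code). Basic info comes from data the client already holds; the expensive original download only happens on "show more", and the parsed result is kept in memory for the session so repeated clicks don't re-download.

```python
from typing import Callable, Dict


class PhotoInfoPanel:
    def __init__(
        self,
        download_original: Callable[[str], bytes],      # hypothetical: fetch encrypted original
        decrypt: Callable[[bytes], bytes],              # hypothetical: on-device decryption
        parse_exif: Callable[[bytes], Dict[str, str]],  # hypothetical: EXIF reader
    ) -> None:
        self._download = download_original
        self._decrypt = decrypt
        self._parse = parse_exif
        self._exif_cache: Dict[str, Dict[str, str]] = {}  # in memory only

    def basic_info(self, name: str, date: str) -> Dict[str, str]:
        """Name and date are already known to the client, so no download is needed."""
        return {"name": name, "date": date}

    def show_more(self, photo_id: str) -> Dict[str, str]:
        """Download + decrypt the original only on demand, then reuse the parsed EXIF."""
        if photo_id not in self._exif_cache:
            original = self._decrypt(self._download(photo_id))
            self._exif_cache[photo_id] = self._parse(original)
        return self._exif_cache[photo_id]
```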


Hoping all this technical stuff makes sense, and sorry about this super long message! 😅

Figured I'll write it all out here, and that way I can reference this answer when others ask the same question in the future.

For now I'll keep this thread open and keep it up to date as we work towards adding a little info button, even though it won't necessarily satisfy your feature request directly per se.

Best,

John

frejaya commented 3 years ago

Very enlightening, thank you for explaining!

hirako2000 commented 3 years ago

Thanks for the fascinating summary of the challenge there. I think the metadata search can be addressed in multiple ways, a server side search with some data leakage, or on the client with zero leakage.

The client search approach is significantly simpler than doing this server side, and would perform very well.

The data storage increase is unavoidable. But with compression we are talking about far less than the figures you've laid out there. And metadata could be an optional feature, so if a user accepts a few % increase in storage, I don't really see why not, provided the user also has hundreds of megabytes synced and stored on the device to facilitate the search.

I'm not discussing the lack of metadata standard, that's a problem on its own.

Metadata is rather important for a full-fledged photo library. At a certain point labels alone aren't suitable to reasonably retrieve what we are after.

Best.

johnozbay commented 3 years ago

Hi there 👋🏻 Thanks a lot for this thoughtful response!

I think the metadata search can be addressed in multiple ways, a server side search with some data leakage, or on the client with zero leakage.

In my comment above I wrote bits and pieces about why neither of these options is viable as-is, but I'll restate them here in context to better highlight the reasons most relevant to this sentence.

on the client with zero leakage.

This isn't possible due to how the math checks out. See this specific section in my comment above:

Can we make EXIF metadata searchable on Cryptee? No. Because at 64kb/photo, even with 10,000 photos, that would require downloading & decrypting 640mb of metadata to be able to search for it. Let's say a JPG photo is ~5mb per shot. A mid-level 400gb Cryptee account can theoretically fit ~80,000 photos. So that would be 5gb of EXIF data alone that would need to be downloaded & decrypted to enable search.

Each and every time you start the search on the client side, you'd have to download and decrypt 5gb worth of metadata on 400gb accounts. Even if we compress it like you mentioned, and do so with 80% efficiency, that's still 1gb of metadata you need to download and decrypt. Not going to happen. We can't locally cache all this either: with phones having ~16gb capacities nowadays, users may not actually have 1gb available storage to cache things.
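The compression estimate, worked through with the 400gb example (80,000 photos × 64kb) and reading "80% efficiency" as the compressor removing 80% of the bytes:

```python
def compressed_bytes(raw_bytes: float, savings: float) -> float:
    """Size left after a compressor removes `savings` (a fraction in [0, 1)) of the bytes."""
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in [0, 1)")
    return raw_bytes * (1.0 - savings)


raw = 80_000 * 64_000  # bytes of EXIF for ~80,000 photos at 64kb each
print(raw / 1e9)                                    # 5.12 (GB before compression)
print(round(compressed_bytes(raw, 0.80) / 1e9, 3))  # 1.024 (GB after 80% savings)
```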

So theoretically, the only solution to this problem is to somehow search on the server-side.

a server side search with some data leakage

The word "some" does a lot of the heavy lifting there 😅 Defining "some" is paramount for a company like Cryptee, a service that exists to provide you absolute peace of mind and no 'data leakages'. As I mentioned in the "2) EXIF & Privacy" section of my comment above:

All these extra pieces of meta-information have to be stored encrypted for each photo individually. Otherwise, things like location data from photos could reveal a lot about a user (their home address, their workplace), copyright details could reveal users' real names, and some new phones' AI cameras even tag people in photos automatically into the EXIF metadata nowadays, so there are tons of privacy issues.

and further below :

And since there's no clear standardization (or even clear documentation online – most EXIF documentation online is crowd-sourced like this), we can't even confidently strip unnecessary pieces of EXIF data and keep only the necessary pieces. (i.e. we can't even confidently say "oh yeah, we strip GPS data so don't worry", since some manufacturers put duplicate GPS data into another EXIF slot, or use a whole different slot for GPS coordinates, etc.)

There is no way we can reliably strip sensitive EXIF metadata, or encrypt 'some' but leak 'some', etc. More importantly, we cannot be the arbiters of which pieces of metadata are sensitive for you. For example, for someone reporting on or documenting protests or police brutality, where and when a photo was taken could mean life or death depending on the country they live in.

At a certain point labels alone aren't suitable to reasonably retrieve what we are after.

I think the important question to ask here is – What additional search filters would you like to have that Cryptee doesn't offer right now? Because perhaps it's wiser to try to work our way up from there. That way we can think about solutions around these filters, and try to address these individually (like we did with tags for example) vs try and solve the entirety of the EXIF spectrum. So let me know, and perhaps we can solve this particular issue of search/filtering in different ways, and expand on our tags feature.

Hoping these make sense! Looking forward to hearing from you ✌🏻

J

hirako2000 commented 3 years ago

And thanks for taking the time to spell out the problems that come with either possible way of supporting metadata search.

This isn't possible due to how the math checks out

Trying to understand the math... you point out that:

Each and every time you start the search on the client side, you'd have to download and decrypt 5gb worth of metadata on 400gb accounts

No. Client(s) only sync their metadata base, incrementally and asynchronously. So a search doesn't have to pull down gigabytes' worth of metadata each time.

The client has the full metadata set. It only updates itself when metadata is added, changed, or removed, and not with a full refetch: it's a diff operation that only fetches the entities that changed.

I appreciate that implementing a syncing capability that performs diffs on a metadata set isn't a five-minute job. But I thought I would point it out, as it is feasible and would avoid the heavy blind download. The client can do it, and the server can do it (timestamp on each metadata bit).

Of course, when a different client connects and needs to search by metadata, it will incur a full (compressed) metadata set download. No way around that.
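A minimal sketch of the incremental sync idea, with one deliberate substitution: instead of wall-clock timestamps it uses a server-assigned, increasing change number as the sync cursor. All class and method names are hypothetical, the blobs stand in for already-encrypted metadata, and deletions travel as tombstones. It also makes the last point concrete: a fresh client starts at cursor 0 and has to pull the whole log.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Change:
    seq: int               # server-assigned, strictly increasing change number
    photo_id: str
    blob: Optional[bytes]  # encrypted metadata; None is a tombstone (photo deleted)


@dataclass
class MetadataLog:
    """Server side (hypothetical): an append-only log of encrypted metadata changes."""
    changes: List[Change] = field(default_factory=list)

    def record(self, photo_id: str, blob: Optional[bytes]) -> None:
        self.changes.append(Change(len(self.changes) + 1, photo_id, blob))

    def since(self, cursor: int) -> List[Change]:
        return [c for c in self.changes if c.seq > cursor]


@dataclass
class MetadataReplica:
    """Client side (hypothetical): local copy plus the last change number applied."""
    cursor: int = 0
    blobs: Dict[str, bytes] = field(default_factory=dict)

    def sync(self, log: MetadataLog) -> int:
        """Apply only changes newer than the cursor; return how many were fetched."""
        delta = log.since(self.cursor)
        for change in delta:
            if change.blob is None:
                self.blobs.pop(change.photo_id, None)
            else:
                self.blobs[change.photo_id] = change.blob
            self.cursor = change.seq
        return len(delta)
```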

users may not actually have 1gb available storage to cache things.

Again, it is optional. Some users have 128GB with more than enough spare space. We can expect storage to increase, not decrease, especially on commodity phones.

The word "some" does a lot of the heavy lifting there

I totally agree with that. I'm not aware of any server side search on encrypted data that doesn't lead to some form of leakage.

Defining "some" is paramount for a company like Cryptee

Such leakage would be vulnerable to multiple forms of attack, so that sort of leakage is not acceptable for Cryptee. We are on the same page.

I think the important question to ask here is – What additional search filters would you like to have that Cryptee doesn't offer right now?

I also think it's the relevant question here. We agree EXIF is a mess, and IMO its very existence is a privacy risk. Users don't know what's in there, and geolocation is a serious privacy issue. Cryptee could do the world a favor and give users more explicit control over what gets stored, which it already does by stripping out the EXIF data.

A handpicked selection of fields would probably be a simpler and more efficient solution. If I had to vote, I would start with geolocation. That's actually what piqued my curiosity and prompted me to join this thread.

Geo location metadata is very useful for building a mapped photo swarm. Given a date filter, it makes it very simple and intuitive for users to find those pictures which they roughly remember where and when they were taken.

Thanks for making Cryptee, and for the beautiful educational effort you put in to help the world understand the technical aspects of such tech.

Hoping these make sense!

It does, and I totally understand your pragmatism.

Edits: Formatting Edit2: Gratefulness, and typo.

johnozbay commented 3 years ago

Hi there!

Thank you for all the kind words and compliments! And I deeply appreciate you taking the time to write these thoughtful responses! The world needs more amazing people like you! 🙏🏻

No. Client(s) only sync their metadata base, incrementally and asynchronously. So a search doesn't have to pull down gigabytes' worth of metadata each time. The client has the full metadata set.

This is based on incorrect assumptions, though.

1 – Like you mentioned, there will be different clients. More than 50% of the time the device won't have the full metadata set, and can't, because users own multiple devices.

1a – If you took the photos on your phone and are searching on your laptop, your laptop has neither the original photos nor the full metadata set. So you'll have to download and decrypt 1gb of metadata to search.

1b – If you took the photos on your phone but deleted the originals (which is the case for most users, since no phones support 400gb or 2TB of storage, so people offload their photos to Cryptee and then delete the originals from their devices), your phone will have neither the originals nor the full metadata set. So you'll have to download and decrypt 1gb of metadata to search.

1c – Even if we made sure Cryptee itself keeps a local copy of the metadata on the phone, to sync only the missing parts like you mentioned, this means that for 400gb accounts, Cryptee now has to occupy at least 1gb on your phone just for (compressed) metadata of photos that aren't even on your phone. Now flip the threat model: you thought you deleted the photos from your phone, but your phone still carries their metadata.

1d – If they're stored encrypted on your device locally to address 1c, this means for each search / app launch, you'd have to decrypt 1GB of metadata. If not stored encrypted, it's a bigger privacy risk due to 1c.

2 – Not all users upload from the device they took the photo on. For example, if you take photos with a DSLR camera, you'll likely upload from your desktop but view from your mobile device. DSLR cameras and RAW photos have even more intricate EXIF meta, increasing either the complexity of extraction or the size of the meta we need to store.

And to make things harder / worse, if you're a pro photographer, using DSLR, chances are you're on a 400GB or 2TB plan, which is where most of these encrypted-metadata-at-scale problems happen in the first place. Making things unusable from the get-go.

3 – There are actually a lot of users who have 3 devices (i.e. work desktop / home desktop / phone etc.), or (desktop, phone, tablet) etc. This means the "50%" I mentioned in (1) is closer to 66% or more for some users. For them, search will be downright unusable. Who wants to wait to download 1GB of metadata, then wait just as long to decrypt it, on not just one but 2 or more devices?

4 – 

The client can do it, and the server can do it (timestamp on each metadata bit).

4a – Using timestamps for sync in this specific use case is a bad idea. It will leave traces indicating when you've uploaded a photo, both on your device and on the server. For example, let's say the originals aren't on your phone because you deleted them from your device after backing them up, for privacy or security reasons. Someone inspecting your phone can now find out when you offloaded the photos and removed them from your device.

4b – This is especially problematic for Cryptee: what happens if you're using Ghost Albums / Folders? One of our key features, and an important aspect of our threat model, is that we also aim to provide some deniability. So if you're in an abusive relationship and your abusive partner wants you to unlock your device and Cryptee in front of them, you can unlock both and still be able to hide some stuff. However, leaving timestamps / hashes / anything locally cached on the device would reveal that you've got ghost folders or albums.

For example, for Cryptee Docs we utilise local caching and do extensive diffs, etc. But if you ghost a folder, the local clients treat it the same as if the folder had been deleted. For Docs this isn't a big issue, since the cached metadata is less than 1kb (filenames / folder names / tags etc.), but for Photos, ghosting an album could mean wiping / invalidating 100s of megabytes of local caches. That makes things a lot less feasible.


Geo location metadata is very useful for building a mapped photo swarm. Given a date filter, it makes it very simple and intuitive for users to find those pictures which they roughly remember where and when they were taken.

By "finding" and "roughly remembering" I'm guessing you mean searching photos by city / location / area names. EXIF stores geo-coordinates, not city names.

a – We can't resolve coordinates to city names. This would require sending plaintext coordinates to a lookup / resolution server. So either your device makes direct connections to the resolution server (e.g. OpenStreetMap) and leaks your IP + geo-coordinates to it, OR

b – We can proxy the connection to the coordinate-lookup / resolution server to anonymise your identity, but now Cryptee will know your user ID + photo IDs + photos' coordinates in order to proxy them to OpenStreetMap. At this point we might as well just store the geolocation data unencrypted, since our servers would need to know a lot (i.e. user ID + photo ID + coordinates to proxy the request and render the map).

c – If we can't resolve coordinates to city names, you can't easily search by them. Nobody's going to type 48.8566° N, 2.3522° E; they'll want to type "Paris". Maaaybe we can find some open-source, offline coordinate-to-city JavaScript library that does all this on the client devices ... but holy shit, that would be a big JSON list to load 😅.
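To make the offline option concrete: a coordinate-to-city lookup can run entirely on the device as a nearest-neighbour search using the haversine great-circle distance, with no request leaving the device. The three-city table below is a toy stand-in; a usable table would need many thousands of entries (which is exactly the size problem above), and at that size a linear scan like this one would want a spatial index.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Toy offline table: (name, latitude, longitude). A real one would be far larger.
CITIES: List[Tuple[str, float, float]] = [
    ("Paris", 48.8566, 2.3522),
    ("London", 51.5074, -0.1278),
    ("Berlin", 52.5200, 13.4050),
]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_city(lat: float, lon: float, cities: List[Tuple[str, float, float]] = CITIES) -> str:
    """Linear scan for the closest known city; no network request is made."""
    name, _, _ = min(cities, key=lambda c: haversine_km(lat, lon, c[1], c[2]))
    return name


if __name__ == "__main__":
    print(nearest_city(48.8584, 2.2945))  # Paris (coordinates near the Eiffel Tower)
```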

d – Therefore we can't easily display photo maps either. It would require either us or OpenStreetMap etc. to know who you are and where your photos were taken.

e – According to GDPR, in Europe, your IP Address + geolocation tags in your photos are considered Personally Identifiable Information. Here's a 39 page PDF published by the European Union diving into this very specific topic of storing / processing user location with pain-inducing amounts of detail. 😂

To save you time: the locations of all the photos you took imply you were there while taking them, so they contain your personally identifiable location data and constitute Personally Identifiable Information.

And OpenStreetMap processes and stores data in the Netherlands, but also ... in the UK (which is no longer a part of the EU), according to its privacy policy. ~yet another reason why brexit was a stupid idea~

So legally we actually can't let them process your data. We could find alternative map providers, or render JPG maps and add the photo pins on our servers, but then, again, we might as well just store the geolocation data unencrypted, like I mentioned in (b) above.


So this is a really really complex problem with perhaps 100 different angles to consider, some technical and some even legal. And even if we could solve all the technical challenges, we may still not solve all the legal challenges unless we can find a different maps provider.

I hope I could provide some added points of perspective on what makes this particularly challenging for Cryptee (and for the users of Cryptee).

Don't get me wrong btw hahha I'd LOOOOVE to add all these features, and a hot looking photo map to show all your photos. It would be incredibly convenient and I know we'd make it beautiful... but yeah, still lots of thinking to do...

Let me know what you think! I'm open for ideas and thoughts! ✌🏻

hirako2000 commented 3 years ago

I read each of your points carefully, and given the constraints, neither a server-side nor a client-side approach would perform. crypt.ee is even more stringent about privacy than I originally thought.

Here is another stab at the geolocation problem:

Persistence - Client side

1/ Perform geo State inference from the latitude/longitude. There shouldn't be a performance concern there: a local geodb for that is lightweight, at a few hundred kb, and should be stable, see here.

2/ Encrypt the photo(s), as usual

3/ Upload the image as usual
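Step 1/ could look something like the sketch below: a ray-casting point-in-polygon test against local region boundaries. The two polygons and their IDs are made up for illustration, coordinates are treated as flat (lon, lat) pairs, and a real geodb would need proper boundaries, bounding-box prefiltering, and care around the antimeridian. Points exactly on a border may land in either region.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (lon, lat), treated as planar for this sketch


def point_in_polygon(pt: Point, poly: List[Point]) -> bool:
    """Ray casting: count edge crossings of a ray from pt towards +x; odd means inside."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Toy "state" polygons with made-up IDs; a real geodb would hold actual boundaries.
STATES: Dict[int, List[Point]] = {
    1: [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    2: [(10.0, 0.0), (20.0, 0.0), (15.0, 10.0)],
}


def infer_state_id(lat: float, lon: float) -> Optional[int]:
    """Return the ID of the first region containing the point, or None."""
    for state_id, poly in STATES.items():
        if point_in_polygon((lon, lat), poly):
            return state_id
    return None
```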

Persistence - Server side

1/ Store the image as usual

Single photo lookup - Client side

1/ Fetch the lat/ln by photo ID, same as already done with labels

Server side is trivial.

Geo mapping - Client side

That's the interesting part

The State ID can serve for batching.

1/ Fetch OSM* world map tiles, randomize the landing location if that's a privacy concern.

2/ Fetch batch(es) of lat/ln sets, by the State IDs, those States visible on the map only.

3/ As new surface areas are shown when the user navigates, fetch other batch(es) of lat/ln sets, by State ID(s).

4/ Results are drawn** on the screen via pins or heat zones, whatever visual.

*I don't see exactly how fetching from the official OSM servers is a privacy concern, since we aren't doing location lookups or geo detection. But if it is an issue, then as you pointed out, we'd need another provider or to self-host. All we need is simplified tiles, not all the details.

**I don't think returning thumbnails would scale. Thumbnails would probably have to be fetched and displayed upon clicking an individual lat/ln entry drawn on the view; perhaps fetching thumbnails up front is OK once the number of coordinates shown on the screen drops below some threshold.
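Steps 2/ and 3/ amount to a fetch-once cache keyed by State ID. The sketch below assumes a hypothetical `fetch_batch(state_id)` call that returns that state's coordinate list; it's one possible shape for the client side, not a worked-out design.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Coord = Tuple[float, float]  # (lat, ln)


class GeoBatchLoader:
    """Fetches per-state coordinate batches once, as states scroll into view."""

    def __init__(self, fetch_batch: Callable[[int], List[Coord]]) -> None:
        self._fetch_batch = fetch_batch  # hypothetical network call, one batch per State ID
        self._cache: Dict[int, List[Coord]] = {}

    def on_viewport_change(self, visible_state_ids: Iterable[int]) -> List[Coord]:
        """Return pins for every visible state, fetching only states not seen before."""
        pins: List[Coord] = []
        for sid in sorted(set(visible_state_ids)):
            if sid not in self._cache:
                self._cache[sid] = self._fetch_batch(sid)
            pins.extend(self._cache[sid])
        return pins
```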

Geo mapping - Server side

1/ Feed by State ID as requests come in

2/ If paranoid, then randomly add sets from random State IDs to fuzz the traffic.
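Step 2/ could be sketched as padding each response with batches for a few randomly chosen extra State IDs, shuffled so the requested ones aren't simply first. The function names and the decoy count are assumptions. Note that this only adds noise to what the traffic looks like; the server itself still sees which IDs were requested.

```python
import random
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float]


def fuzzed_response(
    requested: Set[int],
    store: Dict[int, List[Coord]],
    extra: int,
    rng: random.Random,
) -> Dict[int, List[Coord]]:
    """Server side: the requested State ID batches plus `extra` random decoy batches."""
    real = sorted(sid for sid in requested if sid in store)
    others = sorted(sid for sid in store if sid not in requested)
    decoys = rng.sample(others, min(extra, len(others)))
    ids = real + decoys
    rng.shuffle(ids)
    return {sid: store[sid] for sid in ids}


def keep_requested(response: Dict[int, List[Coord]], requested: Set[int]) -> Dict[int, List[Coord]]:
    """Client side: drop the decoy batches it never asked for."""
    return {sid: coords for sid, coords in response.items() if sid in requested}
```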

How can the client map the State ID to State name?

Upon the first upload of an image, the client generates the mapping table, encrypts it, and uploads it. It's only a few hundred KB in size, so it can be fetched on demand whenever mapping between the two is needed.

Performance?

I think even heavy users with 1 million photos should see the pins, circles or heat zones drawn at a steady pace; the perceived performance offsets the fact that the results wouldn't show instantly.

Assuming a 40-byte overhead per photo for lat/ln and State ID, we are talking about 40mb of geo data to fetch for 1 million photos. Compression will help.

The batching approach benefits users whose photos are spread out around the globe. Users who happen to have taken 1 million pictures in NY will have to wait for the full 40mb to be returned.
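The size estimate holds up as a back-of-the-envelope calculation. As an assumed layout (not from the proposal), two 64-bit floats for lat/ln plus a 32-bit State ID pack into 20 bytes, so the 40-byte figure leaves room for a photo reference or other per-entry overhead.

```python
import struct

# Assumed record layout: little-endian float64 lat, float64 ln, uint32 State ID.
RECORD = struct.Struct("<ddI")
BYTES_PER_ENTRY = 40  # the proposal's per-photo estimate, including overhead


def geo_payload_mb(photo_count: int, bytes_per_entry: int = BYTES_PER_ENTRY) -> float:
    """Uncompressed geo payload in (decimal) megabytes."""
    return photo_count * bytes_per_entry / 1e6


def pack_entry(lat: float, lon: float, state_id: int) -> bytes:
    return RECORD.pack(lat, lon, state_id)


print(RECORD.size)                           # 20
print(geo_payload_mb(1_000_000))             # 40.0
print(len(pack_entry(48.8566, 2.3522, 1)))   # 20
```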

Why States?

It seems like a good compromise between countries and cities. A state represents a province in Asia, a department in France, a state in the U.S.

Why bother?

I don't think this approach is as complex as it reads. I'm happy to try coding a POC if the design isn't clear in writing. It's pretty cool to see a map of our pics, and because of the state inference, it's pretty cool to have that field populated; perhaps inferring the city will also be doable down the road, and that paves the way for better search filters.

Of course you may spot another half a dozen things I totally overlooked, or decide geolocation is not a problem to tackle any time soon :)

Edit: Typos Edit2: Added more reason to bother.

johnozbay commented 3 years ago

Many million thanks for this incredibly thoughtful response! 🙏🏻 Deeply appreciate the time, effort and thoughts you're putting into this.

I like the idea of State IDs, and will keep thinking about how we can keep them encrypted (perhaps similar to tags), so that our servers / we don't know the State IDs, but only the users do, and can perhaps search photos by location this way.

Otherwise, I'm afraid collecting additional pieces of unencrypted data isn't the right way to go about this. – Nor is any other solution that involves expanding our threat model to know just a bit more. – Because we can't keep re-drawing the line on what we consider private / personal information.

Wearing a developer's hat = it's a great compromise / middle-ground from a technical perspective. Wearing a privacy company's lawyer's hat = it's a bad idea, and will invite all sorts of trouble.

In short, knowing when a photo is taken + which state & country a photo is taken actually provides a lot of information I am not comfortable for us to know as a company. Here are a few examples where the State ID alone from that table you linked would/could get us in different sorts of technical, political and legal trouble as a business:

1 – I'll start with a real-life scenario where not knowing this came in quite handy, benefited our users and saved us a major potential headache as a company.

In the wake of the deadly attacks in Sri Lanka in April 2019, Sri Lanka's government blocked lots of websites and social media to prevent the spread of imagery / news. Since Cryptee is one of the leading examples of "difficult to censor" websites, a few media outlets reached out to Cryptee / me to ask if anything could be done about this. So I gave interviews about this to The Guardian, Deutschlandfunk, Eldiario and a bunch of other outlets, which, to my happy surprise, made front-page news at the time.

If we knew, in plaintext, which countries & states the uploaded photos were taken in (and let's say, for example, we hypothetically had a few users who uploaded photos from Sri Lanka; again, to emphasize: we don't know if they did or not), I'm of the opinion that these articles would have painted a big red target on Cryptee once they came out, and we could have faced difficulties afterwards as well. But it's all hypothetical, of course. We didn't know, and we don't want to know the locations of photos.

2 – For example, in the State ID database you sent, look at the state with ID 4689. Cryptee wasn't around back in March 2014, but I'm speculating that knowing the State ID = 4689 and March 2014 would be risky for us as a company based in Estonia at the time.

3 – Same for August 2020, and State ID = 2958. Again, having an unencrypted state ID + time/date would pose a risk to the users' well-being or to us as a company.

To summarize, during any major events / protests etc., people take photos and upload them to places where these photos will be safest. Especially in these situations, knowing State ID, Country + Time is already knowing too much. And I want Cryptee to be a safe place for everyone, where they don't have to think about this (or leave users to wonder whether that piece of info was encrypted but not that other piece of info, etc.).


And let me dive into why displaying / rendering maps ourselves is a legally bad idea.

1 – Here's a WSJ article about how Google redraws the borders on Maps, depending on who's looking from where. TL;DR: there are at least 30 countries around the world where the borders are disputed.

2 – You might be thinking... but John, you're in the EU, how's the border dispute between India / Pakistan relevant to you legally? Allow me to introduce you to the great Turkey & Cyprus (EU) border dispute and what that meant for Google Maps. TL;DR: Turkey is the only country in the world that recognizes Northern Cyprus as a country. When searching for Northern Cyprus from a Turkish IP address, you will clearly see Northern Cyprus as its own location with its own borders. But when Northern Cyprus is viewed from any other country, including Cyprus, you see it all within the borders of one country, Cyprus.

etc etc.

Basically, drawing / displaying / proxying your own maps is a political and legal nightmare that I'd prefer to avoid.


Do I want us to get censored by nation states around the world over storing geolocation data unencrypted = No.

Do I want us to get into hot water and have to deal with nation states around the world over displaying 'incorrect/wrong' maps? (even if we took them from OSM) = No.

Do I want to store encrypted location metadata and make at least search by location possible, without showing any glimpses of a map = Yes.

So in short, using a combination of the db you've linked, and some on-device hashing and encryption, we may be able to enable search by location, like we did with tags, but all location data needs to be encrypted. Otherwise, we're not going to build something where we know the locations of photos. (even if it's not precise, and only at state/country level)
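One way to read "on-device hashing and encryption ... like we did with tags" is a keyed blind index: the client turns each place name into an HMAC token with a key only the user holds, and the server stores and compares tokens without learning the names. The TypeScript sketch below (Node's `crypto` module) assumes the client has already resolved a photo to place names and holds a per-user secret; `userSearchKey`, `normalizeTerm` and `blindIndexToken` are hypothetical names, not Cryptee's actual tag implementation.

```typescript
import { createHmac, randomBytes } from "node:crypto";

// Hypothetical per-user secret. A real client would derive this from the
// user's key material on-device; here it is random bytes for the demo.
const userSearchKey: Buffer = randomBytes(32);

// Normalize so that "Estonia", " estonia " and "ESTONIA" give the same token.
function normalizeTerm(term: string): string {
  return term.normalize("NFKC").trim().toLowerCase();
}

// Deterministic keyed hash (HMAC-SHA-256). The server can compare tokens for
// equality, but cannot tell which place name produced them without the key.
function blindIndexToken(key: Buffer, term: string): string {
  return createHmac("sha256", key).update(normalizeTerm(term)).digest("hex");
}

// At upload time: the client stores only tokens for the photo's place names.
const storedTokens = new Set<string>(
  ["Estonia", "Harju County"].map((name) => blindIndexToken(userSearchKey, name)),
);

// At search time: the client hashes the query the same way and sends the token.
const queryToken = blindIndexToken(userSearchKey, "  estonia ");
console.log(storedTokens.has(queryToken)); // true
```

Because the tokens are deterministic, the server can still see that two photos share the same (unnamed) place and how often each token occurs; that equality leakage is the usual trade-off of this technique.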


I know these are rather dark topics to talk about on Github, and perhaps unusually detailed for a simple feature request. But I think it's important for us to communicate the types of things we consider (and must consider) on a daily basis while working on features for Cryptee, which many users around the world rely on to keep their private and personal data safe.

I hope you don't take any of my comments the wrong way 😅 ✌🏻 I want to bring these features to life just as much as you do. It's just a matter of figuring out these intricacies so that our users don't need to worry about whether the camera lens EXIF metadata was encrypted or not some day. hahaha

Let me know if you can think of a way with these constraints in mind, I'm all open for ideas! Love this coordinates table and bookmarking it. I think we might be able to find a way with this in hand. Going to keep thinking about this for a while.

All the very best, John

hirako2000 commented 3 years ago

Because we can't keep re-drawing the line on what we consider private / personal information.

Totally with you on that. Let's not re-draw that line, unless it is redrawn in the better privacy direction.

In short, knowing when a photo is taken + which state & country a photo is taken actually provides a lot of information I am not comfortable for us to know as a company.

Totally with you; that is not what the proposed approach implies, though. Perhaps the part about the State mapping and encryption wasn't clear. See further below.

Here are a few examples where the State ID alone from that table you linked would/could get us in different sorts of technical, political and legal trouble as a business.

Thanks for sharing the anecdote; it is a sad fact. I think I read about that case, and we could add numerous situations where attempts of this sort were made, with mixed success, by authorities in multiple locations. Belarus shows intensifying cases of internet censorship.

Unscrupulous governments have shown their stubborn views, and will go as far as targeting specific individuals.

2 – For example in the State ID database you sent, look at the state with ID 4689. Same for August 2020, and State ID = 2958. Again, having unencrypted state id + time/date would pose a risk for the users' well-being or us as a company.

An unencrypted state ID would not cause a leak. It is a user-specific state ID, generated from the source map of states: e.g. ID = 2958 becomes ID = 45ef23, while for another user it could well be 87rz43. It is arbitrary and only serves to index the entries on the server side with a minimal length.
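A minimal sketch of the per-user State ID idea, in TypeScript for Node. It assumes the canonical IDs come from the public states dataset linked earlier and that short random hex strings are acceptable aliases (the `87rz43` example suggests the format could differ); `buildAliasTable` is a hypothetical name.

```typescript
import { randomBytes } from "node:crypto";

// Canonical state IDs from a public dataset (e.g. 2958, 4689).
type CanonicalStateId = number;

// Build a per-user table mapping each canonical ID to a short random alias.
// Only the client keeps this table; the server only ever sees aliases.
function buildAliasTable(
  stateIds: CanonicalStateId[],
  bytes = 3, // 3 random bytes -> 6 hex characters, e.g. "45ef23"
): Map<CanonicalStateId, string> {
  const table = new Map<CanonicalStateId, string>();
  const used = new Set<string>();
  for (const id of stateIds) {
    if (table.has(id)) continue; // ignore duplicate input IDs
    let alias = randomBytes(bytes).toString("hex");
    while (used.has(alias)) {
      alias = randomBytes(bytes).toString("hex"); // retry on the rare collision
    }
    used.add(alias);
    table.set(id, alias);
  }
  return table;
}

// Two users get unrelated aliases for the same canonical state.
const alice = buildAliasTable([2958, 4689]);
const bob = buildAliasTable([2958, 4689]);
console.log(alice.get(2958), bob.get(2958));
```

The alias hides which state it stands for, but it is still a stable per-user identifier, so the server can group one user's photos by an unnamed location.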

If the very fact of having yet another plaintext field in the database is a concern from a legal perspective, fine, encrypt it. The indexing would just cost a bit more, that's all.


there are at least 30 countries around the world where the borders are disputed. Basically, drawing / displaying / proxying your own maps is a political and legal nightmare that I'd prefer to avoid

I'm loving the fact you pointed that out. What stance should Cryptee take?

I think this one: Draw no border. No bias, and zero recognition of any border. Users don't need borders. It's actually what crypt.ee is building for the web, the drawn map would reflect that.


Do I want to store encrypted location metadata and make at least search by location possible, without showing any glimpses of a map = Yes.

That is where we are going with this. To sum up what would be on the server side:


I know these are rather dark topics to talk about on Github

I think those are topics that should be discussed more often on Github and on social hubs at large, and an issue is a great place to discuss specific privacy concerns, so that casual users who don't understand why feature x, y or z isn't available on privacy-focused applications can see the reasoning. Increasingly, it will make them question the availability of privacy-disaster features enabled by default on other applications.

The lack of awareness is partly due to the lack of public discussion. I estimate that 99% of the population don't have a grasp of the privacy implication of modern applications. Even within the techy circles it seems that we only cover the surface of the problem, and don't fully measure the consequences.

I hope you don't take any of my comments the wrong way 😅 ✌🏻

I absolutely don't. I take your comments as meticulously challenging suggestions that could subtly but surely compromise the censorship resistance of your system.

figuring out these intricacies so that our users don't need to worry about whether the camera lens EXIF metadata was encrypted or not some day. hahaha

Valuable exercise. I'm with you: metadata is sensitive, and, as you mentioned, timestamps and geolocation are particularly so; they should be handled with caution.

Let me know if you can think of a way with these constraints in mind, I'm all open for ideas!

I can't see how the constraints invalidate the proposed solution. But I didn't spend much time coming up with it. There are probably more efficient approaches, and/or more privacy guarantees that we could add in.

And of course I may only partly understand the constraints you've kindly explained at length in this thread. I thank you for that and I do question my interpretations 😅.


I'm a bit curious so will try to build a separate POC to see how it would perform. At least for fun.

johnozbay commented 3 years ago
  • Encrypted lat and lon pair for each photo that is uploaded with the geolocation setting ON (the user should still decide what is stored and what is not, even encrypted).

  • Unencrypted link key, in the form of a randomly generated (by the client) ID of the State. The server still has no clue what state it actually is. Legal may have a problem with that? Encrypt it.

  • Encrypted small blob representing the generated State IDs to State Strings mapping. Only the client having the encryption key would ever be able to tie the random IDs to the state names. And this operation should happen at runtime when drawing the map or displaying the State field on an individual photo.
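The storage side of the quoted proposal (encrypted lat/lon per photo, a random state alias, and one encrypted alias-to-name blob) could be sketched with AES-256-GCM from Node's `crypto` module. The record shapes (`SealedBox`, `PhotoLocationRecord`) and the single per-user key are assumptions for illustration; Cryptee's real encryption scheme is not described in this thread. As the reply points out, this covers storage only and says nothing about fetching map tiles.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical shape of an encrypted value as the server would store it.
interface SealedBox {
  iv: string; // base64, 12-byte nonce, fresh for every encryption
  tag: string; // base64, GCM authentication tag
  data: string; // base64 ciphertext
}

function sealBox(key: Buffer, plaintext: string): SealedBox {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

function openBox(key: Buffer, box: SealedBox): string {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(box.iv, "base64"));
  decipher.setAuthTag(Buffer.from(box.tag, "base64"));
  const plain = Buffer.concat([
    decipher.update(Buffer.from(box.data, "base64")),
    decipher.final(), // throws if the key, ciphertext or tag is wrong
  ]);
  return plain.toString("utf8");
}

// Hypothetical per-photo record: ciphertext plus the random state alias.
interface PhotoLocationRecord {
  latLon: SealedBox;
  stateAlias: string;
}

const userKey = randomBytes(32); // stand-in for the user's real key material

const record: PhotoLocationRecord = {
  latLon: sealBox(userKey, JSON.stringify({ lat: 59.437, lon: 24.7536 })),
  stateAlias: "45ef23",
};

// The alias -> state-name mapping is stored as one encrypted blob.
const mappingBlob = sealBox(userKey, JSON.stringify({ "45ef23": "Harju County" }));

// Only the key holder can turn the stored data back into coordinates and names.
const coords = JSON.parse(openBox(userKey, record.latLon)) as { lat: number; lon: number };
const mapping = JSON.parse(openBox(userKey, mappingBlob)) as Record<string, string>;
console.log(coords.lat, mapping[record.stateAlias]); // 59.437 Harju County
```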

The problem is that your proposed solution still misses what I mentioned above. Let's say you did all that. All this only solves the 'storage' component of locational data, by encrypting it. In that case we may as well just encrypt lat & lon, and we don't need to deal with all these complicated and unnecessary state IDs etc.

What your proposal doesn't solve or address is how the client requests/gets map tiles from the server for the state & country without revealing the user ID, IP, state and country to the server.

– The client cannot request tiles directly from OSM, for the reasons I mentioned above.
– The client cannot request tiles from our servers, because we would still have to authenticate all these requests (otherwise we're operating a free map tile retrieval proxy server). And if we authenticate these requests, then we necessarily know the IP + User ID + State + Country for the map tiles.

One way or another, our server ends up knowing the locations of a user's photos, albeit at state / country level.

So a summary in two sentences: Displaying photos on a map is not possible in a no-knowledge way, unless you ship the entire world map with the app. Eventually a server somewhere will have to authenticate and retrieve the map tiles for the user, and thus will necessarily know the user has photos from these tile locations.
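To see why a tile request alone reveals location, here is the standard OpenStreetMap "slippy map" conversion from latitude/longitude to tile coordinates, sketched in TypeScript. Whoever serves a tile URL of the form `/{z}/{x}/{y}.png` learns which area the client is looking at; the names `Tile` and `latLonToTile` are our own.

```typescript
interface Tile {
  z: number; // zoom level
  x: number; // tile column
  y: number; // tile row
}

// Web Mercator tile indices as used by OSM-style tile servers.
// Valid for |lat| below about 85.05 degrees (the Mercator cut-off).
function latLonToTile(lat: number, lon: number, z: number): Tile {
  const n = 2 ** z; // number of tiles along each axis at this zoom
  const latRad = (lat * Math.PI) / 180;
  // Clamp so lon = 180 still maps to the last column.
  const x = Math.min(n - 1, Math.floor(((lon + 180) / 360) * n));
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n,
  );
  return { z, x, y };
}

// Even at a coarse zoom level, the requested tile pins down a region.
const tile = latLonToTile(59.437, 24.7536, 6);
console.log(`/${tile.z}/${tile.x}/${tile.y}.png`); // /6/36/18.png
```

Each step up in zoom halves the tile's width, so the finer the map the client wants, the more precisely the tile server learns where the photos are.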


The only thing that may be possible is allowing users to search their photos by somehow shipping a massive list of all possible country / state / city names etc client side, then allowing users to search against this list on the client side.

The problem with this is that it's hard to internationalize, i.e. the database / list you sent is in English or regionally localized.

Nobody in Russia is going to search for "Moscow". They will search for "Москва". Nobody from Sweden is going to search for their vacation photos by typing "Estonia". They will type "Estland". Nobody in Finland is going to search for their vacation photos in "France". They'll search for "Ranska".

So we can't use that list / db for the purposes of search either. Back to square one 😅
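To make the internationalization problem concrete: a client-side lookup has to map every localized spelling to one canonical place ID. The TypeScript sketch below uses a tiny hand-written sample table (ISO 3166 codes as IDs, a few names per place, chosen for illustration); a usable table would need names in every language for every place, which is exactly the size and coverage problem described above.

```typescript
// Hypothetical sample data; a real table would need every language.
const placeNames: Record<string, string[]> = {
  "RU-MOW": ["Moscow", "Москва", "Moskau"],
  EE: ["Estonia", "Estland", "Eesti", "Viro"],
  FR: ["France", "Ranska", "Frankreich"],
};

// Same normalization for table entries and queries.
function normalizeName(s: string): string {
  return s.normalize("NFKC").trim().toLowerCase();
}

// Invert into name -> canonical ID so a query in any listed language resolves.
const nameIndex = new Map<string, string>();
for (const [id, names] of Object.entries(placeNames)) {
  for (const name of names) nameIndex.set(normalizeName(name), id);
}

function resolvePlace(query: string): string | undefined {
  return nameIndex.get(normalizeName(query));
}

console.log(resolvePlace("москва")); // RU-MOW
console.log(resolvePlace("Ranska")); // FR
console.log(resolvePlace("Paris")); // undefined: not in the sample table
```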

johnozbay commented 9 months ago

Hi there! Today we shipped RAW / TIFF / DNG / 3FR / FFF support to Cryptee photos in 60bcdc9 and it makes it possible to see some of the EXIF data such as camera make/model/lens/aperture/exposure etc in the lightbox. You can read more about it here!


Many thanks for your help and patience with us everyone! ✌🏻 We're tackling one big challenging task at a time. We're not sure if we'll ever be able to ship EXIF search, so I'll close the issue for now — but this is always going to be on our list, and we'll come back here if anything changes.