Data-Protection-Control / ADPC

Advanced Data Protection Control (ADPC) is a mechanism to communicate data subjects' (users') consent and privacy decisions with data controllers (service providers).
http://dataprotectioncontrol.org
Mozilla Public License 2.0
48 stars 6 forks source link

Resilience against abuse #25

Open hilbix opened 1 year ago

hilbix commented 1 year ago

I think there is something important missing in the standard.

AFAICS we need a high resilience against abuse (and more, see below). For this, we probably need Subresource Integrity plus a bit more.

If done right, this not only thwarts tracking (to some extent), the main purpose is to eliminate all ambiguity of the consent. And possibly to hinder some correlation attacks on the ADPC header (read: Get hold on your consents by some evil third party).

Beware, I am no Cipherpunk. So perhaps I mixed up the cryptographic parts.

Main motivation

When I consent to something on the web, I consent to a specific version of the legal document of the website.

But when the website changes these terms, I still only consent to the old version. Else a website can say:

"Do you consent to terms 1 to 5 of our AGB" and then change the AGB at their liking, because the browser still continues to send the consent header.

No way!

Hence what we need is to consent to the exact legals terms.


The proposed changes in the "Consent Request Resource"

Glossary:

The CRR must:

Part 1: Link to Legal Terms using SI

As the consent due to some CRR always is given according to the current legal terms, the consent is bound to these legal terms. When the legal terms change, the consent does not automatically apply to the new legal terms. Hence there must be a way to track to which the consent belongs.

There already is a standard for this called SI. So the link to the Legal Terms must be embedded in the CRR and protected by SI. This way we have a link between what you have consent and what the terms were at the time of the consent. And this link cannot be broken (only got lost).

If such a page which covers Legal Terms contains additional legally binding resources (like images, links to other pages, etc.) then these links must also be covered by SI. Any link which is not covered by SI hence is not legally binding and not part of the consent.

There may be several links in the CRR of this type:

YMMV but I see a use case here.

The interesting part here is, that we only need the SI hash to uniquely identify the document. We do not need some (possibly lengthy) URL or similar. The hash is enough.

Part 2: Wording of the CRR

When I consent to something, it is important, which variant I consented. Because there are subtle details in the different languages. And AFAICS we need translations. But the user does not use all the translations, just one.

The vendor might also revise these texts, to clarify or whatever, while not revising the Legal Terms, as this something completely different. For proper changes in the Legals, you need Lawyers. However for changes of the CRR, you just need a CTO or even only the Web Designer.

Hence both are different, and need a different security context.

In the CRR there are currently IDs. Which could be abused for tracking. (Feel free to add links to those other Issues here.)

Tracking is bad. Hence this proposal eliminates the IDs in favor of the index. But the index lacks information about, what the consent contained. Hence we need a clear thing which expresses it again.

To be able to track Part 2, the JSON object must be brought into some stable format and then is hashed with an algorithm. This is done in the next part.

Part 3: The combined Hash

A third section of the CRR now creates a combined hash for Part 1 and Part 2 in a way that both can no more be separated.

The two parts offer O(NxM). But quite often only N or M make sense. Hence the Vendor provides these meaningful combinations in Part 3 of the CRR. For example, for Switzerland (.ch) there may be 3 combinations or 9 (for the 3 languages of the consent, and the 3 languages of the Legal Terms) combinations, because some people might want to agree in Swiss language but with the French legals. I really have no idea, why. But we can allow this freedom.

The Vendor now provides entries of the form:

{ "Hash": [ "part1", "part2" ] }, looking like {"oqVuAfXRKap7f": ["en_US","de_DE"]}

This is only a proposal. I am no Cipherpunk, hence somebody else might find something more secure.

The important part here is, that the Browser is able to calculate and verify this hash independent of the Vendor.

This does not make tracking impossible, but it becomes very difficult. As you - see below - must be able to reproduce the documents which produced this Hash on the User's request, as the Browser had verified the documents at the time of the decision.

Now the consent

ADPC: withdraw=*, object=direct-marketing

ADPC: withdraw=*, object=direct-marketing, consent=oqVuAfXRKap7f,3,4

It still is short enough (probably even shorter as with the current spec) and also mathematically exact references everything.

Requirements

There must be some rules, so it is clear how to handle edge cases:

The Consent is only valid, if the Vendor is capable to reproduce all documents from this Hash. Else the Consent is voided.

If the Vendor is incapable to reproduce all documents, the Consent is invalid and hence no more given.

The latter requires a properly functioning change management, as there is nothing more annoyingly than some consent, which is re-requested at a steady pace. (As other who follow this Spec do it less and less, those evil websites get even more pressure).

Tracking

Tracking is still possible, but needs to keep (waste) a lot more effort.

Additional data protection

Targeting users, such that the user has a specific Hash, still is an option. And there is some really good reason for this.

Your consents can be highly sensitive data (porn preferences, diseases you have). So there might arise the need to hide them.

As the Vendor can recreate all the data (see requirement above) anybody who gets hold on this short hash can revert everything such, that it is clear, what you have chosen.

That is bad.

But the above easily offers a way to escape this:

YMMV but I think, if we introduce a new standard, it must also offer things like that. Because Consent is not entirely a bad thing. Perhaps it is used between you and your Health Insurance. And having a standard, it should protect us in all directions.

Just my 2ct to this. I do not claim this is perfect nor complete. Perhaps I even have forgotten about more important things.

But I hope it is fairly easy to implement and to understand. As good things should be as simple as possible.

To sum it up

Using Cryptographic Hashes and SI not only protects against abusing this standard by faking documents, it also gives us some good basic protection against tracking abuse. Also it gives us the opportunity, to protect sensible consent making from possible correlation attacks.

As we do not know what the future holds, a new standard should be created as resilient against abuse as possible.

PS: Sorry, I do not proof-read it (again). I only edited the major typos I found. So please bear with me and the typos.