Data deletion syntax questions

InteractiveAdvertisingBureau / Data-Subject-Rights

IAB Tech Lab Data Deletion Request Framework specification


Data deletion syntax questions #4

Open bretg opened 2 months ago

bretg commented 2 months ago

Reviewing the doc for Prebid Server implementation, several questions came up.

1) Can deletion requests contain more than one ID at once, or must there be multiple requests? The text seems to indicate there can be more than one, but the actual protocol doesn't define arrays.

Specifically, the doc says "1st Party determines what identifiers are subject to the request," but then the sub field is defined as a string rather than an array of strings.

It would be quite nice if multiple IDs could be in the same request rather than having to process many separate requests since they're going to overlap significantly.

2) The sub field is listed as being a string, but is shown in the examples as an object.

[Screenshot of the spec's definition of the sub field, 2024-06-26]

yet

    "sub": {
        "identifierValue": "28f6dc889e...fe167",
        "identifierType": "email",
        "identifierFormat": "sha256"
    },

3) How do data deletion requests map to ExtendedIDs? The document example indicates the sub field "contains the identifier type", which is "email" in the example.

Is it intended that the identifierType match the value in user.eids.source? e.g. pubcid.org, id5-sync.com, etc. Prebid has a list of many EID sources. Entities will need to know where to look for the identifierValue and there are often many EIDs in a single request.
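To make the question concrete, here is a minimal sketch of what a recipient would have to do if identifierType were taken to equal the eids.source value. That equality is exactly what is being asked, not something the spec states. The function name find_eid_matches, the sample IDs, and the "plain" format value are hypothetical; the eids layout (a source plus a uids array of id entries) follows the OpenRTB extended identifiers object.

```python
from typing import Any


def find_eid_matches(sub: dict[str, str], eids: list[dict[str, Any]]) -> list[str]:
    """Return the stored uids whose source and id match the deletion subject.

    ASSUMPTION: identifierType carries the same value as user.eids[].source.
    The spec does not say this; it is the interpretation being asked about.
    """
    matches = []
    for eid in eids:
        if eid.get("source") != sub["identifierType"]:
            continue
        for uid in eid.get("uids", []):
            if uid.get("id") == sub["identifierValue"]:
                matches.append(uid["id"])
    return matches


# Hypothetical stored user.eids with two sources.
stored_eids = [
    {"source": "pubcid.org", "uids": [{"id": "pubcid-123"}]},
    {"source": "id5-sync.com", "uids": [{"id": "id5-456"}]},
]

# Hypothetical deletion subject that uses an EID source as its type.
sub = {
    "identifierValue": "id5-456",
    "identifierType": "id5-sync.com",
    "identifierFormat": "plain",  # hypothetical format label
}

print(find_eid_matches(sub, stored_eids))
```

If identifierType were instead a free-form label such as "email", the recipient would have no source to match against and would need an out-of-band mapping.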

thanks

bretg commented 1 month ago

@jaredmoscow -- in what forum should questions like this be surfaced?

jaredmoscow commented 1 month ago

Hi @bretg,

Each deletion request needs to be an atomic unit with a single identifier in the body.
Good question here about the sub claim. In the JWT spec I believe the sub claim is formatted as a string. With the multiple items here, this might just require the current object to be relabeled or have it separated into three claims.
The specification does not define how participants map a shared identifier to other identifiers in their own system. Each pair of participants in a request will communicate a supported ID (based off the dsrdelete.json information) to propagate a request. Obligations to a request are determined by the recipient. The format, type, and value information should be included in the dsrdelete.json so that downstream participants know what to work off of when receiving a single request.

simontrasler commented 1 month ago

A couple of comments:

Point 3 relates to a shared ID. The question is about the definition of identifierType for an EID: should it be the eids.source (typically a domain name) or something else? Or is there an expectation that publishers need only define and support deletion of their first-party IDs?
Regarding point 1, why does each deletion request need to specify only a single ID? This could be quite inefficient and makes it difficult for the caller to track success, or even (from a mobile browser) to send the requests successfully in the first place -- I'm assuming we're not requiring the end user to ask for each ID to be deleted one at a time, in which case there is a single transaction to delete the user's N IDs. It would be helpful to support an array of IDs to be deleted -- this will give the caller the choice of how many to bundle together, when it has the reasonable expectation the recipient will need to delete them all.

bmayd commented 3 weeks ago

Adding to what Jared posted above:

> Each deletion request needs to be an atomic unit with a single identifier in the body.

We originally included support for multiple identifiers in the early versions of the spec, but realized doing so would cause problems in cases where intermediaries needed to communicate subsets of the IDs they'd received to different partners. The model depends upon the idJWT being signed by the 1st party; if it contained multiple IDs, an intermediary would not be able to forward it to a partner with whom it shared only a subset of those IDs, because participants could learn about other IDs a user is known by. Limiting request transactions to a single ID also simplifies communication of outcomes and issues: if multiple IDs were allowed, responses would have to specify which identifiers were accepted and which failed; with a single identifier, successes and failures occur at the transaction level rather than at a sub-transaction, per-identifier level.
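The forwarding argument can be illustrated with a small sketch. It is not the framework's wire format: the Identifier tuple, the partner names, the "plain" format label, and the helper functions are hypothetical, and signing is omitted entirely. It only shows that with one identifier per request, an intermediary can pick whole requests per partner without opening or re-signing anything.

```python
from typing import NamedTuple


class Identifier(NamedTuple):
    value: str
    type: str
    format: str


def to_single_requests(ids: list[Identifier]) -> list[dict]:
    """Build one request body per identifier (signing omitted in this sketch)."""
    return [
        {
            "sub": {
                "identifierValue": i.value,
                "identifierType": i.type,
                "identifierFormat": i.format,
            }
        }
        for i in ids
    ]


def requests_for_partner(requests: list[dict], shared_types: set[str]) -> list[dict]:
    """Select only the requests whose identifier type is shared with a partner."""
    return [r for r in requests if r["sub"]["identifierType"] in shared_types]


user_ids = [
    Identifier("28f6dc889e...fe167", "email", "sha256"),
    Identifier("pubcid-123", "pubcid.org", "plain"),  # hypothetical ID
]
requests = to_single_requests(user_ids)

# Hypothetical partners and the identifier types each one shares.
partners = {"partner-a": {"email"}, "partner-b": {"pubcid.org"}}
for name, shared in partners.items():
    forwarded = requests_for_partner(requests, shared)
    print(name, [r["sub"]["identifierType"] for r in forwarded])
```

With a single signed body holding both identifiers, neither partner could receive its own subset without also seeing the other ID.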

> Good question here about the sub claim. In the JWT spec I believe the sub claim is formatted as a string. With the multiple items here, this might just require the current object to be relabeled or have it separated into three claims.

Per this, a sub claim is any JSON value: "In a JWT, a claim appears as a name/value pair where the name is always a string and the value can be any JSON value."
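As a rough check of what the encoding itself permits, the sketch below builds a token whose sub is a JSON object and reads it back, using only the Python standard library. HS256 and the secret are chosen purely so the example needs no third-party package; nothing here assumes the framework's actual signing algorithm, and the helper names are hypothetical. Note that RFC 7519 defines the registered sub claim's value as a string, so this shows only that an object-valued claim serializes mechanically, not that an object-valued sub is conformant.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_jwt(payload: dict, secret: bytes) -> str:
    """Build a compact HS256 token (illustration only)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def decode_payload(token: str) -> dict:
    """Decode the payload WITHOUT verifying the signature (illustration only)."""
    return json.loads(b64url_decode(token.split(".")[1]))


payload = {
    "sub": {
        "identifierValue": "28f6dc889e...fe167",
        "identifierType": "email",
        "identifierFormat": "sha256",
    }
}
token = encode_jwt(payload, b"hypothetical-shared-secret")
decoded = decode_payload(token)
print(type(decoded["sub"]).__name__, decoded["sub"]["identifierType"])
```

Off-the-shelf JWT libraries may still validate registered claims against their RFC types, so an object-valued sub could be rejected by some recipients even though it round-trips here.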

> The specification does not define how participants map a shared identifier to other identifiers in their own system. Each pair of participants in a request will communicate a supported ID (based off the dsrdelete.json information) to propagate a request.

Partnering participants could agree to use the Prebid list of EID sources as the basis for defining the identifier type. We didn't define the allowed types because they aren't standardized and are often unique to partnerships.

simontrasler commented 2 weeks ago

Thanks @bmayd @jaredmoscow:

Re: separate requests, thanks for the color. I'd ask that you please connect with Prebid, since my takeaway is that this design point has issues either way.
Re: the sub field, my apologies I'm still not clear -- which of the following is a correct way of writing the value?
- { "identifierValue": "28f6dc889e...fe167", "identifierType": "email", "identifierFormat": "sha256" }
- "{ \"identifierValue\": \"28f6dc889e...fe167\", \"identifierType\": \"email\", \"identifierFormat\": \"sha256\" }"
- Something else.
Re: the contents of the sub object, there is no specification, only examples. It would be helpful to get more detail in the spec, especially to point out that the values for identifierValue are defined by the caller. (It would be problematic to allow these to vary by caller-recipient combination.)
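The difference between the first two candidate encodings of sub can be shown side by side. This is a sketch only: the thread has not settled which form is correct, and read_sub is a hypothetical helper showing how a recipient could tolerate both until the spec says so.

```python
import json

SUB_FIELDS = {"identifierValue", "identifierType", "identifierFormat"}


def read_sub(sub):
    """Accept sub as a JSON object or as a string holding serialized JSON."""
    if isinstance(sub, str):
        sub = json.loads(sub)  # second form: JSON text wrapped in a string
    if not isinstance(sub, dict) or not SUB_FIELDS.issubset(sub):
        raise ValueError("sub does not carry the three identifier fields")
    return sub


as_object = {
    "identifierValue": "28f6dc889e...fe167",
    "identifierType": "email",
    "identifierFormat": "sha256",
}
as_string = json.dumps(as_object)

# First form: sub is a nested JSON object.
print(json.dumps({"sub": as_object}))
# Second form: sub is a string, so the inner quotes are escaped on the wire.
print(json.dumps({"sub": as_string}))

# Both forms carry the same three fields once parsed.
print(read_sub(as_object) == read_sub(as_string))
```

The second form keeps sub a string, at the cost of a second parse step on the recipient's side.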