adobe / xdm

Experience Data Model
Creative Commons Attribution 4.0 International

Define limitations on the set of characters to be used for the xdm:id property #432

Open fmeschbe opened 6 years ago

fmeschbe commented 6 years ago

In issue #419 @jbeckert comments:

Does xdm:id for Identity need a "pattern" property to reject ids with prohibited characters, e.g. characters that would mess up the usage of the value in URL path components?

This is a very valid concern and we should absolutely work through it and properly handle it.

I assume the intent is to restrict this to the "URL-safe" characters, correct?

I propose to add two pieces:

The question is which set of characters we should support.

Looking at Section 3.3 (Path) of RFC 3986, one option would be to support pchar, except for pct-encoded, which would be forbidden:

pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

But then, I am not sure there is any value in most of the sub-delims in an identifier. So I propose to use just unreserved plus :, @, and + (the only sub-delim kept). This covers common identifiers such as UUIDs, email addresses, and even some URNs.

So the proposed pattern would be:

pattern = "^[a-zA-Z0-9:@+._~-]+$"
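To illustrate what this first proposal would accept and reject, here is a minimal Python sketch. The pattern string is copied from the proposal above; the helper name `is_valid_proposed_id` and the sample values are assumptions made purely for illustration and are not part of XDM. Note that Python's `$` can also match just before a trailing newline, so the sketch uses `re.fullmatch` to stay close to how the pattern is meant to behave.

```python
import re

# Pattern copied from the proposal above.
PROPOSED_PATTERN = r"^[a-zA-Z0-9:@+._~-]+$"
_PROPOSED_RE = re.compile(PROPOSED_PATTERN)


def is_valid_proposed_id(value: str) -> bool:
    """Return True if value consists only of the proposed characters.

    Hypothetical helper for illustration. fullmatch is used so that a
    trailing newline is rejected (Python's "$" would otherwise allow it).
    """
    return _PROPOSED_RE.fullmatch(value) is not None


# Illustrative sample values (assumptions, not taken from any XDM data).
samples = [
    "123e4567-e89b-12d3-a456-426614174000",  # UUID
    "jane.doe@example.com",                  # email address
    "urn:example:1234",                      # simple URN
    "a/b",                                   # "/" is not in the set
    "100%25",                                # percent-encoding is forbidden
    "two words",                             # space is not in the set
]

for sample in samples:
    print(f"{sample!r}: {is_valid_proposed_id(sample)}")
```

The first three samples pass and the last three fail, which matches the intent of keeping values safe for use in a URL path component.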

What are the schemas that are affected by the issue

Identity, EndUserIds, Profile, ExperienceEvent (and their extensions)

What are examples of products that are impacted by the issue

Analytics, Campaign, Ad Cloud, Target

jbeckert commented 6 years ago

This looks to me like a very sensible (and safe) character set. The only downside I can see is that Base64-encoded secure hashes are no longer supported as identifiers: we don't allow '=', and we don't allow '/' either, which was a problem to begin with. Oh well.

fmeschbe commented 6 years ago

For base64 it would be base64url which uses - and _ instead of + and /.

And we can add = to it.
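As a small sketch of the difference, the following Python snippet encodes the same bytes with the standard Base64 alphabet and with base64url, using only the standard library. The helper name `digest_as_id` and the choice of SHA-256 are assumptions for illustration; the thread does not say which hash is used.

```python
import base64
import hashlib

# Three bytes chosen so that every 6-bit group maps to index 62 or 63,
# the only two positions where the alphabets differ.
raw = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
print(standard)  # uses "+" and "/"
print(url_safe)  # uses "-" and "_" instead


def digest_as_id(data: bytes, keep_padding: bool = True) -> str:
    """Hypothetical helper: base64url-encode a SHA-256 digest of data.

    SHA-256 is an assumption made for this sketch only.
    """
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).decode("ascii")
    # The padding character is "=", which is why it needs to be allowed
    # if padded values are to be accepted as identifiers.
    return encoded if keep_padding else encoded.rstrip("=")


print(digest_as_id(b"user@example.com"))
print(digest_as_id(b"user@example.com", keep_padding=False))
```

A 32-byte digest encodes to 44 characters with one trailing `=`, or 43 characters when the padding is stripped.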

jbeckert commented 6 years ago

Yes, base64url would do the trick. Nice.

fmeschbe commented 6 years ago

To recap, then: valid xdm:id values must comply with the following ABNF production:

id     =  1*char
char   =  ALPHA / DIGIT / "@" / "+" / "."  / "-" / "_" / "~" / "="
ALPHA  =  %x41-5A / %x61-7A               ; A-Z / a-z
DIGIT  =  %x30-39                         ; 0-9

This production is encoded for validation as the following pattern:

pattern = "^[a-zA-Z0-9@+._~=-]+$"

This allows for email addresses, numbers, and UUIDs, but also Base64URL-encoded values (with or without padding).

If we go for the URN proposal in #434, the same pattern would have to be applied to the namespace xdm:code property.
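The recap pattern can be exercised with a short Python sketch. The pattern string is copied from the recap; the helper names `is_valid_id` and `disallowed_chars`, the SHA-256 digest, and the sample values are all assumptions for illustration, not part of XDM or of any Adobe tooling.

```python
import base64
import hashlib
import re
import uuid

# Pattern copied verbatim from the recap above.
ID_PATTERN = r"^[a-zA-Z0-9@+._~=-]+$"
_ID_RE = re.compile(ID_PATTERN)
# Hypothetical companion pattern: any one character outside the allowed set.
_DISALLOWED_RE = re.compile(r"[^a-zA-Z0-9@+._~=-]")


def is_valid_id(value: str) -> bool:
    """Return True if value matches the recap pattern (empty is invalid)."""
    return _ID_RE.fullmatch(value) is not None


def disallowed_chars(value: str) -> list:
    """Return the sorted, distinct characters that make value invalid."""
    return sorted(set(_DISALLOWED_RE.findall(value)))


# base64url-encoded digest; SHA-256 is an assumption for this sketch.
digest = hashlib.sha256(b"jane.doe@example.com").digest()
padded = base64.urlsafe_b64encode(digest).decode("ascii")

candidates = [
    str(uuid.uuid4()),     # UUID
    "jane.doe@example.com",
    "12345",
    padded,                # Base64URL with padding
    padded.rstrip("="),    # Base64URL without padding
    "urn:x/1",             # ":" and "/" are outside the recap set
]

for candidate in candidates:
    print(candidate, is_valid_id(candidate), disallowed_chars(candidate))
```

Reporting the offending characters, rather than only a boolean, is a design choice of this sketch that makes rejected values easier to diagnose.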