adlnet / xAPI-Spec

The xAPI Specification describes communication about learner activity and experiences between technologies.
https://adlnet.gov/projects/xapi/

Representation & Validity of `mbox_sha1sum` #909

DavidTPate closed this issue 8 years ago.

DavidTPate commented 8 years ago

Currently, I do not see any language around validity requirements for the mbox_sha1sum property of an Actor. The FOAF document the spec links to doesn't provide any details about the encoding either. I think we should define which encoding this value must be provided in and, if possible, some ways to confirm its validity.

Right off the bat, defining the encoding as base64 seems like it would make the most sense; the other options would be binary or hex. I think binary would be unnecessarily verbose when hex is available as an alternative.

Going with base64 would mean less data stored and sent over the wire for queries (hex uses 2 characters for each byte, while base64 uses 4 characters for every 3 bytes). The caveat is that the value would have to be URI-encoded in query parameters, but that has to happen anyway when querying for an mbox such as someemail+something@somewhere.com or for an account.
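To make the size and escaping trade-off concrete, here is a minimal sketch using only Python's standard library. It assumes, following FOAF, that the hash is taken over the full `mailto:` IRI; the address is the example from this comment, and percent-encoding with `urllib.parse.quote` stands in for however a client actually builds its query string.

```python
import base64
import hashlib
from urllib.parse import quote

# Assumption (following FOAF): the SHA-1 input is the full "mailto:" IRI.
mbox = "mailto:someemail+something@somewhere.com"
raw = hashlib.sha1(mbox.encode("utf-8")).digest()  # 20 raw bytes

hex_form = raw.hex()                              # 2 characters per byte
b64_form = base64.b64encode(raw).decode("ascii")  # 4 characters per 3 bytes, padded with '='

print(len(raw), len(hex_form), len(b64_form))  # 20 40 28

# Standard base64 can contain '+', '/' and '=' (a 20-byte input always ends in '='),
# so it changes when percent-encoded; hex digits never need escaping.
print(quote(b64_form, safe="") == b64_form, quote(hex_form, safe="") == hex_form)  # False True

# The raw mbox needs escaping too when it is queried for directly.
print(quote(mbox, safe=""))  # mailto%3Asomeemail%2Bsomething%40somewhere.com
```

For a 20-byte SHA-1 digest, base64 saves 12 characters per value, at the cost of characters that must be escaped in URLs.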

With base64, hex, or binary we would then be able to validate that the mbox_sha1sum value looks correct.
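Once an encoding is fixed, a shape check is straightforward. The sketch below uses hypothetical helper names. Because the text under discussion does not say whether hex must be lowercase, it accepts either case, which is an assumption.

```python
import base64
import re

# Hypothetical validators; case handling for hex is an assumption, not spec text.
_HEX_SHA1 = re.compile(r"[0-9a-fA-F]{40}")

def looks_like_hex_sha1(value: str) -> bool:
    """True if value is exactly 40 hex digits, i.e. a 20-byte SHA-1 digest."""
    return _HEX_SHA1.fullmatch(value) is not None

def looks_like_b64_sha1(value: str) -> bool:
    """True if value is strict base64 that decodes to exactly 20 bytes."""
    try:
        return len(base64.b64decode(value, validate=True)) == 20
    except ValueError:  # binascii.Error (bad alphabet or padding) subclasses ValueError
        return False

print(looks_like_hex_sha1("a9993e364706816aba3e25717850c26c9cd0d89d"))
print(looks_like_hex_sha1("not-a-hash"))
```

A check like this can only confirm that a value *looks* like a SHA-1 digest. It cannot confirm that the value was derived from a real mailbox.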

fugu13 commented 8 years ago

With this, as with all cryptographic hashes and other payloads, the de facto standard (not just in xAPI) is to use hex. You can find numerous examples of this in FOAF data. I'm fine with clarifying this existing requirement in the text.

DavidTPate commented 8 years ago

@fugu13 Can you point me to where using hex is defined as "the de facto standard"?

Digging further into the specifications: headers such as If-None-Match are defined as being encoded in ASCII, with some backwards-compatible characters that aren't fully supported in all browsers.

The clarification for these appears in section 2.1, under the Augmented BNF header:

"The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14]."

The ABNF for this is TEXT = <any OCTET except CTLs, but including LWS>, where CTLs are US-ASCII control characters and LWS is linear white space.
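As an illustration of that rule, here is a simplified Python check, assuming CTL means octets 0 to 31 plus DEL (127) and treating horizontal tab as the only control octet that LWS permits. It deliberately ignores CRLF line folding. It shows that both a hex digest and a base64 digest are plain printable ASCII, so the header grammar alone does not favor either encoding.

```python
# Simplified reading of TEXT = <any OCTET except CTLs, but including LWS>.
# Assumption: CRLF folding is not handled, so CR and LF are rejected like other CTLs.
CTL = set(range(32)) | {127}  # US-ASCII control characters and DEL
LWS_CONTROL = {9}             # HT is a CTL but may appear as linear white space

def is_text(value: bytes) -> bool:
    """True if every octet is allowed by this simplified TEXT rule."""
    return all(b not in CTL or b in LWS_CONTROL for b in value)

print(is_text(b"a9993e364706816aba3e25717850c26c9cd0d89d"))  # hex digest
print(is_text(b"AAAAAAAAAAAAAAAAAAAAAAAAAAA="))              # base64 of 20 zero bytes
print(is_text(b"abc\x00"))                                    # NUL is a CTL
```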

I definitely think this just needs clarification on which encoding to use, so that every LRS expects the same encoding for the value.

fugu13 commented 8 years ago

@DavidTPate a de facto standard is one that isn't defined anywhere, but is so prevalent as to be expected.

First, briefly, HTTP headers use esoteric workarounds due to their particular history. But yes, as I said, I think clarifying text is good.

Some illustrations of it being a de facto standard:

The only examples of textual representation in the MD5 spec are hex: https://www.ietf.org/rfc/rfc1321.txt

After some illustrative binary examples, SHA-1 is represented textually in hex throughout its spec: https://tools.ietf.org/html/rfc3174

The same goes for SHA-2 in its spec (which is heavily modeled on the SHA-1 spec): http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf

The mbox_sha1sum spec (http://xmlns.com/foaf/spec/#term_mbox_sha1sum) references a set of code examples as accurate, and all of those examples use hex: http://www.intertwingly.net/blog/1545.html

Programming libraries for cryptographic hashes, such as Python's hashlib, often offer only two convenience methods for outputting a hash: the raw binary form (digest()) or hex (hexdigest()). See https://docs.python.org/2/library/hashlib.html
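As a quick check of these points, the sketch below compares Python's hashlib output against the well-known "abc" test vectors for MD5, SHA-1 and SHA-256, which are conventionally published as hex strings. It also confirms that hexdigest() is just the hex form of digest(). The mbox_sha1sum helper is hypothetical and assumes, as FOAF describes, that the hash is taken over the full `mailto:` IRI.

```python
import hashlib

# Well-known test vectors for the input b"abc", written as lowercase hex.
VECTORS = {
    "md5": "900150983cd24fb0d6963f7d28e17f72",
    "sha1": "a9993e364706816aba3e25717850c26c9cd0d89d",
    "sha256": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
}

for name, expected in VECTORS.items():
    h = hashlib.new(name, b"abc")
    # hashlib's two output forms: raw bytes via digest(), hex text via hexdigest()
    assert h.hexdigest() == h.digest().hex() == expected
    print(name, h.hexdigest())

def mbox_sha1sum(mbox: str) -> str:
    """Hypothetical helper: hex SHA-1 of a full mailto: IRI (FOAF-style assumption)."""
    return hashlib.sha1(mbox.encode("utf-8")).hexdigest()

print(mbox_sha1sum("mailto:someemail+something@somewhere.com"))
```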

Summing up: outside of raw binary (which isn't appropriate here), cryptographic hashes are virtually always represented in hex, from the underlying specs, to the specs that use them, to programming libraries. So when the xAPI spec calls for a textual representation of a cryptographic hash without specifying an encoding, the only reasonable interpretation is hex. We should add clarifying text to that effect so others won't be confused in the future.

andyjohnson commented 8 years ago

Per the 4/20/16 call, clarifying language would be welcome, but sha1sum has always been a requirement. It is so much the default that the crypto community doesn't talk about it.