DavidTPate closed this issue 8 years ago
With this, as with all cryptographic hashes and other payloads, the de facto standard (not just in xAPI) is to use hex. You can find numerous examples of this in FOAF data. I'm fine with clarifying this existing requirement in the text.
@fugu13 Can you point me to where using hex is defined as "the de facto standard"?
Digging into the specifications some more: where we are dealing with headers for things like `if-none-match`, those are defined as being encoded in ASCII, with some backwards-compatible characters that aren't fully supported in all browsers.
The clarification for these occurs in section 2.1 under the `Augmented BNF` header.
The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14].
With the ABNF for this being:
TEXT = <any OCTET except CTLs, but including LWS>
Where `CTLs` are US-ASCII control characters and `LWS` is linear white space.
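To make the `TEXT` rule above concrete, here is a rough sketch of a checker for it. It assumes the usual HTTP/1.1 definitions (CTL as octets 0-31 plus 127, LWS as an optional CRLF followed by one or more spaces or tabs); `is_text` is a hypothetical helper name, not something taken from any spec or library.

```python
import re

# Assumed definitions: CTL = octets 0-31 and 127; LWS = [CRLF] 1*(SP | HT).
_TEXT_RE = re.compile(
    rb"(?:\r\n[ \t]+"        # folded line: CRLF followed by at least one SP or HT
    rb"|\t"                  # a bare HT is a CTL, but it is allowed as part of LWS
    rb"|[^\x00-\x1f\x7f])*"  # any other octet that is not a control character
)


def is_text(value: bytes) -> bool:
    """Return True if every octet of value fits the TEXT rule (hypothetical helper)."""
    return _TEXT_RE.fullmatch(value) is not None
```

Note that this only checks octets; it does not attempt the RFC 2047 encoded-word handling mentioned in the quoted text.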
Definitely think this just needs some clarification on what it should be, so that each LRS will be expecting the same type of encoding for the value.
@DavidTPate a de facto standard is one that isn't defined anywhere, but is so prevalent as to be expected.
First, briefly, HTTP headers use esoteric workarounds due to their particular history. But yes, as I said, I think clarifying text is good.
Some illustrations of it being a de facto standard:
The only examples of textual representation in the MD5 spec are hex: https://www.ietf.org/rfc/rfc1321.txt
Once some binary illustrative bits are done, hex is the way SHA1 is represented textually throughout its spec: https://tools.ietf.org/html/rfc3174
The same for SHA2 in that spec (which is heavily modeled on the SHA1 spec): http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf
The mbox_sha1sum spec references a set of code examples as accurate: http://xmlns.com/foaf/spec/#term_mbox_sha1sum, and those code examples all use hex: http://www.intertwingly.net/blog/1545.html
Programming libraries for working with cryptographic hashes, such as Python's, often have only two convenience methods for outputting the hash: either the raw binary form, or hex: https://docs.python.org/2/library/hashlib.html
Summing up, given how cryptographic hashes outside of binary (which isn't appropriate here) are virtually always represented in hex, from underlying specs to use specs to programming libraries, there's no reasonable interpretation of the xAPI spec where putting a textual representation of a cryptographic hash without specifying otherwise doesn't mean using hex. We should add clarifying text to that effect, so others won't be confused in the future.
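As an illustration of the hex convention described above, here is a minimal sketch using Python's `hashlib`. It assumes the FOAF convention of hashing the full `mailto:` URI; the function names `mbox_sha1sum` and `looks_like_hex_sha1` are made up for this example, and whether an LRS should accept uppercase hex is an open question that this sketch simply answers with "yes".

```python
import hashlib
import re


def mbox_sha1sum(email: str) -> str:
    """Hex SHA-1 of the mailto: URI for an email address (hypothetical helper)."""
    mailto = "mailto:" + email
    # hexdigest() returns 40 lowercase hex characters for SHA-1.
    return hashlib.sha1(mailto.encode("utf-8")).hexdigest()


# A SHA-1 digest is 20 bytes, so its hex form is exactly 40 hex characters.
_HEX_SHA1 = re.compile(r"[0-9a-fA-F]{40}")


def looks_like_hex_sha1(value: str) -> bool:
    """Shape check only: it cannot tell whether the hash matches a real mbox."""
    return _HEX_SHA1.fullmatch(value) is not None
```

For example, `looks_like_hex_sha1(mbox_sha1sum("someone@example.com"))` is true, while a base64 or otherwise malformed value would be rejected.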
Per the 4/20/16 call, clarifying language would be welcomed, but sha1sum has always been a requirement. It is so default that the crypto-community doesn't talk about it.
Currently, I do not see any language around validity requirements for the `mbox_sha1sum` property for an Actor. The document linked to for FOAF doesn't provide any details around the encoding either. I think we should define the encoding that this should be provided in and some ways to confirm validity if possible.
Right off the bat it seems like defining the encoding as `base64` would make the most sense; other options would be `binary` or `hex`. I think `binary` would be unnecessarily verbose when there is `hex` as an alternative.
Going with `base64` would cause less data to be stored and sent over the wire for queries (`hex` uses 2 characters for each byte, while `base64` uses 4 characters for every 3 bytes). The caveat would be that requests would have to be URI encoded for the query parameters, but that has to happen anyways in case there is an `mbox` with `someemail+something@somewhere.com` or if an `account` is being queried for.
With `base64`, `hex`, or `binary` we would then be able to validate that the `mbox_sha1sum` value looks correct.
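To put numbers on the size comparison, here is a small sketch that encodes the same SHA-1 digest both ways. The helper name `sha1_encodings` is hypothetical, and hashing the `mailto:` form of the address is an assumption based on the FOAF convention.

```python
import base64
import hashlib
from urllib.parse import quote


def sha1_encodings(email: str) -> dict:
    """Return the hex, base64, and URI-encoded base64 forms of a SHA-1 digest."""
    digest = hashlib.sha1(("mailto:" + email).encode("utf-8")).digest()  # 20 raw bytes
    b64 = base64.b64encode(digest).decode("ascii")
    return {
        "hex": digest.hex(),               # 2 characters per byte
        "base64": b64,                     # 4 characters per 3 bytes, padded with "="
        "base64_uri": quote(b64, safe=""), # "+", "/" and "=" get percent-encoded
    }


forms = sha1_encodings("someemail+something@somewhere.com")
for name, value in forms.items():
    print(name, len(value), value)
```

For a 20-byte SHA-1 digest this gives 40 hex characters versus 28 base64 characters (including one `=` of padding). Once the base64 value is URI encoded for a query parameter, the padding alone adds two more characters, so the saving over hex is smaller than the raw ratio suggests.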