NCEAS / eml

Ecological Metadata Language (EML)
https://eml.ecoinformatics.org/
GNU General Public License v2.0
40 stars 15 forks source link

authentication element should have its method attribute required #276

Open mobb opened 7 years ago

mobb commented 7 years ago

In the physical tree, the authentication element has an optional element "method". But the authentication (eg, md5 hash) is essentially useless without knowing the method. We suggest that the attribute be required.

https://knb.ecoinformatics.org/#external//emlparser/docs/eml-2.1.1/./eml-physical.html the EDI group has a

amoeba commented 7 years ago

I like this 💯. What other types of authentication were envisioned when this module was designed? IIRC this was also to support describing resources that require an authentication mechanism to access such as LDAP. In that case, the authentication doesn't make sense without the method unless we can guess.

Are there other use cases of the authentication element where method would should not be required? Maybe a @mbjones question, or others.

mbjones commented 7 years ago

The authentication field name was poorly chosen, as it leads to confusion as evidenced by Bryce's interpretation of the field. This is a field to hold checksum values in order to authenticate that the file contents are unmodified. IMO, the field should have been called 'checksum'. The method should be one of the well known checksum methods, including md5, sha-256, etc. The examples in the documentation should make this clear.

I am supportive of requiring the method attribute, but note that this will likely make some historical documents invalid. In the past, we have tried to minimize introducing new hard requirements unless there is clearly something broken. As you say, @mobb, one can't really use the value without the method. So, it may qualify. If we do change it, we should also consider being more explicit about the vocabulary to be used to indicate the method.

DataONE recognizes the Library of Congress list of cryptographic hash algorithms that can be used as names in this field, and specifically uses the madsrdf:authoritativeLabel field as the name of the algorithm in this field. See Library of Congress Cryptographic Algorithm Vocabulary.

So, given a breaking change, should we also take this opportunity to rename the field to something more intelligible, such as checksum? Thoughts @mobb, @csjx, and @amoeba ?

amoeba commented 7 years ago

👍 on the element name change 👍 on changing the method attribute to required. I'd worry users wouldn't include the method if the validator doesn't complain

mobb commented 7 years ago

This suggestion was the result of a conversation in our EML metadata congruence checking group - where we are using the authentication element to check congruence of an asserted checksum to the one calculated by the system (PASTA) at download. Our check includes looking for the presence of the method attribute, so having it required by the schema is not really necessary. It was our implementation discussion that highlighted this need, and prompted this discussion.

We also thought that the element name could have be clearer, but breaking backward compatibility is rather dire. I'll put this this up with the group at our next session (probably this week).