NCEAS / metacat

Data repository software that helps researchers preserve, share, and discover data
https://knb.ecoinformatics.org/software/metacat
GNU General Public License v2.0

Validate SystemMetadata.checksumAlgorithm in the DataONE API calls #1217

Open mbjones opened 6 years ago

mbjones commented 6 years ago

Author Name: Chris Jones (Chris Jones)
Original Redmine Issue: 7234, https://projects.ecoinformatics.org/ecoinfo/issues/7234
Original Date: 2017-12-19
Original Assignee: Jing Tao


Bryce pointed out that we have many incorrect @checksumAlgorithm@ strings on various MNs. See https://github.nceas.ucsb.edu/KNB/arctic-data/issues/283. The upshot is that @SHA-*@ is the broadly supported syntax.

I checked which of these strings Java's @MessageDigest@ recognizes with:

package org.dataone.tests;

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

public class MessageDigestDTest {

    public static void main(String[] args) {
        MessageDigest md = null;
        List<String> algorithms = new ArrayList<String>();
        algorithms.add("MD5");
        algorithms.add("MD-5");
        algorithms.add("SHA1");
        algorithms.add("SHA-1");
        algorithms.add("SHA224");
        algorithms.add("SHA-224");      
        algorithms.add("SHA256");
        algorithms.add("SHA-256");      
        algorithms.add("SHA384");
        algorithms.add("SHA-384");
        algorithms.add("SHA512");
        algorithms.add("SHA-512");

        for (String algorithm : algorithms) {

            try {
                md = MessageDigest.getInstance(algorithm);
                System.out.println(md.getAlgorithm() + " is recognized.");

            } catch (NoSuchAlgorithmException e) {
                System.out.println(e.getMessage());

            }           
        }       
    }
}

and got:

MD5 is recognized.
MD-5 MessageDigest not available
SHA1 is recognized.
SHA-1 is recognized.
SHA224 MessageDigest not available
SHA-224 is recognized.
SHA256 MessageDigest not available
SHA-256 is recognized.
SHA384 MessageDigest not available
SHA-384 is recognized.
SHA512 MessageDigest not available
SHA-512 is recognized.

Change the @MNodeService@, @CNodeService@, and @D1NodeService@ methods that send or receive @SystemMetadata@ documents so that they validate the given @checksumAlgorithm@ string with @MessageDigest.getInstance(algorithm)@. If that call throws a @NoSuchAlgorithmException@, throw an @InvalidSystemMetadata@ exception for the API call.
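A minimal sketch of the proposed check, assuming a hypothetical helper named @validateChecksumAlgorithm@ and a local stand-in for DataONE's @InvalidSystemMetadata@ exception (the real class and its constructor arguments are not reproduced here). It only asks the JVM whether it can resolve the name:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumAlgorithmCheck {

    /**
     * Hypothetical stand-in for DataONE's InvalidSystemMetadata exception,
     * defined here only so the sketch compiles on its own.
     */
    public static class InvalidSystemMetadata extends Exception {
        InvalidSystemMetadata(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Rejects a checksumAlgorithm string that the installed JCA providers
     * cannot resolve. The method name is illustrative, not Metacat's API.
     */
    public static void validateChecksumAlgorithm(String algorithm) throws InvalidSystemMetadata {
        if (algorithm == null || algorithm.isEmpty()) {
            throw new InvalidSystemMetadata("checksumAlgorithm is missing", null);
        }
        try {
            // Only the lookup matters; the returned digest is discarded.
            MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new InvalidSystemMetadata("Unsupported checksumAlgorithm: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"SHA-256", "MD-5"}) {
            try {
                validateChecksumAlgorithm(name);
                System.out.println(name + ": accepted");
            } catch (InvalidSystemMetadata e) {
                System.out.println(name + ": rejected (" + e.getMessage() + ")");
            }
        }
    }
}
```

Consistent with the results reported above, this accepts @SHA-256@ and rejects @MD-5@, but it would also accept Java-specific spellings such as @SHA1@.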

mbjones commented 6 years ago

Original Redmine Comment Author Name: Matt Jones (Matt Jones) Original Date: 2017-12-20T17:31:11Z


The definition of the ChecksumAlgorithm type says that algorithm names must be drawn from the Library of Congress controlled vocabulary:

The cryptographic hash algorithm used to calculate a checksum. DataONE recognizes the Library of Congress list of cryptographic hash algorithms that can be used as names in this field, and specifically uses the madsrdf:authoritativeLabel field as the name of the algorithm in this field. See: Library of Congress Cryptographic Algorithm Vocabulary. All compliant implementations must support at least SHA-1 and MD5, but may support other algorithms as well.

We should be checking against that list, not against the Java algorithm names, which are not necessarily language-neutral.

mbjones commented 6 years ago

Original Redmine Comment Author Name: Jing Tao (Jing Tao) Original Date: 2018-01-18T00:10:54Z


According to the list at http://id.loc.gov/vocabulary/preservation/cryptographicHashFunctions.html, the names include: MD5, SHA-1, SHA-256, SHA-384, and SHA-512.

It doesn't show SHA-224, so I am not sure whether it is in the list.
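A sketch of checking against the vocabulary instead, assuming the five labels listed above are the accepted set (SHA-224 left out until confirmed) and that matching should be exact and case-sensitive; the class and method names are hypothetical. It also confirms the JVM can compute the digest, since for these five labels the LoC spelling happens to match Java's standard algorithm name:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class LocChecksumVocabulary {

    /**
     * Assumed allowlist: the labels listed in the comment above. SHA-224 is
     * deliberately omitted until its status in the LoC vocabulary is confirmed.
     */
    public static final Set<String> LOC_LABELS = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList("MD5", "SHA-1", "SHA-256", "SHA-384", "SHA-512")));

    /** True only for an exact, case-sensitive match against the allowlist. */
    public static boolean isVocabularyLabel(String label) {
        return label != null && LOC_LABELS.contains(label);
    }

    /**
     * True if the label is in the vocabulary and this JVM can also compute it.
     * No translation table is used because, for these five labels, the LoC
     * spelling is also the Java standard algorithm name.
     */
    public static boolean isUsable(String label) {
        if (!isVocabularyLabel(label)) {
            return false;
        }
        try {
            MessageDigest.getInstance(label);
            return true;
        } catch (NoSuchAlgorithmException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"SHA-1", "SHA1", "sha-256", "SHA-224"}) {
            System.out.println(name + " -> " + isUsable(name));
        }
    }
}
```

Unlike a @MessageDigest@-only check, this rejects @SHA1@ and @sha-256@, both of which Java would resolve, because JCA algorithm names include aliases and are matched case-insensitively.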