DICE-UNC / dfc-dataverse-integration

integration of DataVerse into DFC

DVN Service #1

Closed michael-conway closed 10 years ago

michael-conway commented 10 years ago

create a service to be used in an indexer to move data from an iRODS grid to DVN

michael-conway commented 10 years ago
  1. Dataverse has the following logical/physical(storage) hierarchies:

1.1 Logical perspective

A dataverse contains [0..n] studies; a study contains [0..n] files, such as SPSS data, image, or text files. [Note: a parsable statistical file such as an SPSS data file goes through extra processing steps, whereas non-parsable files are copied to the file system without these steps.]

1.2 Physical(storage) perspective

The above hierarchical dataverse-study relationship is _NOT_ mapped to the storage system, i.e., no directory hierarchy such as /${dataverse_id}/${study_id} exists.

However, the study-file relationships are mapped to directory-file ones; for example,

A study whose StudyId is 10037 is literally mapped to a sub-directory of the local file system:

../10037

A file uploaded to the above study is ultimately stored under that sub-directory as follows:

../10037/750

where the fileId (= 750) is generated automatically, together with its corresponding database table entry.
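The mapping described above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class `StudyStorage` and its methods are hypothetical names, not part of the DVN code base, and the parent directory is passed in because the thread only shows it as `..`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; class and method names are not taken from DVN.
final class StudyStorage {
    private StudyStorage() {}

    // A study maps to a sub-directory named after its StudyId: <parent>/<studyId>
    static Path studyDir(Path parent, String studyId) {
        return parent.resolve(studyId);
    }

    // An uploaded file is stored under the study directory, named by its fileId:
    // <parent>/<studyId>/<fileId>
    static Path filePath(Path parent, String studyId, long fileId) {
        return studyDir(parent, studyId).resolve(Long.toString(fileId));
    }

    public static void main(String[] args) {
        // Values from the example above: StudyId 10037, fileId 750.
        System.out.println(filePath(Paths.get(".."), "10037", 750));
    }
}
```

The StudyId is kept as a string here so that a non-numeric, user-specified ID can be passed through unchanged.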

Because Dataverse 4.0 abandoned the above logical scheme (both study and dataverse are called "dataset") and we would like to minimize coding effort specific to Dataverse 3.x, I decided not to touch the current database tables of Dataverse 3.x and came up with simpler solutions for the forthcoming demos with Dataverse 3.x. One of these solutions is that a study whose files are stored on an iRODS instance _must_ have an ID with a specific prefix such as "ODUM-IRODS", so that the iRODS-specific logic can kick in without looking up a study table. Therefore, as I mentioned in my previous e-mail, it is imperative that an API user be able to set the ID of a new study; the following curl example shows how to do this.

// curl example

    curl -k --data-binary @atom-entry-study.xml -H "Content-Type: application/atom+xml" -u akio:akio https://localhost:8181/dvn/api/data-deposit/v1/swordv2/collection/dataverse/dtdv

where "dtdv" is the alias of the target dataverse (the implicit assumption here is that a Deposit API user has this information beforehand), "atom-entry-study.xml" contains the minimum set of metadata needed to create a study whose ID is user-specified (see the example below), the first "akio" is a registered dataverse user name, and the second "akio" is that user's password.
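For callers working in Java rather than curl, the two request pieces explained above (the collection URL ending in the dataverse alias, and the credentials passed with `-u`) could be assembled roughly as below. This is a sketch under assumptions: `DepositRequest` and its method names are hypothetical, and it assumes curl's default HTTP Basic scheme for `-u`. It builds strings only and sends nothing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only: class and method names are hypothetical, not part of DVN.
final class DepositRequest {
    // Path prefix copied from the curl example.
    static final String COLLECTION_ROOT =
            "/dvn/api/data-deposit/v1/swordv2/collection/dataverse/";

    private DepositRequest() {}

    // URL of the deposit collection for the dataverse with the given alias.
    static String collectionUrl(String host, int port, String alias) {
        return "https://" + host + ":" + port + COLLECTION_ROOT + alias;
    }

    // Authorization header value equivalent to curl's "-u user:password"
    // when HTTP Basic authentication is used.
    static String basicAuthHeader(String user, String password) {
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        System.out.println(collectionUrl("localhost", 8181, "dtdv"));
        System.out.println(basicAuthHeader("akio", "akio"));
    }
}
```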

// The contents of atom-entry-study.xml

    <?xml version="1.0"?>
    <entry xmlns="http://www.w3.org/2005/Atom" xmlns:dcterms="http://purl.org/dc/terms/">
      <dcterms:title>irods-testing study created by an undocumented deposit API command</dcterms:title>
      <dcterms:creator>Akio Sone</dcterms:creator>
      <dcterms:identifier>hdl:TEST/ODUM-IRODS_10010</dcterms:identifier>
    </entry>

where "hdl:TEST/" in the dcterms:identifier tag is a boilerplate token used when a handle server is not specified; in terms of the aforementioned storage hierarchy, "TEST" is actually the parent directory of the directory that represents a study.
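As a rough illustration of how such an identifier might be taken apart on the receiving side, the sketch below splits "hdl:TEST/ODUM-IRODS_10010" into its parent directory and study ID and checks for the "ODUM-IRODS" prefix. The class `StudyIdentifier` and its members are hypothetical; the actual DVN 3.x parsing code is not shown in this thread and may differ.

```java
// Illustrative only: this class is hypothetical and not part of DVN.
final class StudyIdentifier {
    static final String HANDLE_SCHEME = "hdl:";
    static final String IRODS_PREFIX = "ODUM-IRODS";

    final String authority; // e.g. "TEST", the parent directory of the study directory
    final String studyId;   // e.g. "ODUM-IRODS_10010"

    private StudyIdentifier(String authority, String studyId) {
        this.authority = authority;
        this.studyId = studyId;
    }

    // Parses identifiers of the form hdl:<authority>/<studyId>.
    static StudyIdentifier parse(String identifier) {
        if (identifier == null || !identifier.startsWith(HANDLE_SCHEME)) {
            throw new IllegalArgumentException("not a handle identifier: " + identifier);
        }
        String rest = identifier.substring(HANDLE_SCHEME.length());
        int slash = rest.indexOf('/');
        if (slash <= 0 || slash == rest.length() - 1) {
            throw new IllegalArgumentException(
                    "expected hdl:<authority>/<studyId>: " + identifier);
        }
        return new StudyIdentifier(rest.substring(0, slash), rest.substring(slash + 1));
    }

    // True when the study ID carries the prefix reserved for iRODS-stored studies.
    boolean isIrodsBacked() {
        return studyId.startsWith(IRODS_PREFIX);
    }

    // Relative storage location: <authority>/<studyId>
    String storagePath() {
        return authority + "/" + studyId;
    }

    public static void main(String[] args) {
        StudyIdentifier id = parse("hdl:TEST/ODUM-IRODS_10010");
        System.out.println(id.storagePath() + " irods=" + id.isIrodsBacked());
    }
}
```

Checking the prefix on the ID itself is what lets the iRODS-specific branch be chosen without a lookup in the study table.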

akio-sone commented 10 years ago

The following is a sample POST method:

public static void testUploadStudy() {
    boolean local = true;

    String user = "akio";
    String password = "akio";
    String hostname = "dvntest.irss.unc.edu"; // bart=244  lisa=246 maggie=253
    if (local) {
        hostname = "localhost";
    }
    logger.log(Level.INFO, "test machine hostname={0}", hostname);

    // logger.log(Level.INFO, "newAUNameList:\n{0}", xstream.toXML(newAUNameList));
    String verb = "/edit-media/study";

    String alias = "/hdl:1902.29/11514";// hdl:1902.29/11514

    if (local) {
        alias = "/hdl:TEST/10000";// dtdv  tdvn2  ddt  hdl:TEST/10000
    }
    // dvntest: ddt
    // hdl:1902.29/11514
    // hdl:1902.29/11512
    String verbAndAlias = verb + alias;

    String portNumber = "443";// 8181  443
    if (local) {
        portNumber = "8181";
    }
    String protocol = "https";

    String hostUrl = hostname + ":" + portNumber;
    logger.log(Level.INFO, "hostUrl={0}", hostUrl);

    String requestUrl = protocol + "://"
        + user + ":" + password + "@"
        + hostUrl
        + REQUEST_ROOT + verbAndAlias;

    String zipFileName = "dvn-sample-files_5.zip";
    String mimeTypeTokenZip = "application/zip";
    CloseableHttpClient httpclient = null;
    CloseableHttpResponse resp = null;
    HttpEntity entity = null;
    String failedStatus;
    try {

        CredentialsProvider credsProvider = new BasicCredentialsProvider();
        credsProvider.setCredentials(new AuthScope(hostname,
            Integer.parseInt(portNumber)),
            new UsernamePasswordCredentials(
                user, password));

        httpclient = getCloseableHttpClient(credsProvider);

        logger.log(Level.INFO, "zip-upload case: requestUrl={0}", requestUrl);
        try {

            HttpPost httppost = new HttpPost(requestUrl);

            File zip = new File(zipFileName);
            if (!zip.exists()) {
                logger.log(Level.SEVERE, "zip file ({0}) was not found", zipFileName);
                throw new FileNotFoundException();
            } else {
                logger.log(Level.INFO, "zip file ({0}) exists", zipFileName);
            }

            FileEntity reqEntity = new FileEntity(zip, ContentType.create(mimeTypeTokenZip));

            httppost.setEntity(reqEntity);
            logger.log(Level.INFO, "executing request={0}",
                httppost.getRequestLine());
            httppost.addHeader("Content-Type", "application/zip");
            httppost.addHeader("Content-Disposition", "filename=" + zipFileName);
            httppost.addHeader("Packaging", "http://purl.org/net/sword/package/SimpleZip");

            resp = httpclient.execute(httppost);

            int statusCode = resp.getStatusLine().getStatusCode();

            logger.log(Level.INFO, "statusCode={0}", statusCode);
            if (statusCode != HttpStatus.SC_CREATED) {
                logger.log(Level.WARNING,
                    "response to http request is not OK: abort the request: status code={0}",
                    statusCode);
                if (statusCode == HttpStatus.SC_UNAUTHORIZED) {
                    logger.log(Level.SEVERE, "This box ({0}) may not have created the user account", hostname);
                    failedStatus = "authentication failure";
                } else {
                    failedStatus = "HttpStatusCode1=" + statusCode;

                }
                httppost.abort();
                return;
            }

            entity = resp.getEntity();
            String response = EntityUtils.toString(entity);
            logger.log(Level.INFO, "response={0}", xstream.toXML(response));

            logger.log(Level.INFO, "response={0}", response);

            Builder parser = new Builder();

            Document doc = parser.build(new StringReader(response));

            Serializer serializer = new Serializer(System.out, "UTF-8");
            serializer.setIndent(4);
            serializer.write(doc);
            serializer.flush();

            logger.log(Level.INFO, "finishing the http/https request ");

        } catch (ParsingException ex) {
            logger.log(Level.SEVERE, "ParsingException", ex);
        } finally {
            if (resp != null) {
                resp.close();
            }
        }

    } catch (SSLPeerUnverifiedException ex) {
        logger.log(Level.SEVERE, "SSLPeerUnverifiedException", ex);
        ex.printStackTrace();

    } catch (IOException ex) {
        logger.log(Level.SEVERE, "IOException", ex);
        ex.printStackTrace();

    } finally {
        if (httpclient != null) {
            try {
                httpclient.close();
            } catch (IOException ex) {
                logger.log(Level.SEVERE, null, ex);
            }
        }
    }

}
akio-sone commented 10 years ago

The following is a quick-fix solution:

        CredentialsProvider credsProvider
                = new BasicCredentialsProvider();

        credsProvider.setCredentials(new AuthScope(dataverseAccount.getHost(),
                dataverseAccount.getPort()),
                new UsernamePasswordCredentials(
                        dataverseAccount.getUserName(), dataverseAccount.getPassword()));

        httpclient = getCloseableHttpClient(credsProvider);

static CloseableHttpClient getCloseableHttpClient(CredentialsProvider credsProvider) {
    CloseableHttpClient httpclient = null;
    try {

        // Quick fix: accept every certificate chain (including self-signed
        // ones) without validation.
        SSLContext sslcontext = SSLContexts.custom()
                .loadTrustMaterial(
                        null, new TrustStrategy() {
                            public boolean isTrusted(X509Certificate[] chain,
                                    String authType)
                            throws CertificateException {
                                return true;
                            }
                        }
                )
                .build();

        // Hostname verification is skipped as well.
        SSLConnectionSocketFactory sslsf
                = new SSLConnectionSocketFactory(sslcontext,
                        SSLConnectionSocketFactory.ALLOW_ALL_HOSTNAME_VERIFIER);
        httpclient = HttpClients.custom()
                .setSSLSocketFactory(sslsf)
                .setUserAgent(USER_AGENT)
                .setDefaultCredentialsProvider(credsProvider)
                .build();

        return httpclient;

    } catch (KeyStoreException ex) {
        logger.log(Level.SEVERE, "KeyStoreException", ex);

    } catch (NoSuchAlgorithmException ex) {
        logger.log(Level.SEVERE, "NoSuchAlgorithmException", ex);

    } catch (KeyManagementException ex) {
        logger.log(Level.SEVERE, "KeyManagementException", ex);

    }
    return httpclient;
}
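The quick fix above trusts every certificate. Where the test server's certificate can be imported into a trust store, a narrower variant might look like the sketch below, which uses only JDK classes. The class `PinnedTrust` and its method name are hypothetical, and how the `KeyStore` is populated (file location, password) is left to the caller because the thread does not cover it.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Illustrative only: this class is hypothetical and not part of the posted code.
final class PinnedTrust {
    private PinnedTrust() {}

    // Builds an SSLContext that trusts only the certificates contained in
    // the given trust store, instead of trusting every chain.
    static SSLContext contextTrusting(KeyStore trustStore)
            throws GeneralSecurityException {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);

        SSLContext ctx = SSLContext.getInstance("TLS");
        // No client key managers; default source of randomness.
        ctx.init(null, tmf.getTrustManagers(), null);
        return ctx;
    }
}
```

The resulting context could then be handed to `new SSLConnectionSocketFactory(sslcontext)` in place of the trust-all context, assuming the same HttpClient 4.3 API used in the quick fix.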