
ICOS Carbon Portal metadata service

Metadata service for hosting, maintaining and querying information about things like ICOS stations, people, instruments, archived data objects, etc. It is deployed to https://meta.icos-cp.eu/ with different services accessible via different paths.


Upload instructions (manual)

Manual uploads of data/document objects and collection creation can be performed using the UploadGUI web app. Users need upload permissions, and the data object specifications must be designed beforehand in collaboration with the CP. Metadata of existing objects and collections can be updated later using the same app.


Upload instructions (scripting)

This section describes the complete, general two-step workflow for registering and uploading a data object to the Carbon Portal for archival, PID minting and, potentially, serving through various data services.

Authentication

Before you begin, make sure with the Carbon Portal's (CP) technical staff that the service is configured to accept your kind of data objects, and that there is a user account associated with the uploads you are going to make. Log in to CPauth with this account. You will be redirected to a page showing, among other things, your API token. This token is what your software must use to authenticate itself against CP services. It has a validity period of 100000 seconds (about 27.8 hours).

Alternatively, the authentication token can be fetched in an automation-friendly way by HTTP-POSTing the username and password as HTML form fields mail and password to https://cpauth.icos-cp.eu/password/login. For example, using the popular command-line tool curl on Linux, it can be done as follows:

$ curl --cookie-jar cookies.txt --data "mail=user_email&password=user_password" https://cpauth.icos-cp.eu/password/login

(Please note that both the email and password strings must be URL-encoded, at least when they contain special characters such as +, $, & or spaces; the encoding can be done, for example, with the encodeURIComponent() function in any Web browser's JavaScript console.)

The resulting cookies.txt file will then contain the authentication token cookie, which curl can automatically resend with later requests. (Note for developers: the file must be edited if you want to use it for tests against localhost.)

Naturally, instead of curl, one can automate this process (as well as all the next steps) using any other HTTP-capable tool or programming language.
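For instance, here is a minimal Python sketch using the requests library (the credentials are placeholders; requests URL-encodes the form fields automatically):

import requests

# a session stores the authentication cookie and resends it automatically
session = requests.Session()
resp = session.post(
    "https://cpauth.icos-cp.eu/password/login",
    data={"mail": "user@example.org", "password": "user_password"},
)
resp.raise_for_status()

# the "cpauthToken" cookie is now available for later requests to CP services
token = session.cookies.get("cpauthToken")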

Registering the metadata package

The first step of the 2-step upload workflow is preparing and uploading a metadata package for your data object. The package is a JSON document whose exact content depends on the kind of data object. There are two specific kinds that are recognized: station-specific time series data objects and spatiotemporal data objects (which may optionally also be station-specific). For the former kind, the metadata has the following format:

{
    "submitterId": "ATC",
    "hashSum": "7e14552660931a5bf16f86ad6984f15df9b13efb5b3663afc48c47a07e7739c6",
    "fileName": "L0test.csv",
    "specificInfo": {
        "station": "http://meta.icos-cp.eu/resources/stations/AS_SMR",
        "acquisitionInterval": {
            "start": "2008-09-01T00:00:00.000Z",
            "stop": "2008-12-31T23:59:59.999Z"
        },
        "instrument": "http://meta.icos-cp.eu/resources/instruments/ATC_181",
        "samplingHeight": 54.8,
        "production": {
            "creator": "http://meta.icos-cp.eu/resources/people/Lynn_Hazan",
            "contributors": [],
            "hostOrganization": "http://meta.icos-cp.eu/resources/organizations/ATC",
            "comment": "free text",
            "creationDate": "2017-12-01T12:00:00.000Z",
            "sources": ["utw3ah9Fo7_Sp7BN5i8z2vbK"],
            "documentation": "_Vb_c34v0nfTA_fG0kiIAmXM"
        }
    },
    "objectSpecification": "http://meta.icos-cp.eu/resources/cpmeta/atcCo2NrtDataObject",
    "isNextVersionOf": "MAp1ftC4mItuNXH3xmAe7jZk",
    "preExistingDoi": "10.1594/PANGAEA.865618",
    "references": {
        "keywords": ["CO2", "meteo"],
        "licence": "https://creativecommons.org/publicdomain/zero/1.0/",
        "moratorium": "2018-03-01T00:00:00Z",
        "duplicateFilenameAllowed": false
    }
}

For spatiotemporal data objects, the metadata package has the same general structure, but the specificInfo property differs and should look as follows:

{
    "title": "JenaCarboScopeRegional inversion results for EUROCOM",
    "description": "JenaCarboScopeRegional inverse modelling estimates of European CO2 fluxes for 2006-2015 as part of the EUROCOM inversion...",
    "spatial": "http://meta.icos-cp.eu/resources/latlonboxes/europeLatLonBoxIngos",
    "temporal": {
        "interval": {
            "start": "2006-01-01T00:00:00Z",
            "stop": "2015-12-31T00:00:00Z"
        },
        "resolution": "monthly"
    },
    "production": {
        //same as for station-specific time series
    },
    "forStation": "http://meta.icos-cp.eu/resources/stations/AS_SMR",
    "samplingHeight": 50.5,
    "customLandingPage": "http://www.bgc-jena.mpg.de/CarboScope/?ID=s99_v3.7",
    "variables": ["co2flux_land", "co2flux_ocean"]
}


In HTTP protocol terms, the metadata package upload is performed by HTTP-POSTing its contents to https://meta.icos-cp.eu/upload with application/json content type and the authentication cookie. For example, using curl (metaPackage.json and cookies.txt must be in the current directory), it can be done as follows:

$ curl --cookie cookies.txt -H "Content-Type: application/json" -X POST -d @metaPackage.json https://meta.icos-cp.eu/upload

Alternatively, the CPauth cookie can be supplied explicitly:

$ curl -H "Cookie: <cookie-assignment>" -H "Content-Type: application/json" -X POST -d @metaPackage.json https://meta.icos-cp.eu/upload

Uploading the data object

Uploading the data object itself is a simple step performed against the CP's Data service https://data.icos-cp.eu/. Proceed with the upload as instructed in the Data service's documentation.

Uploading document objects

In addition to data objects, which have properties such as data level, data object specification, and acquisition and production provenance, there is a use case for uploading supplementary materials: PDF documents with hardware specifications, methodology descriptions, policies and other reference information. To provide for this, CP supports upload of document objects. The upload procedure is completely analogous to data object uploads, the only difference being the absence of the specificInfo and objectSpecification properties in the metadata package.
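For example, a minimal document-object metadata package might look like this (a sketch; the values are placeholders reused from the data object example above):

{
    "submitterId": "ATC",
    "hashSum": "7e14552660931a5bf16f86ad6984f15df9b13efb5b3663afc48c47a07e7739c6",
    "fileName": "instrument_manual.pdf"
}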

Creating a static collection

Carbon Portal supports creation of static collections with constant lists of immutable data objects or other static collections. The process of creating a static collection is similar to step 1 of data object upload. Here are the expected contents of the metadata package for it:

{
    "submitterId": "ATC",
    "title": "Test collection",
    "description": "Optional collection description",
    "members": ["https://meta.icos-cp.eu/objects/G6PjIjYC6Ka_nummSJ5lO8SV", "https://meta.icos-cp.eu/objects/sdfRNhhI5EN_BckuQQfGpdvE"],
    "isNextVersionOf": "CkSE78VzQ3bmHBtkMLt4ogJy",
    "preExistingDoi": "10.18160/VG28-H2QA",
    "documentation": "_Vb_c34v0nfTA_fG0kiIAmXM"
}

The fields are either self-explanatory, or have the same meaning as for the data object upload.

As with data object uploads, this metadata package must be HTTP-POSTed to https://meta.icos-cp.eu/upload with application/json content type and the CP authentication cookie. The server replies with the landing page of the collection. The last segment of the landing page's URL is the collection ID, which is obtained by SHA-256-hashsumming the alphabetically sorted list of the members' hashsums (it is the base64url representations of the hashsums that are sorted, but it is the binary values that contribute to the collection's hashsum).
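For illustration, here is a Python sketch of that ID computation. Beyond what is stated above, it assumes that the 24-character base64url IDs from the member URLs are the hashsums in question, that their decoded binary values are concatenated in the sorted order before hashing, and that the collection ID is the base64url form of the first 18 bytes of the resulting digest:

import base64, hashlib

# base64url IDs of the member objects (the last segments of their landing page URLs)
member_ids = ["G6PjIjYC6Ka_nummSJ5lO8SV", "sdfRNhhI5EN_BckuQQfGpdvE"]

# sort the base64url representations alphabetically...
member_ids.sort()

# ...but feed the decoded binary values to SHA-256
digest = hashlib.sha256()
for m in member_ids:
    digest.update(base64.urlsafe_b64decode(m + "=" * (-len(m) % 4)))

# assumption: the collection ID is the base64url form of the first 18 bytes
# of the digest, matching the length of the member IDs above
print(base64.urlsafe_b64encode(digest.digest()[:18]).decode())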

Reconstructing upload-metadata packages of existing objects/collections

When scripting uploads of multiple objects, it can be convenient to use an upload-metadata package of an existing object as an example or a template. The reconstructed package can be fetched using the following request:

curl "https://meta.icos-cp.eu/dtodownload?uri=<landing page URL>"

In a bash shell, one can also format the JSON after fetching, as in this example:

curl "https://meta.icos-cp.eu/dtodownload?uri=https://meta.icos-cp.eu/objects/n7cB5kS4U1E5A3mXKtEUCF9s" | python3 -m json.tool


Accessing the metadata

Carbon Portal stores its metadata in an RDF store (also called triplestore), where every metadata entity is represented with a URL. All of these URLs are resolvable and can be visited using Web browsers and other HTTP client software. Examples of the kinds of metadata entities include data objects, document objects, collections, organizations, people, research stations, dataset specifications, variables, acquisition/creation/submission provenance objects, etc.

Data objects

Carbon Portal's data objects have a well-defined separation between data (the binary content of the object, viewed as a constant sequence of bytes and identified using its SHA-256 hashsum) and metadata (all the other information about the object, which existed or could have existed at the time of object creation). Examples of data object metadata include file name, size in bytes, research station, sampling height, previous/next versions, etc.

The most basic and user-friendly way of accessing a data object's metadata is visiting its landing page (example: https://meta.icos-cp.eu/objects/_fJ8Skpz_lvMnAOfsRApZojG) in a Web browser, and then possibly exploring it further by navigating the links therein. Additionally, every data object has a metadata view (see example) in the portal app.

Apart from metadata access methods intended for human consumption, CP offers a way of accessing data object metadata programmatically. All the metadata is published using CC0 licence, which means that no licence acceptance is needed, and all metadata access can be performed anonymously.

Programmatic access to an individual data object's metadata is performed by sending an HTTP GET request to the landing page, specifying the desired metadata format using HTTP content negotiation. For example, it is possible to download most of the data object's metadata displayed on its landing page, as a single JSON object, using the command-line tool curl like so:

curl -H "Accept: application/json" https://meta.icos-cp.eu/objects/qZzevJN69j6rRPKxTbJZckDf

Other supported content types are intended for fetching different serializations of RDF metadata: application/xml or application/rdf+xml for RDF/XML, and text/plain or text/turtle for RDF/Turtle.
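The same content negotiation in Python, as a sketch using the requests library (the property names of the JSON reply are not documented here, so the example just lists the keys):

import requests

landing_page = "https://meta.icos-cp.eu/objects/qZzevJN69j6rRPKxTbJZckDf"

# JSON rendering of the landing-page metadata
meta = requests.get(landing_page, headers={"Accept": "application/json"}).json()
print(sorted(meta.keys()))

# RDF/Turtle serialization of the same metadata
turtle = requests.get(landing_page, headers={"Accept": "text/turtle"}).text
print(turtle[:300])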

Other metadata entities

The same principles and approaches to metadata access apply to document objects, collections, organizations, people, data types, variables, etc. However, the list of supported content types and the richness of the corresponding metadata representations and HTML landing pages may vary.

SPARQL

The CP metadata service responds to arbitrary queries in the W3C-standardized query language SPARQL sent to its SPARQL endpoint https://meta.icos-cp.eu/sparql. Writing the queries requires familiarity with the query language and with the CP metadata model. The latter is formally expressed in OWL, the W3C-standard ontology language. We recognize that even for technical external users the threshold to writing SPARQL queries is rather high, and therefore invite them to get in touch with us, should they have metadata-query needs not covered by our user-friendly products.

To demonstrate some of the possibilities accessible via SPARQL, we refer to our (semi-)user-friendly SPARQL client app (it offers a list of pre-defined queries to choose from), and to a list of under-the-hood queries used in the portal app. (Note that the panel heading of the search result list contains a small button leading to the SPARQL query behind the data object list.)
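As a minimal starting point, a query can be POSTed to the endpoint with curl, following the standard SPARQL protocol form parameter query; the query below is a generic example and is not tied to the CP metadata model:

$ curl -H "Accept: application/json" --data-urlencode "query=select * where {?s ?p ?o} limit 5" https://meta.icos-cp.eu/sparql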


Metadata flow (for ICOS ATC and ICOS Cities mid- and low-cost sensor networks)

Authentication with a pre-configured data portal account is required. The authentication mechanism is the same as for data object upload.

ATC

The CSV tables with ATC metadata are to be pushed as payloads of HTTP POST requests to URLs of the form

https://meta.icos-cp.eu/upload/atcmeta/<tableName>

where <tableName> is a name used to distinguish different tables, for example "roles", "stations", "instruments", "instrumentsLifecycle", etc.
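For example, a hypothetical upload of an instruments table with curl (authentication uses the same cpauthToken cookie as for data uploads):

$ curl -X POST --data-binary "@instruments.csv" https://meta.icos-cp.eu/upload/atcmeta/instruments --cookie "cpauthToken=..."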

ICOS Cities mid- and low-cost sensor networks

The URL to POST metadata files to is of the form

https://citymeta.icos-cp.eu/upload/midLowCost<city>/<tableName>

where <city> is a city (e.g. Zurich, Paris, Munich) and <tableName> is the name of a metadata table (e.g. sites). For example, to upload with curl:

$ curl -X POST --data-binary "@zuerich_sites.csv" https://citymeta.icos-cp.eu/upload/midLowCostZurich/sites --cookie "cpauthToken=..."


Administrative API for RDF updates

Intended for internal use at Carbon Portal. All the updates need to go through the RDF logs, therefore the SPARQL UPDATE protocol cannot be used directly. Instead, one needs to HTTP-POST a SPARQL CONSTRUCT query, producing the triples to be inserted or retracted, to a URL of the form:

https://meta.icos-cp.eu/admin/<insert | delete>/<instance-server id>

where <instance-server id> is the id of the instance server affected by the change, as specified in meta's config file.

To be allowed to perform the operation, one needs to be on the adminUsers list in the config (cpmeta.sparql.adminUsers). Here is a curl example of the API usage:

curl --upload-file sparql.rq -H "Cookie: cpauthToken=<the token>" https://meta.icos-cp.eu/admin/delete/sitescsv?dryRun=true

The output will show the resulting changes. If dryRun is true, no actual changes are performed; only the prospective outcome is shown.


Information for developers

Getting started with the front-end part

Getting started with the back-end part

Setting up authentication/authorization for the Handle.net client HandleNetClient

Handle.net servers use two-way TLS.

Client side

# Generate a 4096-bit RSA private key:
$ openssl genpkey -algorithm RSA -out private_key.pem -pkeyopt rsa_keygen_bits:4096

# Convert the private key to PKCS#8 DER format:
$ openssl pkcs8 -topk8 -outform DER -in private_key.pem -out private_key.der -nocrypt

# Extract the public key in DER format:
$ openssl rsa -pubout -in private_key.pem -outform DER -out public_key.der

# Issue a long-lived self-signed client certificate based on the key:
$ openssl req -keyform DER -key private_key.der -new -x509 -days 15000 -out handleClientCert.pem

Server side

By default, Handle.net server software comes with a self-signed SSL certificate with CN=anonymous. This does not work for Java, so it is necessary to get the administrators of the Handle server (which you are going to use) to replace the default with a self-signed certificate whose CN equals the actual domain name of the server. After that, the server certificate needs to be fetched (to be used later as a trusted cert), for example:

$ openssl s_client -showcerts -connect epic.pdc.kth.se:8000 < /dev/null 2> /dev/null | openssl x509 -outform PEM > server_cert.pem

Testing with curl

curl can disable server certificate validation with the -k command-line option. The following example should create or overwrite the handle with suffix <suffix> (use the actual desired suffix) by HTTP-PUTting the JSON file payload.json into a handle:

$ curl -v -k --cert handleClientCert.pem --key private_key.pem -H 'Authorization: Handle clientCert="true"' -H "Content-Type: application/json" --upload-file payload.json https://epic.pdc.kth.se:8000/api/handles/11676/<suffix>?overwrite=true

payload.json is expected to contain a JSON object with an array of handle values as its values property. For more details on the HTTP API, see the documentation. To examine the handle values, run, for example:

$ curl -k https://epic.pdc.kth.se:8000/api/handles/11676/<suffix> | python -m json.tool

Deployment

Miscellaneous recipes

Restoring RDFLog database from pg_dump

$ cat dump.sqlc | docker exec -i rdflogdb_rdflogdb_1 pg_restore -c -U postgres -d postgres --format=c > /dev/null

Autorendering README.md to HTML for preview on file change

Make sure that Python is available and that the python-markdown and inotify-tools packages are installed on your Linux system. Then you can run:

$ while inotifywait -e close_write README.md; do python -m markdown README.md > README.html; done

SHA-256 sum in base64

$ sha256sum <filename> | awk '{print $1;}' | xxd -r -p | base64
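An equivalent in Python, should the command-line tools be unavailable (pass the file name as the first argument):

import base64, hashlib, sys

# hash the file and print the base64 form of the binary digest
with open(sys.argv[1], "rb") as f:
    print(base64.b64encode(hashlib.sha256(f.read()).digest()).decode())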