The DocumentStorage module handles the upload of documents to the DKT platform. When one or more documents are uploaded to the HTTP endpoint, they are stored in the file system and in the MySQL database. The MySQL database stores metadata about each document, e.g. a status that marks whether the document has been processed or whether errors occurred during processing. A number of worker threads use this database table as a processing queue and process the documents one after another. Each document is converted to NIF and then sent to a pipeline. The processing pipeline is configurable and executes a number of e-Services one after another. The results are then stored in the triple store. This image shows this process:
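As a rough illustration of the queue behaviour described above (not the module's actual code), the sketch below lets a few worker threads take documents from a shared queue and record a status for each one. It assumes an in-memory queue as a stand-in for the MySQL table, and `process` is a hypothetical callable standing in for the NIF conversion and the pipeline call. The status names are the ones that appear in the status endpoint's example output.

```python
import queue
import threading

# Status names as they appear in the status endpoint's example output.
NOT_PROCESSED = "NOT_PROCESSED"
CURRENTLY_PROCESSING = "CURRENTLY_PROCESSING"
PROCESSED = "PROCESSED"
ERROR = "ERROR"


def run_workers(documents, process, num_workers=2):
    """Process every document once and record its final status.

    `documents` is a list of dicts; `process` is a hypothetical callable
    standing in for NIF conversion plus the enrichment pipeline.
    """
    pending = queue.Queue()
    for doc in documents:
        doc["status"] = NOT_PROCESSED
        doc["errorMessage"] = None
        pending.put(doc)

    def worker():
        while True:
            try:
                doc = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            doc["status"] = CURRENTLY_PROCESSING
            try:
                process(doc)
                doc["status"] = PROCESSED
            except Exception as exc:
                # Keep the failure visible instead of stopping the worker.
                doc["status"] = ERROR
                doc["errorMessage"] = str(exc)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return documents
```

A failing document ends up with status `ERROR` and an error message, while the remaining documents are still processed.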
API endpoint: https://dev.digitale-kuratierung.de/api/document-storage/collections/{collection-name} HTTP method: POST Parameters:
CURL example:
curl -X POST "https://dev.digitale-kuratierung.de/api/document-storage/collections/my-collection"
API endpoint: https://dev.digitale-kuratierung.de/api/document-storage/collections/{collection-name}/documents HTTP method: POST Parameters:
CURL examples:
curl -X POST -d '<p>Welcome to Berlin!</p>' "https://dev.digitale-kuratierung.de/api/document-storage/collections/my-collection/documents?fileName=my-file.html"
curl -X POST -H "Content-Type: application/zip" --data-binary @file2.zip "https://dev.digitale-kuratierung.de/api/document-storage/collections/my-collection/documents?fileName=file2.zip"
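The same upload can be prepared from a script. The following is a minimal sketch using only the Python standard library; the helper name `build_upload_request` is hypothetical, and the function only builds the request so that it can be inspected before anything is sent.

```python
import urllib.parse
import urllib.request

BASE = "https://dev.digitale-kuratierung.de/api/document-storage"


def build_upload_request(collection, file_name, body, content_type=None):
    """Build (but do not send) a POST request that uploads one document."""
    url = "{}/collections/{}/documents?{}".format(
        BASE,
        urllib.parse.quote(collection, safe=""),
        urllib.parse.urlencode({"fileName": file_name}),
    )
    # Only set Content-Type when the caller asks for one, e.g. for zip files.
    headers = {"Content-Type": content_type} if content_type else {}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` sends the request. For a zip archive, `body` would be the raw bytes of the file and `content_type` would be `application/zip`, as in the second curl example.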
API endpoint: https://dev.digitale-kuratierung.de/api/document-storage/collections/{collection-name}/documents Request method: GET Parameters:
curl -X GET "https://dev.digitale-kuratierung.de/api/document-storage/collections/my-collection/documents"
Example Output:
[
{
"id": 1,
"filename": "00.xhtml",
"path": "00.xhtml",
"status": "PROCESSED",
"errorMessage": null,
"documentUri": "http://digitale-kuratierung.de/ns/00.xhtml#char=0,11",
"uploadTime": 1471529514000,
"lastUpdate": 1471529516000,
"collection": {
"name": "my-collection",
"documents": [],
"creationTime": 1471529514000
}
},
{
"id": 2,
"filename": "01.xhtml",
"path": "01.xhtml",
"status": "ERROR",
"errorMessage": "{\n \"exception\": \"eu.freme.common.exception.ExternalServiceFailedException\",\n \"path\": \"/e-sesame/storeData\",\n \"message\": \"SAIL is already locked by: 6242@v35731.1blu.de in /opt/storage/sesameStorage/my-collection\",\n \"error\": \"Bad Gateway\",\n \"status\": 502,\n \"timestamp\": 1471529619168\n}",
"documentUri": "http://digitale-kuratierung.de/ns/01.xhtml#char=0,11",
"uploadTime": 1471529514000,
"lastUpdate": 1471529514000,
"collection": {
"name": "my-collection",
"documents": [],
"creationTime": 1471529514000
}
}
]
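In the example output, the `errorMessage` of a failed document is itself a JSON document encoded as a string. A minimal sketch of how a client might list the failed documents and extract the inner `message` field is shown below; the function name `failed_documents` is hypothetical, and falling back to the raw text is an assumption for messages that are not JSON.

```python
import json


def failed_documents(documents_json):
    """Return (filename, message) pairs for documents whose status is ERROR."""
    failures = []
    for doc in json.loads(documents_json):
        if doc["status"] != "ERROR":
            continue
        raw = doc.get("errorMessage") or ""
        try:
            # The error message in the example output is a nested JSON string.
            message = json.loads(raw).get("message", raw)
        except (ValueError, AttributeError):
            message = raw  # not a JSON object: keep the raw text
        failures.append((doc["filename"], message))
    return failures
```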
API endpoint: https://dev.digitale-kuratierung.de/api/document-storage/collections/{collection-name}/status Request method: GET Parameters:
curl -X GET "https://dev.digitale-kuratierung.de/api/document-storage/collections/my-collection/status"
Example Output:
{
"counts": {
"PROCESSED": 14,
"ERROR": 2,
"CURRENTLY_PROCESSING": 0,
"NOT_PROCESSED": 0
},
"finished": true
}
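A client that polls this endpoint may want to turn the response into a simple progress figure. The sketch below is one way to do that; the function name `summarize_status` is hypothetical, and counting both `PROCESSED` and `ERROR` as "done" is an assumption about how progress should be reported. The `finished` flag is passed through as returned by the server.

```python
import json


def summarize_status(status_json):
    """Return (done, total, finished) for a collection status response."""
    status = json.loads(status_json)
    counts = status["counts"]
    total = sum(counts.values())
    # Documents in PROCESSED or ERROR are treated as needing no further work.
    done = counts.get("PROCESSED", 0) + counts.get("ERROR", 0)
    return done, total, bool(status["finished"])
```

Applied to the example output above, this yields 16 of 16 documents done with `finished` set to true.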
Delete a collection. This deletes the collection from the database, deletes all its files from the server, and also deletes its data from the triple store.
API endpoint: https://dev.digitale-kuratierung.de/api/document-storage/collections/{collection-name} Request method: DELETE Parameters:
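As a sketch of how the delete call might be issued from a script, the following builds the DELETE request with the Python standard library. The helper name `build_delete_request` is hypothetical, and the request is only constructed here, not sent, because the operation is destructive.

```python
import urllib.parse
import urllib.request

BASE = "https://dev.digitale-kuratierung.de/api/document-storage"


def build_delete_request(collection):
    """Build (but do not send) the DELETE request for one collection."""
    url = "{}/collections/{}".format(BASE, urllib.parse.quote(collection, safe=""))
    return urllib.request.Request(url, method="DELETE")
```

Passing the returned object to `urllib.request.urlopen` would send the request and remove the collection.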
API endpoint: https://dev.digitale-kuratierung.de/api/document-storage/collections Request method: GET Parameters: none
curl -X GET "https://dev.digitale-kuratierung.de/api/document-storage/collections"
Example Output:
["mendelsohn-archive", "example-collection"]
pipeline.json
The system will look for a file pipeline.json in the classpath. This file defines the enrichment pipeline that will be used to process all documents after they have been uploaded to a document collection. The pipeline can be parametrized using the parameter dkt.storage.pipeline.base-url (see below).
dkt.storage.pipeline.base-url
This is a standard Java configuration parameter which can be set e.g. in the application.properties file. It changes the $base-url$ parameter of the pipelines. The parameter is optional; if it is not configured, $base-url$ defaults to http://localhost:xy, with xy being the port the server listens on, e.g. http://localhost:8080.
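The defaulting rule for $base-url$ can be sketched as follows. This is an illustration only: the function names are hypothetical, the pipeline template used in the usage example is made up (the real structure of pipeline.json is not shown here), and plain string replacement is an assumption about how the placeholder is substituted.

```python
import json


def resolve_base_url(configured, server_port):
    """Return the configured base URL, or http://localhost:<port> if unset."""
    if configured:
        return configured
    return "http://localhost:{}".format(server_port)


def render_pipeline(template_text, base_url):
    """Replace every $base-url$ placeholder and parse the result as JSON."""
    return json.loads(template_text.replace("$base-url$", base_url))
```

For example, with a hypothetical template `[{"endpoint": "$base-url$/e-example"}]`, no configured value, and a server listening on port 8080, the endpoint resolves to `http://localhost:8080/e-example`.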
dkt.storage.data-dir
This is a standard Java configuration parameter which can be set e.g. in the application.properties file. It specifies the location used to store the uploaded files. The default value is "documents/".
dkt.storage.virtuoso-username
Specify the user name that has write access to the Virtuoso triple store.
dkt.storage.virtuoso-password
Specify the password of the user for write access to the Virtuoso triple store.
dkt.storage.virtuoso-crud-endpoint
Specify the API endpoint for write access to the Virtuoso triple store. E.g. http://example.com:8890/sparql-graph-crud-auth