The Setup is part of the Corporate Linked Data Catalog (COLID) application. Here you can find an introduction to the application, and a description of all its functions is here.
The complete guide can be found at the following link.
This repository helps you set up a local environment based on Docker Compose.
Install Docker Desktop for Windows from Docker Hub (last tested with Docker Desktop v2.2.0.3)
Clone this repository locally
git clone --recursive [URL to this Git repo]
Pull all changes in all submodules
git pull --recurse-submodules
Create a file .env next to the file docker-compose.yml and insert the following variables (example values are shown):
MESSAGEQUEUE_COOKIE=SWQOKODSQALRPCLNMEQG
MESSAGEQUEUE_USERNAME=guest
MESSAGEQUEUE_PASSWORD=guest
GRAPHDATABASE_USERNAME=admin
GRAPHDATABASE_PASSWORD=admin
RELATIONAL_DATABASE_ROOT_PASSWORD=dbadminpass
RELATIONAL_DATABASE_USERNAME=dbuser
RELATIONAL_DATABASE_PASSWORD=dbpass
SMTP_USERNAME=any
SMTP_PASSWORD=any
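The step above can also be done in one shot from a shell in the repository root. This is a sketch using the example values from the list; replace them for anything beyond a local test setup:

```shell
# Create the .env file next to docker-compose.yml with the example values.
cat > .env <<'EOF'
MESSAGEQUEUE_COOKIE=SWQOKODSQALRPCLNMEQG
MESSAGEQUEUE_USERNAME=guest
MESSAGEQUEUE_PASSWORD=guest
GRAPHDATABASE_USERNAME=admin
GRAPHDATABASE_PASSWORD=admin
RELATIONAL_DATABASE_ROOT_PASSWORD=dbadminpass
RELATIONAL_DATABASE_USERNAME=dbuser
RELATIONAL_DATABASE_PASSWORD=dbpass
SMTP_USERNAME=any
SMTP_PASSWORD=any
EOF
```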
Run docker-compose up
to download and build all Docker images and start up the environment
Wait for docker-compose to start up
Open the COLID editor (see URL below). Go to the profile menu in the upper right corner and click on "Administration". Open the Metadata Graph Configuration sub-menu page and click the "Start reindex" button in the upper right corner.
If you just want to use the Knowledge Graph Explorer application, only the fuseki, KGE-Frontend, and KGE-Web-service Docker images need to be installed.
While building the frontends, the following error may occur. In the Dockerfiles of the frontend applications, node is run with an increased heap size during the build (node --max_old_space_size=8000). Try increasing this value if the error occurs:
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
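The relevant build step in a frontend Dockerfile looks roughly like the following sketch; the exact path, build tool invocation, and configuration are assumptions and differ per project:

```dockerfile
# Hypothetical sketch of the frontend build step.
# Increase the heap size (in MB) if the build runs out of memory.
RUN node --max_old_space_size=8000 node_modules/@angular/cli/bin/ng build
```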
After starting the application a second time, the fuseki database may throw exceptions. Delete the Docker container of the fuseki database with docker container rm fuseki
. ATTENTION: This will remove all your created data and reload the database with the initial data.
fuseki-loader/loader.sh may contain carriage return characters; remove them.
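One way to strip the carriage returns is with sed, shown here on a sample file; in the repository, run the same sed command on fuseki-loader/loader.sh instead (or use dos2unix if available):

```shell
# Create a sample file with Windows (CRLF) line endings.
printf 'line1\r\nline2\r\n' > loader.sh
# Strip the trailing carriage returns in place (GNU sed).
sed -i 's/\r$//' loader.sh
```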
Some quick tips and advice for working faster.
To purge all unused or dangling images, containers, volumes, and networks run the following command:
docker system prune -a
To remove all containers:
docker container rm $(docker container ls -aq)
Open http://localhost:5601, go to the Dev Tools in the left panel, enter and run the following commands
PUT dmp-resource-1970-01-01_00.00.00
PUT dmp-metadata-1970-01-01_00.00.00
{
"mappings": {
"enabled": false
}
}
POST /_aliases
{
"actions" : [
{ "add" : { "index" : "dmp-resource-1970-01-01_00.00.00", "aliases" : ["dmp-search-resource", "dmp-update-resource"] } },
{ "add" : { "index" : "dmp-metadata-1970-01-01_00.00.00", "aliases" : ["dmp-search-metadata", "dmp-update-metadata"] } }
]
}
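The same requests can also be sent without Kibana, using curl against Elasticsearch directly. This sketch assumes Elasticsearch is reachable on localhost:9200 (its default port); the request bodies are the ones shown above:

```shell
# Create the two indexes (mappings disabled for the metadata index, as above).
curl -X PUT 'http://localhost:9200/dmp-resource-1970-01-01_00.00.00'
curl -X PUT 'http://localhost:9200/dmp-metadata-1970-01-01_00.00.00' \
  -H 'Content-Type: application/json' \
  -d '{ "mappings": { "enabled": false } }'
# Register the search and update aliases for both indexes.
curl -X POST 'http://localhost:9200/_aliases' \
  -H 'Content-Type: application/json' \
  -d '{
    "actions" : [
      { "add" : { "index" : "dmp-resource-1970-01-01_00.00.00", "aliases" : ["dmp-search-resource", "dmp-update-resource"] } },
      { "add" : { "index" : "dmp-metadata-1970-01-01_00.00.00", "aliases" : ["dmp-search-metadata", "dmp-update-metadata"] } }
    ]
  }'
```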
On the Semantic Web, URIs identify not just web documents but also real-world objects like people and cars, and even abstract ideas and non-existent things like a mythical unicorn. We call these real-world objects or things. COLID uses bayer.com as the default domain in each of its URIs, as the project was conceived at Bayer AG. For example: https://pid.bayer.com/kos/19050/hasLabel
However, you can also configure a custom domain in the URIs if needed. To do that, before building the Docker containers, all triples in the triplestore as well as the references to the URIs must be updated to use the custom domain.
Several files across the projects need to be changed from bayer.com to your custom domain, e.g. https://pid.orange.com/kos/19050/hasLabel
Details are listed below.

File | Project | Variable | Comments
---|---|---|---
loader.sh | fuseki-staging | baseUrl | change baseUrl (example.com) as needed in the shell script before uploading triples
appsettings.json | AppData Service | ServiceUrl, HttpServiceUrl | change both variables to your custom domain: "ServiceUrl": "https://pid.example.com/", "HttpServiceUrl": "http://pid.example.com/"
appsettings.json | Indexing Crawler Service | ServiceUrl, HttpServiceUrl | change both variables to your custom domain: "ServiceUrl": "https://pid.example.com/", "HttpServiceUrl": "http://pid.example.com/"
appsettings.json | Registration Service | ServiceUrl, HttpServiceUrl | change both variables to your custom domain: "ServiceUrl": "https://pid.example.com/", "HttpServiceUrl": "http://pid.example.com/"
appsettings.json | Reporting Service | ServiceUrl, HttpServiceUrl | change both variables to your custom domain: "ServiceUrl": "https://pid.example.com/", "HttpServiceUrl": "http://pid.example.com/"
appsettings.json | Search Service | ServiceUrl, HttpServiceUrl | change both variables to your custom domain: "ServiceUrl": "https://pid.example.com/", "HttpServiceUrl": "http://pid.example.com/"
appsettings.json | Scheduler Service | ServiceUrl, HttpServiceUrl | change both variables to your custom domain: "ServiceUrl": "https://pid.example.com/", "HttpServiceUrl": "http://pid.example.com/"
appsettings.json | Resource Relationship Manager Backend Service | ServiceUrl, HttpServiceUrl | change both variables to your custom domain: "ServiceUrl": "https://pid.example.com/", "HttpServiceUrl": "http://pid.example.com/"
environment.ts, environment.docker.ts | Editor Frontend | baseUrl, PidUriTemplate.baseUrl | change baseUrl (example.com) in both sections to your custom domain
environment.ts, environment.docker.ts | Data Marketplace Frontend | baseUrl | change baseUrl (example.com) to your custom domain
environment.ts, environment.docker.ts | Resource Relationship Manager Frontend | baseUrl | change baseUrl (example.com) to your custom domain
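As a rough illustration of the replacement itself, the domain change could be scripted as below. The directory and file contents here are hypothetical placeholders standing in for the files in the table, and each affected file should still be reviewed by hand:

```shell
# Hypothetical example: replace the default domain with a custom one.
OLD=pid.bayer.com
NEW=pid.example.com
# Sample file standing in for one of the appsettings.json files above.
mkdir -p demo
printf '{ "ServiceUrl": "https://pid.bayer.com/" }\n' > demo/appsettings.json
# Find files containing the old domain and rewrite them in place (GNU sed).
grep -rl "$OLD" demo | xargs sed -i "s/$OLD/$NEW/g"
```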
The Carrot2 clustering service is an open-source engine for clustering text. It can automatically discover groups of related documents and label them with short key terms or phrases. Publish a few resources in your local COLID setup and you can then view the clusters in the Data Marketplace. Refer to the link below for more details.