ELTE-DH / NoSketch-Engine-Docker

A NoSketch Engine Docker image which is easy to use
GNU Lesser General Public License v3.0

NoSketch Engine Docker

This is a dockerised version of NoSketch Engine, the open-source version of the Sketch Engine corpus manager and text analysis software developed by Lexical Computing Limited.

This Docker image is based on Debian 12 Bookworm. The NoSketch Engine build and installation process contains some additional hacks for convenient installation and use; see the Dockerfile for details.

TL;DR

  1. git clone https://github.com/ELTE-DH/NoSketch-Engine-Docker.git
  2. make pull – to download the docker image
  3. make compile – to compile sample corpora
  4. make execute – to execute a Sketch Engine command (compilecorp, corpquery, etc.) in the docker container (runs a test CLI query on the susanne corpus by default)
  5. make run – to launch the docker container
  6. Navigate to http://localhost:10070/ to try the WebUI

Features

Further info on how to analyse a plain text corpus with e-magyar and convert it to a format suitable for the system.

Corpus configuration recipes to aid compilation of large corpora can be found here.

Usage

1. Get the Docker image

2. Compile your corpus

  1. Put vert file(s) in: corpora/CORPUS_NAME/vertical directory (see examples in corpora/susanne/vertical and corpora/emagyardemo/vertical directories)
  2. Put config in: corpora/registry/CORPUS_NAME file (see examples in corpora/registry/susanne and corpora/registry/emagyardemo)
  3. Compile all corpora listed in corpora/registry directory using the docker image: make compile
    • To compile one corpus at a time (overwriting existing files), use the following command: make execute CMD="compilecorp --no-ske --recompile-corpus CORPUS_REGISTRY_FILE"
    • If you want to overwrite all existing indices automatically when running make compile, set any non-empty value for the FORCE_RECOMPILE environment variable, e.g. make compile FORCE_RECOMPILE=y
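The layout steps above can be sketched as a small shell helper. This is only an illustration: the function name prepare_corpus, the corpus name mycorpus and the input file name are hypothetical, and the sketch assumes that CORPUS_REGISTRY_FILE is simply the name of the registry file. The copied susanne config is only a starting point and still has to be edited by hand for the new corpus.

```shell
#!/bin/sh
# Sketch: prepare the directory layout for a new corpus.
# prepare_corpus is a hypothetical helper, not part of this repository.

# Usage: prepare_corpus CORPORA_DIR CORPUS_NAME VERT_FILE
prepare_corpus() {
    corpora_dir=$1
    corpus_name=$2
    vert_file=$3

    # Step 1: vert file(s) go in corpora/CORPUS_NAME/vertical
    mkdir -p "$corpora_dir/$corpus_name/vertical"
    cp "$vert_file" "$corpora_dir/$corpus_name/vertical/"

    # Step 2: the config goes in corpora/registry/CORPUS_NAME.
    # If no config exists yet, start from the susanne example (when present).
    mkdir -p "$corpora_dir/registry"
    if [ ! -e "$corpora_dir/registry/$corpus_name" ] &&
       [ -e "$corpora_dir/registry/susanne" ]; then
        cp "$corpora_dir/registry/susanne" "$corpora_dir/registry/$corpus_name"
    fi

    # Step 3: print the compile command instead of running it, so the
    # registry file can be reviewed first (assumes the registry file name
    # is what compilecorp expects as CORPUS_REGISTRY_FILE).
    echo "make execute CMD=\"compilecorp --no-ske --recompile-corpus $corpus_name\""
}
```

Calling prepare_corpus corpora mycorpus /path/to/mycorpus.vert from the repository root would create the directories and print the command to run once the registry file has been adjusted.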

3. Run

(Optional, only recommended if variables are altered)

Customise the environment variables in secrets/env.sh (see secrets/env.sh.template for example) and export them into the current shell with source secrets/env.sh

3a. Run the container

  1. Run docker container: make run
  2. Navigate to http://SERVER_NAME:10070/ to use

3b. CLI Usage

4. Additional commands

make parameters, multiple images and multiple containers

By default,

If there is a need to change these, set them as environment variables (e.g. export IMAGE_NAME=myimage:latest) or supplement make commands with the appropriate values (e.g. make run PORT=8080).

E.g. export IMAGE_NAME=myimage:latest; make build builds an image called myimage:latest, and make run IMAGE_NAME=myimage:latest CONTAINER_NAME=mycontainer PORT=12345 launches the image called myimage:latest in a container called mycontainer which will use port 12345. In the latter case the system will be available at http://SERVER_NAME:12345/.

See the table below on which make command accepts which parameter:

command IMAGE_NAME CONTAINER_NAME CORPORA_DIR PORT FORCE_RECOMPILE USERNAME PASSWORD The Other Variables
make pull . . . . . . .
make build . . . . . . .
make compile . . . . . .
make execute . . . .
make run . . .
make connect . . . . . . .
make stop . . . . . . .
make clean . . . . .
make create-cert . . . . . . . .
make remove-cert . . . . . . . .
make htpasswd . . . . .

In the rare case of multiple different docker images, be sure to name them differently (by using IMAGE_NAME). In the more common case of multiple different docker containers running simultaneously, be sure to name them differently (by using CONTAINER_NAME) and also be sure to use a different port for each of them (by using PORT). To handle multiple different sets of corpora, be sure to set the directory containing the corpora (CORPORA_DIR) accordingly for each container.
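As an illustration of running several containers side by side, the sketch below prints one make run command per corpora directory, giving each a distinct CONTAINER_NAME and PORT. The helper name run_commands, the noske- prefix and the example directories are hypothetical. Whether CORPORA_DIR has to be an absolute path is not stated here, so the sketch passes the value through unchanged.

```shell
#!/bin/sh
# Sketch: generate `make run` commands for several corpora directories.
# run_commands is a hypothetical helper, not part of this repository.

# Usage: run_commands BASE_PORT DIR...
run_commands() {
    port=$1
    shift
    for dir in "$@"; do
        # Derive a unique container name from the directory name
        name=noske-$(basename "$dir")
        echo "make run CONTAINER_NAME=$name PORT=$port CORPORA_DIR=$dir"
        # Each container needs its own port
        port=$((port + 1))
    done
}
```

For example, run_commands 10070 /data/corpA /data/corpB prints two commands, one using port 10070 and one using port 10071. The output can be reviewed and then executed line by line.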

If you want to build your own docker image, be sure to include the IMAGE_NAME parameter in the build command (make build IMAGE_NAME=myimage:latest) and also provide IMAGE_NAME=myimage:latest for every make command which accepts this parameter.

A convenient solution for managing many environment variables in an easy and reproducible way (e.g. for docker-compose.yml) is to customise and source secrets/env.sh (based on secrets/env.sh.template) before running the actual command: source secrets/env.sh; docker-compose up -d or source secrets/env.sh; make run. See secrets/env.sh.template for example configuration.

Authentication

Two types of authentication are supported: basic auth and Shibboleth.

Basic auth

  1. Copy and uncomment relevant config lines from secrets/htaccess.template into secrets/htaccess and set username and password in secrets/htpasswd (e.g. use make htpasswd USERNAME="USERNAME" PASSWORD="PASSWD" >> secrets/htpasswd shortcut for running htpasswd from apache2-utils package inside docker)
  2. Run or restart the container to apply the changes, or (re)build your custom image

Shibboleth

To be able to use the container as a Shibboleth SP (with eduid.hu)

  1. Set the following environment variables:
    • SERVER_NAME e.g. export SERVER_NAME="https://sketchengine.company.com/"
    • SERVER_ALIAS e.g. export SERVER_ALIAS="sketchengine.company.com"
  2. Obtain a self-signed certificate:
    • make create-cert to create a new certificate
    • Or place your own files at secrets/sp.for.eduid.service.hu-cert.crt and secrets/sp.for.eduid.service.hu-key.crt with appropriate permissions (chmod 644 secrets/sp.for.eduid.service.hu-cert.crt secrets/sp.for.eduid.service.hu-key.crt)
  3. Setup HTTPS
  4. Run or restart the container to apply the changes, or uncomment the relevant lines at the end of Dockerfile before (re)building your custom image
  5. Register your SP with your IdP

HTTPS with Let's Encrypt

  1. Set (export) the environment variables (or set them in secrets/env.sh based on secrets/env.sh.template and source secrets/env.sh):
    • CITATION_LINK e.g. export CITATION_LINK="https://github.com/elte-dh/NoSketch-Engine-Docker"
    • LETS_ENCRYPT_EMAIL e.g. export LETS_ENCRYPT_EMAIL="contact@company.com"
    • SERVER_NAME e.g. export SERVER_NAME="https://sketchengine.company.com/"
    • SERVER_ALIAS e.g. export SERVER_ALIAS="sketchengine.company.com"
    • (optional) IMAGE_NAME, PORT and CONTAINER_NAME
    • PRIVATE_KEY e.g. export PRIVATE_KEY="$(cat secrets/sp.for.eduid.service.hu-key.crt 2> /dev/null)" or set as empty if basic auth is used export PRIVATE_KEY=""
    • PUBLIC_KEY e.g. export PUBLIC_KEY="$(cat secrets/sp.for.eduid.service.hu-cert.crt 2> /dev/null)" or set as empty if basic auth is used export PUBLIC_KEY=""
    • HTACCESS e.g. export HTACCESS="$(cat secrets/htaccess 2> /dev/null)" or set as empty if Shibboleth is used export HTACCESS=""
    • HTPASSWD e.g. export HTPASSWD="$(cat secrets/htpasswd 2> /dev/null)" or set as empty if Shibboleth is used export HTPASSWD=""
  2. Run docker-compose up -d
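The key and auth variables listed above all follow the same pattern: read a file if it exists, otherwise export an empty value. A minimal sketch of that pattern as reusable functions follows. The names read_secret and load_auth_env are hypothetical; the file names are the ones given in the list.

```shell
#!/bin/sh
# Sketch: export the auth-related variables from files in a secrets directory.
# read_secret and load_auth_env are hypothetical helpers.

# Print the contents of FILE, or nothing if it is missing or unreadable
read_secret() {
    cat "$1" 2> /dev/null || true
}

# Usage: load_auth_env SECRETS_DIR
load_auth_env() {
    secrets_dir=$1
    # Empty when basic auth is used (no Shibboleth key pair present)
    PRIVATE_KEY=$(read_secret "$secrets_dir/sp.for.eduid.service.hu-key.crt")
    PUBLIC_KEY=$(read_secret "$secrets_dir/sp.for.eduid.service.hu-cert.crt")
    # Empty when Shibboleth is used (no htaccess/htpasswd present)
    HTACCESS=$(read_secret "$secrets_dir/htaccess")
    HTPASSWD=$(read_secret "$secrets_dir/htpasswd")
    export PRIVATE_KEY PUBLIC_KEY HTACCESS HTPASSWD
}
```

After setting the remaining variables (SERVER_NAME, SERVER_ALIAS, LETS_ENCRYPT_EMAIL, CITATION_LINK), running load_auth_env secrets followed by docker-compose up -d in the same shell passes the values on to docker-compose.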

Citation link

You can set a link to your publications which you require users to cite. Set CITATION_LINK e.g. export CITATION_LINK="https://LINK_GOES_HERE" or in secrets/env.sh (see secrets/env.sh.template for example).

The link is displayed in the lower-right corner of the main dashboard if any type of authentication is set.

Similar projects

License

The following files in this repository are from https://nlp.fi.muni.cz/trac/noske and have their own license:

The rest of the files are licensed under the GNU Lesser General Public License, version 3 or any later version.