This is a dockerised version of NoSketch Engine, the open source version of the Sketch Engine corpus manager and text analysis software developed by Lexical Computing Limited.
This docker image is based on Debian 12 Bookworm. The NoSketch Engine build and installation process contains some additional hacks for convenient installation and use; see the Dockerfile for details.
1. `git clone https://github.com/ELTE-DH/NoSketch-Engine-Docker.git`
2. `make pull`: download the docker image
3. `make compile`: compile the sample corpora
4. `make execute`: execute a Sketch Engine command (`compilecorp`, `corpquery`, etc.) in the docker container (runs a test CLI query on the `susanne` corpus by default)
5. `make run`: launch the docker container
6. Navigate to `http://localhost:10070/` to try the WebUI

Sample corpora: `susanne` (original NoSkE sample corpus) and `emagyardemo`.
Further information is available on how to analyse a plain-text corpus with e-magyar and convert it to the format the system expects.
Corpus configuration recipes to aid compilation of large corpora can be found here.
- Pull the prebuilt image: `make pull` (or `docker pull eltedh/nosketch-engine:latest`)
- Or build your own image: `make build IMAGE_NAME=myimage:latest` (be sure to name your image using the `IMAGE_NAME` parameter)

1. Put the vertical file(s) in the `corpora/CORPUS_NAME/vertical` directory (see the examples in the `corpora/susanne/vertical` and `corpora/emagyardemo/vertical` directories)
2. Put the corpus configuration in the `corpora/registry/CORPUS_NAME` file (see the examples in `corpora/registry/susanne` and `corpora/registry/emagyardemo`)
3. Compile all corpora listed in the `corpora/registry` directory using the docker image: `make compile`
   - To compile a single corpus, use `make execute CMD="compilecorp --no-ske --recompile-corpus CORPUS_REGISTRY_FILE"`
   - To force recompilation with `make compile`, set any non-empty value for the `FORCE_RECOMPILE` environment variable, e.g. `make compile FORCE_RECOMPILE=y`
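
As a rough illustration of the layout described above, the sketch below writes a vertical file and a registry file for a hypothetical corpus named `mycorpus` into a scratch directory. `NAME`, `PATH`, `VERTICAL`, `ENCODING`, `ATTRIBUTE` and `STRUCTURE` are ordinary Manatee registry directives, but the in-container paths under `/corpora` and the two-column (word, lemma) layout are assumptions made for this example; the bundled `susanne` and `emagyardemo` registry files show the values this image actually expects.

```sh
#!/bin/sh
# Sketch: lay out a tiny corpus the way the steps above describe.
# CORPUS_NAME and all file contents are hypothetical examples.
CORPUS_NAME=mycorpus
CORPORA_DIR="$(mktemp -d)"   # scratch stand-in for ./corpora

mkdir -p "$CORPORA_DIR/$CORPUS_NAME/vertical" "$CORPORA_DIR/registry"

# Vertical file: one token per line, tab-separated attributes (word, lemma),
# with structures marked by XML-like tags.
printf '<doc>\n<s>\nDogs\tdog\nbark\tbark\n.\t.\n</s>\n</doc>\n' \
    > "$CORPORA_DIR/$CORPUS_NAME/vertical/sample.vert"

# Registry file: the /corpora/... paths are an ASSUMPTION about where the
# container mounts the corpora directory; compare with corpora/registry/susanne.
cat > "$CORPORA_DIR/registry/$CORPUS_NAME" <<EOF
NAME "My corpus"
PATH "/corpora/$CORPUS_NAME/indexed/"
VERTICAL "/corpora/$CORPUS_NAME/vertical/sample.vert"
ENCODING "UTF-8"
ATTRIBUTE word
ATTRIBUTE lemma
STRUCTURE doc
STRUCTURE s
EOF

echo "Corpus skeleton written to $CORPORA_DIR"
```

With the same two files placed under the real `corpora` directory, `make compile` would pick the corpus up through its registry entry.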
1. (Optional, only recommended if variables are altered) Customise the environment variables in `secrets/env.sh` (see `secrets/env.sh.template` for an example) and export them into the current shell with `source secrets/env.sh`
2. Launch the docker container: `make run`
3. Navigate to `http://SERVER_NAME:10070/` to use it

- `make execute`: runs NoSketch Engine CLI commands using the docker image. Specify the command to run in the `CMD` parameter. For example:
  - `make execute CMD='corpinfo -s susanne'` gives info about the `susanne` corpus
  - `make execute CMD='corpquery emagyardemo "[lemma=\"és\"]"'` runs the specified query on the `emagyardemo` corpus and gives 2 hits. Mind the nesting of quotation marks: `\"` inside `"` inside `'`.
- `make connect`: gives a shell to a running container
- `make stop`: stops the container
- `make clean`: stops the container, removes indices for all corpora and deletes the docker image. Use with caution!
- `make create-cert`: creates a self-signed certificate for Shibboleth (the container must be restarted to apply)
- `make remove-cert`: deletes the self-signed certificate files (the container must be restarted to apply)
- `make htpasswd`: generates a strong password for htaccess authentication (the container must be restarted to apply; see details in the Basic auth section)

`make` parameters, multiple images and multiple containers

By default,
- `IMAGE_NAME` (the name of the docker image) is `eltedh/nosketch-engine:latest`,
- `CONTAINER_NAME` (the name of the docker container) is `noske`,
- `CORPORA_DIR` (the directory containing the corpora) is `$(pwd)/corpora`,
- `PORT` (the port used by the container) is `10070`,
- `FORCE_RECOMPILE` is not set (empty or not set means false; any non-empty value means true),
- `CITATION_LINK` is `https://github.com/elte-dh/NoSketch-Engine-Docker`,
- `SERVER_NAME` is `https://sketchengine.company.com/` (mandatory for `docker-compose.yml`),
- `SERVER_ALIAS` is `sketchengine.company.com` (mandatory for `docker-compose.yml`),
- `LETS_ENCRYPT_EMAIL` is not set (mandatory for Let's Encrypt and `docker-compose.yml`),
- `PUBLIC_KEY` and `PRIVATE_KEY` are loaded from `secrets/sp.for.eduid.service.hu-{cert,key}.crt`, or are empty if these files do not exist (mandatory for `docker-compose.yml`),
- `HTACCESS` and `HTPASSWD` are loaded from `secrets/{htaccess,htpasswd}` (see `secrets/{htaccess.template,htpasswd.template}` for an example), or are empty if these files do not exist (mandatory for `docker-compose.yml`).

If there is a need to change these, set them as environment variables (e.g. `export IMAGE_NAME=myimage:latest`) or supplement the `make` commands with the appropriate values (e.g. `make run PORT=8080`).
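
The default-unless-overridden behaviour can be pictured with plain shell parameter expansion. This is only an illustrative sketch of the documented semantics, not a copy of the repository's Makefile, which may implement the defaults differently; the helper name `force_recompile_enabled` is made up for this example.

```sh
#!/bin/sh
# Sketch: fall back to the documented defaults when a variable is unset or empty.
IMAGE_NAME="${IMAGE_NAME:-eltedh/nosketch-engine:latest}"
CONTAINER_NAME="${CONTAINER_NAME:-noske}"
CORPORA_DIR="${CORPORA_DIR:-$(pwd)/corpora}"
PORT="${PORT:-10070}"

# FORCE_RECOMPILE: empty or unset means false, any non-empty value means true.
# (Hypothetical helper, named for this example only.)
force_recompile_enabled() {
    [ -n "${FORCE_RECOMPILE:-}" ]
}

if force_recompile_enabled; then
    echo "recompile: yes"
else
    echo "recompile: no"
fi
echo "image=$IMAGE_NAME container=$CONTAINER_NAME port=$PORT corpora=$CORPORA_DIR"
```

Note that under this rule even `FORCE_RECOMPILE=0` or `FORCE_RECOMPILE=no` counts as true, because only the length of the value matters.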
E.g. `export IMAGE_NAME=myimage:latest; make build` builds an image called `myimage:latest`; and `make run IMAGE_NAME=myimage:latest CONTAINER_NAME=mycontainer PORT=12345` launches the image called `myimage:latest` in a container called `mycontainer` which will use port `12345`. In the latter case the system will be available at `http://SERVER_NAME:12345/`.

See the table below on which `make` command accepts which parameter:
command | IMAGE_NAME | CONTAINER_NAME | CORPORA_DIR | PORT | FORCE_RECOMPILE | USERNAME | PASSWORD | The Other Variables |
---|---|---|---|---|---|---|---|---|
make pull | ✔ | . | . | . | . | . | . | . |
make build | ✔ | . | . | . | . | . | . | . |
make compile | ✔ | . | . | . | ✔ | . | . | . |
make execute | ✔ | . | ✔ | . | ✔ | . | . | ✔ |
make run | ✔ | ✔ | ✔ | ✔ | . | . | . | ✔ |
make connect | . | ✔ | . | . | . | . | . | . |
make stop | . | ✔ | . | . | . | . | . | . |
make clean | ✔ | ✔ | ✔ | . | . | . | . | . |
make create-cert | . | . | . | . | . | . | . | . |
make remove-cert | . | . | . | . | . | . | . | . |
make htpasswd | ✔ | . | . | . | . | ✔ | ✔ | . |

The other variables are:
- `CITATION_LINK`
- `SERVER_NAME` and `SERVER_ALIAS`
- `PUBLIC_KEY` and `PRIVATE_KEY`
- `HTACCESS` and `HTPASSWD`

The `LETS_ENCRYPT_EMAIL` variable is only used in `docker-compose.yml`.

In the rare case of multiple different docker images, be sure to name them differently (by using `IMAGE_NAME`).
In the more common case of multiple different docker containers running simultaneously, be sure to name them differently (by using `CONTAINER_NAME`) and to use a different port for each of them (by using `PORT`). To handle multiple different sets of corpora, set the directory containing the corpora (`CORPORA_DIR`) accordingly for each container.
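
A small sketch of the multi-container case: the function below only prints one `make run` command per corpus set, so nothing is started. The `noske-` name prefix, the `corpora-<name>` directory scheme and the set names `news` and `legal` are hypothetical choices for this example; the only requirement taken from the text is that container names, ports and corpora directories differ.

```sh
#!/bin/sh
# Sketch: print (not run) one `make run` command per corpus set, giving each
# container its own name, port and corpora directory.
BASE_PORT=10070

print_run_commands() {
    offset=0
    for set_name in "$@"; do
        port=$((BASE_PORT + offset))
        printf 'make run CONTAINER_NAME=noske-%s PORT=%s CORPORA_DIR=%s/corpora-%s\n' \
            "$set_name" "$port" "$(pwd)" "$set_name"
        offset=$((offset + 1))
    done
}

print_run_commands news legal
```

Printing the commands first makes it easy to review the name and port assignments before launching anything.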
If you want to build your own docker image, be sure to include the `IMAGE_NAME` parameter in the build command: `make build IMAGE_NAME=myimage:latest`, and also provide `IMAGE_NAME=myimage:latest` for every `make` command which accepts this parameter.
A convenient solution for managing many environment variables in an easy and reproducible way (e.g. for `docker-compose.yml`) is to customise and source `secrets/env.sh` (based on `secrets/env.sh.template`) before running the actual command: `source secrets/env.sh; docker-compose up -d` or `source secrets/env.sh; make run`.
See `secrets/env.sh.template` for an example configuration.
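
For orientation, here is a minimal sketch of the pattern, using a scratch file in place of `secrets/env.sh`. The values are the example values used elsewhere in this README, and the file content is an assumption; the real `secrets/env.sh.template` may list other entries. The POSIX `.` command is used here as the portable equivalent of `source`.

```sh
#!/bin/sh
# Sketch: write a small env file and load it into the current shell.
# ENV_FILE is a scratch stand-in for secrets/env.sh (hypothetical content).
ENV_FILE="$(mktemp)"

cat > "$ENV_FILE" <<'EOF'
export IMAGE_NAME="myimage:latest"
export CONTAINER_NAME="mycontainer"
export PORT="12345"
export CITATION_LINK="https://LINK_GOES_HERE"
EOF

# Load the variables; any command started afterwards inherits them.
. "$ENV_FILE"

echo "image=$IMAGE_NAME container=$CONTAINER_NAME port=$PORT"
```

Keeping all settings in one sourced file means the same values reach both `make` and `docker-compose` without retyping them on each command line.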
Two types of authentication are supported: basic auth and Shibboleth.

Basic auth: copy `secrets/htaccess.template` into `secrets/htaccess` and set the username and password in `secrets/htpasswd` (e.g. use the `make htpasswd USERNAME="USERNAME" PASSWORD="PASSWD" >> secrets/htpasswd` shortcut for running `htpasswd` from the `apache2-utils` package inside docker).

To be able to use the container as a Shibboleth SP (with eduid.hu):
1. Set `SERVER_NAME`, e.g. `export SERVER_NAME="https://sketchengine.company.com/"`
2. Set `SERVER_ALIAS`, e.g. `export SERVER_ALIAS="sketchengine.company.com"`
3. Run `make create-cert` to create a new certificate; the certificate files are `secrets/sp.for.eduid.service.hu-cert.crt` and `secrets/sp.for.eduid.service.hu-key.crt`, which need appropriate permissions (`chmod 644 secrets/sp.for.eduid.service.hu-cert.crt secrets/sp.for.eduid.service.hu-key.crt`)
4. Update the `Dockerfile` as needed before (re)building your custom image

Set (`export`) the environment variables (or set them in `secrets/env.sh` based on `secrets/env.sh.template` and `source secrets/env.sh`):
- `CITATION_LINK`, e.g. `export CITATION_LINK="https://github.com/elte-dh/NoSketch-Engine-Docker"`
- `LETS_ENCRYPT_EMAIL`, e.g. `export LETS_ENCRYPT_EMAIL="contact@company.com"`
- `SERVER_NAME`, e.g. `export SERVER_NAME="https://sketchengine.company.com/"`
- `SERVER_ALIAS`, e.g. `export SERVER_ALIAS="sketchengine.company.com"`
- `IMAGE_NAME`, `PORT` and `CONTAINER_NAME`
- `PRIVATE_KEY`, e.g. `export PRIVATE_KEY="$(cat secrets/sp.for.eduid.service.hu-key.crt 2> /dev/null)"`, or set it to empty if basic auth is used: `export PRIVATE_KEY=""`
- `PUBLIC_KEY`, e.g. `export PUBLIC_KEY="$(cat secrets/sp.for.eduid.service.hu-cert.crt 2> /dev/null)"`, or set it to empty if basic auth is used: `export PUBLIC_KEY=""`
- `HTACCESS`, e.g. `export HTACCESS="$(cat secrets/htaccess 2> /dev/null)"`, or set it to empty if Shibboleth is used: `export HTACCESS=""`
- `HTPASSWD`, e.g. `export HTPASSWD="$(cat secrets/htpasswd 2> /dev/null)"`, or set it to empty if Shibboleth is used: `export HTPASSWD=""`

Then run `docker-compose up -d`.
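
Before starting the stack it can help to check that the variables are really present in the shell. The sketch below is one possible pre-flight check and is not part of the repository; the function name `missing_vars` is hypothetical. It tests only the three variables that the list above never allows to be empty, since the key and htaccess variables may legitimately be empty depending on the authentication type.

```sh
#!/bin/sh
# Sketch: print the names of the given variables that are unset or empty.
# (Hypothetical helper, not provided by the repository.)
missing_vars() {
    for name in "$@"; do
        # Indirect lookup: copy the value of the variable called $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] || printf '%s\n' "$name"
    done
}

missing="$(missing_vars SERVER_NAME SERVER_ALIAS LETS_ENCRYPT_EMAIL)"
if [ -n "$missing" ]; then
    # Unquoted on purpose: joins the names onto one line.
    echo "Not set:" $missing
else
    echo "All checked variables are set"
fi
```

Running a check like this right after `source secrets/env.sh` catches a forgotten `export` before `docker-compose` does.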
You can set a link to your publications which you require users to cite.
Set `CITATION_LINK`, e.g. `export CITATION_LINK="https://LINK_GOES_HERE"`, or set it in `secrets/env.sh` (see `secrets/env.sh.template` for an example).
The link is displayed in the lower-right corner of the main dashboard if any type of authentication is set.
The following files in this repository are from https://nlp.fi.muni.cz/trac/noske and have their own license:
- `noske_files/manatee-open-*.tar.gz` (GPLv2+)
- `noske_files/bonito-open-*.tar.gz` (GPLv2+)
- `noske_files/crystal-open-*.tar.gz` (GPLv3)
- `noske_files/gdex-*.tar.gz` (GPLv3)
- `data/corpora/susanne/vertical` and `data/registry/susanne`

The rest of the files are licensed under the GNU Lesser General Public License (LGPL) version 3 or any later version.