LanguageMachines / ticcltools

Tools for TICCL
GNU General Public License v3.0

Project Status: Unsupported – The project has reached a stable, usable state but the author(s) have ceased all work on it. A new maintainer may be desired.

TICCLTOOLS

TICCLtools is a collection of programs for processing text data files towards fully automatic lexical corpus post-correction. Together they constitute the bulk of TICCL: Text Induced Corpus-Cleanup. This software is usually invoked through the pipeline system PICCL: https://github.com/LanguageMachines/PICCL . Consult that repository for installation and usage instructions, unless you really want to invoke the individual tools manually.

The workflows in PICCL, the Philosophical Integrator of Computational and Corpus Libraries, are schematically visualised here, with TICCL being the one on the right:

PICCL Architecture

Preparation for a specific language and its alphabet:

Note: A fairly wide range of language-specific alphabet and character confusion files is available online, so you usually do not need to perform this preparatory step yourself.

We have prepared TICCL for work in many languages, mainly on the basis of the open source lexicons available from Aspell. The language-specific files are available here:

Unpack in your main TICCL directory. A subdirectory data/int/ will be created to house the required files for the specific language(s).

Should you want or need to build your own TICCL alphabet and character confusion files, the tool to do that is:

The actual TICCL post-correction programs in this collection are:

Manual Installation

We provide containers for simple installation, see the next section. If you want to build and install manually on a Linux/BSD system instead, follow these instructions:

First ensure the following dependencies are installed on your system:

Then git clone this repository, enter its directory, and build as follows:

$ sudo ./build-deps.sh && ./bootstrap.sh && ./configure && make && sudo make install

If you have no root permissions, set the environment variable PREFIX to the target directory where you want to install (ensure it exists). The one in the following example is a sane default:

$ export PREFIX="$HOME/.local/"
$ ./build-deps.sh && ./bootstrap.sh && ./configure --prefix "$PREFIX" && make && make install

Adjust your environment so that the binaries and libraries in $PREFIX can be found: on Linux, ensure that $PREFIX/lib is added to your $LD_LIBRARY_PATH and $PREFIX/bin to your $PATH.
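
As a minimal sketch of that adjustment, assuming a POSIX-style shell and the PREFIX default from the example above, the following lines could be added to your shell startup file. The fallback to $HOME/.local and the stripping of a trailing slash are assumptions made for illustration, not part of the build scripts.

```shell
# Fall back to the default from the install example if PREFIX is unset.
PREFIX="${PREFIX:-$HOME/.local}"
# Drop a trailing slash so the resulting paths contain no double slashes.
PREFIX="${PREFIX%/}"

# Make the installed programs findable.
export PATH="$PREFIX/bin:$PATH"
# Make the installed shared libraries findable; only append the old value
# (with a separating colon) when LD_LIBRARY_PATH was already set.
export LD_LIBRARY_PATH="$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

The `${LD_LIBRARY_PATH:+:...}` form avoids leaving a stray trailing colon when the variable was previously empty, which the dynamic loader would otherwise interpret as the current directory.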

Container Usage

A pre-made container image can be obtained from Docker Hub as follows:

docker pull proycon/ticcltools

You can build the Docker image yourself as follows; make sure you are in the root of this repository:

docker build -t proycon/ticcltools .

This builds the latest stable release. If you want to use the latest development version from the git repository instead, do:

docker build -t proycon/ticcltools --build-arg VERSION=development .

Run the container interactively as follows:

docker run -t -i proycon/ticcltools

Or invoke the tool you want:

docker run proycon/ticcltools TICCL-rank

Add the -v /path/to/your/data:/data parameter (before -t) if you want to mount your data volume into the container at /data.
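
The mount option can be sketched as follows. DATA_DIR is a hypothetical variable naming the host directory with your data (it defaults to the current directory here), and the snippet only prints the assembled command rather than running it. Resolving the directory to an absolute path first is a precaution, since bind mounts behave most predictably with absolute host paths.

```shell
# Hypothetical host directory holding your data; adjust to your setup.
DATA_DIR="${DATA_DIR:-.}"
# Resolve it to an absolute path for the bind mount.
ABS_DATA_DIR="$(cd "$DATA_DIR" && pwd)"

# Assemble the invocation, placing -v before -t as described above.
set -- docker run -v "$ABS_DATA_DIR:/data" -t -i proycon/ticcltools
echo "$*"

# To actually start the container, run the assembled command:
# "$@"
```

Inside the container, the files from the host directory then appear under /data.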