GrETEL-upload is an extension package for GrETEL that allows you to upload your own corpus or dataset. The application then automatically transforms your corpus into an Alpino XML treebank. After processing, the treebanks are searchable in GrETEL, and if you supply metadata, you can use it for filtering and analysis.
On top of a default LAMP installation (with PHP 7.*; PHP 8 is currently not working), the following packages are required:
GrETEL-upload also requires the following external programs to be installed:
Alpino, expected in /opt/Alpino/. You can change the installation directory in application/config/database_default.php. It also needs to be changed in alpino.sh.
corpus2alpino: sudo -H pip3 install corpus2alpino. This requires Python 3.6+.
It is also possible to install using pip:
pip install -r requirements.txt
Make sure to modify application/config/common.php (see below) to point to the install location of corpus2alpino.
You will have to provide configuration details in four files:
application/config/common.php: Paths and other common settings.
application/config/config.php: CodeIgniter settings.
application/config/database.php: Settings for your database connection to both the relational database (e.g. MySQL) and the XML database (BaseX).
application/config/ldap.php: Settings for LDAP authentication.
An example configuration for each can be found in application/config/{NAME}_default.php.
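One way to start from the example configurations is to copy each application/config/{NAME}_default.php to its {NAME}.php counterpart and then edit the copies. The sketch below assumes exactly the four file names listed above; the function name copy_default_configs and the choice to leave existing files untouched are illustrative assumptions, not part of GrETEL-upload.

```shell
#!/bin/sh
# Copy each {NAME}_default.php to {NAME}.php inside the given config directory.
# Hypothetical helper: existing target files are kept, so local edits survive.
copy_default_configs() {
    config_dir="$1"
    for name in common config database ldap; do
        src="$config_dir/${name}_default.php"
        dst="$config_dir/${name}.php"
        if [ ! -f "$src" ]; then
            # Report a missing example file but continue with the others.
            echo "missing example file: $src" >&2
            continue
        fi
        if [ -e "$dst" ]; then
            echo "keeping existing $dst"
        else
            cp "$src" "$dst"
            echo "created $dst"
        fi
    done
}
```

From the source directory this would be called as copy_default_configs application/config, after which each copied file still needs to be filled in by hand.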
Update the Apache configuration to allow read-write access to gretel-upload (and gretel).
Create the MySQL database gretel_upload. You can use the command php index.php migrate in the source directory to create/migrate the database schema. See docs/schema.png for the current database schema (exported from phpMyAdmin).
Make sure the uploads directory is writable for the user running the Apache daemon (usually www-data). Also create a writable sessions directory and refer to its absolute path in application/config/config.php if using the default files session driver.
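The directory setup can be sketched as follows. This is a minimal sketch that assumes the sessions directory lives next to uploads inside the installation root, which the text above does not prescribe; the function name prepare_dirs is hypothetical. In CodeIgniter 3 the printed path would go into the sess_save_path setting in application/config/config.php.

```shell
#!/bin/sh
# Create the uploads and sessions directories under the given root and
# print the absolute path of the sessions directory.
# Assumption: sessions is placed inside the installation root.
prepare_dirs() {
    root="$1"
    mkdir -p "$root/uploads" "$root/sessions"
    # Handing ownership to the Apache user needs root privileges, e.g.:
    #   sudo chown www-data "$root/uploads" "$root/sessions"
    # Print the absolute path to copy into the session configuration.
    (cd "$root/sessions" && pwd)
}
```

The chown step is left as a comment because the user running Apache differs between systems; www-data is only the usual default.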
Start both Alpino and BaseX as server instances by running the following two commands:
basexserver -S
./alpino.sh
Then, navigate to the installation directory in your web browser (e.g. localhost/gretel-upload/) to start using GrETEL-upload.
For production servers, a cron job is required for processing uploaded treebanks. Schedule the following command, e.g. every 5 minutes:
/usr/bin/php {root}/index.php cron process
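As a sketch of what the schedule could look like, the helper below builds a crontab line that runs the command above every 5 minutes. The function name cron_line is hypothetical, and the installation root has to be supplied by you.

```shell
#!/bin/sh
# Print a crontab entry that runs the processing command every 5 minutes.
# "$1" is the installation root, i.e. the {root} placeholder above.
cron_line() {
    root="$1"
    printf '*/5 * * * * /usr/bin/php %s/index.php cron process\n' "$root"
}
```

Assuming an installation in /var/www/html/gretel-upload (an example path only), the entry could be appended to the current user's crontab with (crontab -l 2>/dev/null; cron_line /var/www/html/gretel-upload) | crontab -.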
Currently, three formats are supported: LASSY-XML, CHAT and plain text (UTF-8 encoded). When you upload a set of texts (always in a zipped folder, possibly consisting of multiple directories), you can specify whether the text is already sentence- and/or word-tokenized. If not, the application will do this for you.
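Since plain text is expected to be UTF-8 encoded, it can help to check a corpus folder before zipping it. The sketch below lists files that iconv cannot read as UTF-8. It assumes the plain-text files carry a .txt extension, which GrETEL-upload itself may not require; the function name find_non_utf8 is hypothetical.

```shell
#!/bin/sh
# List .txt files under a corpus directory that are not valid UTF-8.
# iconv exits with a non-zero status when it meets an invalid byte sequence.
find_non_utf8() {
    corpus_dir="$1"
    find "$corpus_dir" -type f -name '*.txt' | while IFS= read -r file; do
        if ! iconv -f UTF-8 -t UTF-8 "$file" >/dev/null 2>&1; then
            printf '%s\n' "$file"
        fi
    done
}
```

If the function prints nothing, the folder can be packaged, for example with zip -r corpus.zip corpus/.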
GrETEL-upload allows metadata annotation using the PaQu metadata format. This metadata will be converted to LASSY-XML during import.
The GrETEL-upload interface then allows you to select which facets you want to use to filter the data in GrETEL. For example, you can choose to display a metadata column called 'year' as a slider, dropdown list or set of checkboxes. You can also choose to hide certain columns.
GrETEL-upload is written in PHP and created with CodeIgniter 3.1.11. The application uses the following libraries:
application/libraries/Alpino.php: Wrapper around Alpino's dependency parser and tokenisation scripts.
application/libraries/BaseX.php: BaseX PHP connector. Slightly modified to work in CodeIgniter.
application/libraries/Format.php: Helper to convert between various formats such as XML, JSON, CSV, etc. Part of CodeIgniter Rest Server (see below).
application/libraries/Ldap.php: Authentication via LDAP. Inspired by the LDAP Authentication library.
application/libraries/REST_Controller.php: CodeIgniter Rest Server, turns controllers into REST APIs.
GrETEL-upload uses the following JavaScript libraries:
GrETEL-upload is created with Pure CSS.
GrETEL-upload uses the FamFamFam silk icon set.
GrETEL-upload has an API for retrieving data from the database:
The test suite is created using ci-phpunit-test, which uses PHPUnit. You can run the tests by navigating to the application/tests directory and calling phpunit.
A working version is available at http://gretel.hum.uu.nl.