INL / clariah-fcs-endpoints

REST endpoints for CLARIAH Federated Content Search
GNU General Public License v3.0

CLARIAH Federated content search corpora

CLARIAH Federated Content Search Corpora, developed by the Dutch Language Institute (INT), is a service for searching multiple Dutch corpora at the same time. The application implements the CLARIN FCS specification, and this repository hosts its source code.
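CLARIN FCS builds on the SRU protocol, so a search against an endpoint is an HTTP request with SRU parameters. The sketch below only builds such a request URL and sends nothing. The parameter names follow SRU 1.2 plus the FCS `x-fcs-context` extension; the endpoint URL, the corpus identifier and the helper name `build_fcs_search_url` are assumptions made for illustration and are not taken from this repository.

```python
from urllib.parse import urlencode


def build_fcs_search_url(endpoint, query, context=None, start=1, maximum=10):
    """Build an SRU searchRetrieve URL for an FCS endpoint (illustrative only)."""
    params = {
        "operation": "searchRetrieve",  # SRU 1.2 operation name
        "version": "1.2",
        "query": query,                 # CQL query, here a single search term
        "startRecord": start,           # SRU record numbering starts at 1
        "maximumRecords": maximum,
    }
    if context:
        # FCS extension parameter that restricts the search to given resources
        params["x-fcs-context"] = context
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint URL and corpus identifier
url = build_fcs_search_url(
    "https://example.org/fcs/sru", "fiets", context="hypothetical-corpus"
)
print(url)
```

Building the URL separately from sending it keeps the request easy to inspect before pointing it at a real endpoint.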

Using CLARIAH FCS corpora

Corpora

CLARIAH FCS Corpora currently offers basic initial support for corpora based on BlackLab Server (INT corpora) and MTAS (Nederlab). The following corpora are included:

Architecture

CLARIAH FCS Corpora consists of a backend and a web interface (aggregator). The backend is based on a CLARIN backend implementation, extended with many specifics for the Dutch corpora. The aggregator is more or less a copy of the CLARIN aggregator.

Backend

Based on the Korp FCS 2.0 reference endpoint implementation (https://github.com/clarin-eric/fcs-korp-endpoint), which in turn builds on https://svn.clarin.eu/FCSSimpleEndpoint/

Code of dependencies of this project:

Cf:

The backend communicates with BlackLab Server for the INT corpora (see the BlackLab Server documentation). For Nederlab, the backend does not communicate directly with MTAS (https://meertensinstituut.github.io/mtas/) but with an intermediate layer, which restricts access to the corpus while accepting the same MTAS queries. For more about MTAS, see also its GitHub repository.
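The routing described above can be sketched as a lookup from corpus to search engine, followed by a BlackLab Server hits request for the INT corpora. This is a minimal sketch, not the backend's actual code (which is Java): the corpus identifiers, the base URLs and the helper names `route` and `blacklab_hits_url` are hypothetical. The `patt`, `first`, `number` and `outputformat` parameters are those of the BlackLab Server hits operation. The MTAS path is left out because it goes through the intermediate layer.

```python
from urllib.parse import quote, urlencode

# Hypothetical registry: corpus identifier -> (search engine, base URL)
CORPORA = {
    "example-int-corpus": ("blacklab", "https://example.org/blacklab-server"),
    "example-nederlab": ("mtas", "https://example.org/nederlab-proxy"),
}


def route(corpus):
    """Return the (engine, base_url) pair that serves the given corpus."""
    return CORPORA[corpus]


def blacklab_hits_url(server, corpus, cql_pattern, first=0, number=10):
    """Build a BlackLab Server hits URL for a Corpus Query Language pattern."""
    params = {
        "patt": cql_pattern,     # the pattern to search for
        "first": first,          # zero-based index of the first hit to return
        "number": number,        # number of hits to return
        "outputformat": "json",
    }
    return f"{server.rstrip('/')}/{quote(corpus)}/hits?{urlencode(params)}"


engine, base = route("example-int-corpus")
if engine == "blacklab":
    print(blacklab_hits_url(base, "example-int-corpus", '[lemma="fiets"]'))
```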

Aggregator

The aggregator is a simple web interface for federated search, developed by CLARIN. It is still alpha software. The lib directory of this repository contains a version (not necessarily the latest) of the Aggregator.
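To illustrate what federated search means in practice, the sketch below sends one query to several endpoints in parallel and collects the results per endpoint, so that a failing endpoint does not block the others. It is not the CLARIN Aggregator's implementation: the function `federated_search` and the two stub endpoints are hypothetical and stand in for real HTTP calls.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(query, endpoints):
    """Run `query` against every endpoint concurrently.

    `endpoints` maps an endpoint name to a callable taking the query
    and returning a list of hits.
    """
    def run(item):
        name, search = item
        try:
            return name, search(query), None
        except Exception as exc:  # keep going when one endpoint fails
            return name, [], str(exc)

    with ThreadPoolExecutor(max_workers=max(1, len(endpoints))) as pool:
        results = list(pool.map(run, endpoints.items()))
    return {name: {"hits": hits, "error": err} for name, hits, err in results}


# Stub endpoints standing in for real corpora
def corpus_a(query):
    return [f"... {query} ..."]


def corpus_b(query):
    raise TimeoutError("endpoint did not respond")


print(federated_search("fiets", {"a": corpus_a, "b": corpus_b}))
```

Reporting errors per endpoint rather than raising lets a web interface show partial results while marking the corpora that did not answer.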