CLARIAH Federated Content Search (FCS) Corpora, developed by the Dutch Language Institute (INT), is a service for searching multiple Dutch corpora at the same time. The application implements the CLARIN FCS specification. This repository hosts the source code.
CLARIAH FCS Corpora currently has basic initial support for corpora based on BlackLab Server (INT corpora) and MTAS (Nederlab). The following corpora are included:
CLARIAH FCS Corpora consists of a backend and a web interface (the aggregator). The backend is based on a CLARIN backend implementation, extended with many adaptations specific to the Dutch corpora. The aggregator is largely a copy of the CLARIN aggregator.
The backend is based on the Korp FCS 2.0 reference endpoint implementation (https://github.com/clarin-eric/fcs-korp-endpoint), which in turn builds on https://svn.clarin.eu/FCSSimpleEndpoint/
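Because the backend is an FCS 2.0 endpoint, clients query it with SRU search requests. The following is a minimal sketch of how such a request URL could be built; it is not taken from this repository. The endpoint URL and corpus identifier are placeholders, the function name is hypothetical, and the parameter names (`query`, `queryType`, `maximumRecords`, `x-fcs-context`) reflect a reading of the SRU 2.0 and CLARIN FCS 2.0 specifications that should be checked against those documents.

```python
from urllib.parse import urlencode


def build_fcs_search_url(endpoint, query, query_type="cql",
                         context=None, maximum_records=10):
    """Build an SRU 2.0 style search URL for an FCS endpoint (illustrative helper)."""
    params = {
        "query": query,
        # Assumption: "cql" selects basic search, "fcs" selects FCS-QL advanced search.
        "queryType": query_type,
        "maximumRecords": maximum_records,
    }
    if context:
        # Assumption: x-fcs-context restricts the search to one resource (corpus).
        params["x-fcs-context"] = context
    return endpoint + "?" + urlencode(params)


# Placeholder endpoint and corpus identifier, for illustration only.
search_url = build_fcs_search_url(
    "https://example.org/fcs-endpoint/sru",
    '[lemma="fiets"]',
    query_type="fcs",
    context="example-corpus",
)
print(search_url)
```

The helper only assembles the URL; sending the request and parsing the XML response are left out to keep the sketch small.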
Code of dependencies of this project:
Cf:
The backend communicates with BlackLab Server for the INT corpora (see the BlackLab Server documentation). For Nederlab, the backend does not communicate directly with MTAS (https://meertensinstituut.github.io/mtas/), but with an intermediate layer that restricts access to the corpus while accepting the same MTAS queries. For more about MTAS, see also the GitHub repository.
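To give an idea of what the backend's communication with BlackLab Server might look like on the wire, here is a small sketch that builds a hits request. It is an illustration under assumptions, not the backend's actual code: the server URL and corpus name are placeholders, the function name is hypothetical, and the `hits` path with the `patt`, `first`, `number`, and `outputformat` parameters follows the BlackLab Server REST API as commonly documented.

```python
from urllib.parse import quote, urlencode


def build_blacklab_hits_url(server, corpus, pattern, first=0, number=20):
    """Build a BlackLab Server hits URL for a corpus query (illustrative helper)."""
    params = {
        "patt": pattern,         # Corpus Query Language pattern
        "first": first,          # index of the first hit to return
        "number": number,        # number of hits to return
        "outputformat": "json",  # request JSON instead of XML
    }
    return f"{server.rstrip('/')}/{quote(corpus)}/hits?{urlencode(params)}"


# Placeholder server and corpus name, for illustration only.
hits_url = build_blacklab_hits_url(
    "https://example.org/blacklab-server/",
    "example-corpus",
    '[lemma="fiets"]',
)
print(hits_url)
```

In a real endpoint the incoming FCS query would first have to be translated into a pattern the corpus engine understands; that translation step is not shown here.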
The aggregator is a simple web interface for federated search, developed by CLARIN. It is still alpha software. The lib directory of this repository contains a version (not necessarily the latest) of the Aggregator.
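The idea behind federated search can be sketched in a few lines: the same query is sent to several endpoints in parallel and the results are collected per endpoint, so that one failing endpoint does not block the others. This is a conceptual sketch only, not the CLARIN Aggregator's implementation; the function names and the stub endpoints are hypothetical, and the stubs stand in for real network calls.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(endpoints, query, timeout=30):
    """Send one query to every endpoint in parallel.

    endpoints maps a name to a callable taking the query and returning a
    list of hits. Returns (results, errors), both keyed by endpoint name.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max(1, len(endpoints))) as pool:
        futures = {name: pool.submit(search, query)
                   for name, search in endpoints.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout)
            except Exception as exc:  # keep going when one endpoint fails
                errors[name] = exc
    return results, errors


# Stub endpoints standing in for real FCS endpoints (hypothetical).
def search_working_endpoint(query):
    return [f"hit for {query}"]


def search_broken_endpoint(query):
    raise RuntimeError("endpoint unavailable")


results, errors = federated_search(
    {"working": search_working_endpoint, "broken": search_broken_endpoint},
    "fiets",
)
print(results)
print(sorted(errors))
```

Keeping errors separate from results lets a user interface show partial results while still reporting which endpoints failed.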