clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

The CLARIN ParlaMint project is compiling comparable parliamentary corpora for a number of countries and languages.

ParlaMint corpora are interoperable, i.e. encoded to a very constrained common ParlaMint schema, a specialisation of the Parla-CLARIN recommendations, which are a customisation of the TEI Guidelines. Common scripts should process the common data in any ParlaMint corpus, despite the differing parliamentary systems of the countries, the kind of information included in the corpora, and, of course, language.
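As a rough illustration of what such a common script might look like, the sketch below counts segments per speaker in a TEI-encoded transcript. It assumes that speeches are encoded as `u` elements carrying a `who` attribute and containing `seg` children, in the TEI namespace. The sample fragment, the speaker identifiers, and the function name `count_segments_by_speaker` are hypothetical and heavily simplified; consult the ParlaMint schema for the actual structure.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The TEI namespace; ParlaMint documents are TEI XML.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily simplified fragment for illustration only.
# Real corpus files carry far more metadata and structure.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="debateSection">
        <u who="#SpeakerA" xml:id="u1"><seg>First segment.</seg><seg>Second segment.</seg></u>
        <u who="#SpeakerB" xml:id="u2"><seg>A reply.</seg></u>
        <u who="#SpeakerA" xml:id="u3"><seg>A closing remark.</seg></u>
      </div>
    </body>
  </text>
</TEI>"""


def count_segments_by_speaker(xml_text):
    """Return a Counter mapping each speaker reference to its number of segments."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    # Assumption: each speech is a <u> element whose "who" attribute points to the speaker.
    for utterance in root.iterfind(".//tei:u", TEI_NS):
        speaker = utterance.get("who", "unknown")
        counts[speaker] += len(utterance.findall("tei:seg", TEI_NS))
    return counts


counts = count_segments_by_speaker(SAMPLE)
print(dict(counts))
```

Because the function relies only on element and attribute names, the same code could in principle be run unchanged on a file from any corpus that follows the common encoding, regardless of country or language.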

The latest version of ParlaMint is 4.1, which contains corpora for 29 countries and autonomous regions, both in the original languages and machine-translated into English. It is available from the CLARIN.SI repository.

Publications connected to ParlaMint are available at the ParlaMint project page.

The two most comprehensive publications on ParlaMint corpora are the LREV preprint describing version 4.1 and the LREV publication describing version 2.1.


This Git repository contains the ParlaMint XML schemas, the scripts used to validate and convert the ParlaMint TEI XML corpora to some useful derived formats, and samples of the ParlaMint corpora. Note that there are several branches for different parts of the development.