GramEval-2020

A shared task on full morphology and dependency parsing of Russian texts

Shared task on automatic morphological and syntactic analysis of Russian texts

Codalab Link
Channel in Telegram: @grameval2020
Paper and slides by organizers

Collocated with Dialog 2020

The objective of the GramEval 2020 Shared Task is to process Russian texts, starting from provided tokens, and produce parts of speech (POS), grammatical features, lemmas, and labeled dependency trees. To encourage multi-domain processing, five genres of Modern Russian are used as test data: news, social media and electronic communication, wiki texts, fiction, and poetry; Middle Russian texts form the sixth test set. The annotation follows the Universal Dependencies scheme. Unlike many similar tasks, the training data are a collection of existing resources whose annotations are not perfectly harmonized, so annotation variability is a further source of difficulty. The main metric is the average of four scores: POS accuracy, feature accuracy, lemma accuracy, and LAS.

GramEval 2020 is a shared task for evaluating methods and technical solutions for full morphological and syntactic analysis of Russian texts. In 2020 the main focus is the genre representativeness of the text material. To evaluate approaches to automatic text analysis, a test set was prepared that covers five genres of the modern language (news, social media posts and electronic communication, encyclopedic articles, fiction, and poetry) as well as historical texts of the 17th century.

GramEval Results 🏆

qbic - 0.91609 link Anastasyev 2020, "Exploring pretrained models for joint morpho-syntactic parsing of Russian"
ADVance - 0.90762 link Sorokin, Smurov, Kirianov 2020
lima - 0.87870 link Bocharov, de Chalendar 2020
vocative - 0.85198 link (tagger), link (lemmatizer)
baseline - 0.80377 link

Competition description

We invite you to participate in the GramEval-2020 shared task. During the shared task, participants build systems that identify:

- morphological characteristics of the word (part of speech and features)
- the lemma of the word
- syntactic relations (dependencies)

A cumulative evaluation score is computed on all tokens taking into account:
- POS (part of speech) accuracy
- morphological features accuracy
- LAS accuracy (labeled attachment score for dependency relations)
- lemmatization accuracy

See the performance of publicly available tools for Russian - analysis by Igor Trofimov

All metrics are calculated by the evaluate.py script.
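The cumulative score described above can be sketched as follows. This is an illustration under stated assumptions, not the official evaluate.py: it assumes the gold and predicted files are in CoNLL-U with identical tokenization, compares FEATS as unordered sets, compares lemmas as exact strings (the official script may normalize them differently), and weights the four accuracies equally. The names `read_tokens` and `score` and the two-word sample are hypothetical.

```python
from typing import Dict, List, Tuple

# One scored token: (UPOS, FEATS, LEMMA, HEAD, DEPREL).
# Choosing exactly these fields is an assumption of this sketch.
Token = Tuple[str, str, str, str, str]


def read_tokens(conllu_text: str) -> List[Token]:
    """Collect the scored fields of every word line in a CoNLL-U string."""
    tokens: List[Token] = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        tokens.append((cols[3], cols[5], cols[2], cols[6], cols[7]))
    return tokens


def score(gold: List[Token], pred: List[Token]) -> Dict[str, float]:
    """Per-token accuracies plus their unweighted mean."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and prediction must be non-empty and aligned")
    n = len(gold)
    pairs = list(zip(gold, pred))
    result = {
        "pos": sum(g[0] == p[0] for g, p in pairs) / n,
        # FEATS are compared as sets, so the order of Key=Value pairs is ignored.
        "feats": sum(set(g[1].split("|")) == set(p[1].split("|"))
                     for g, p in pairs) / n,
        "lemma": sum(g[2] == p[2] for g, p in pairs) / n,
        # LAS: both the head index and the relation label must match.
        "las": sum(g[3:] == p[3:] for g, p in pairs) / n,
    }
    result["overall"] = sum(result.values()) / 4
    return result


# Hypothetical two-word sample: the prediction has one wrong relation label
# and one wrong lemma.
GOLD = "\n".join([
    "# text = Мама мыла",
    "1\tМама\tмама\tNOUN\t_\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_",
    "2\tмыла\tмыть\tVERB\t_\tNumber=Sing|Tense=Past\t0\troot\t_\t_",
])
PRED = "\n".join([
    "# text = Мама мыла",
    "1\tМама\tмама\tNOUN\t_\tNumber=Sing|Case=Nom\t2\tobj\t_\t_",
    "2\tмыла\tмыла\tVERB\t_\tNumber=Sing|Tense=Past\t0\troot\t_\t_",
])

print(score(read_tokens(GOLD), read_tokens(PRED)))
```

On this sample the POS and feature accuracies are 1.0, the lemma accuracy and LAS are 0.5 each, and the overall score is 0.75. For leaderboard-comparable numbers, use evaluate.py itself.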

Motivation:

We believe that the levels of linguistic structure need to be annotated jointly; otherwise, errors at one level propagate to the next. Existing pipelines of the form “tokenization - morphology - lemmatization - syntax” accumulate errors at each stage.

We welcome systems that perform equally well on Russian texts of different registers (texts that differ in style, scope and genre, region, and time of creation) and that handle register-specific words and constructions.

Objective:

We encourage participants to build systems that perform full morphological and syntactic annotation and lemmatization within the framework of Universal Dependencies (UD). Your system may build upon the baseline pipeline or use components of other existing open solutions.

Data:

The training data cover news, social networks, fiction and non-fiction, business texts, poetry, and historical texts of the 17th century. The data listed in data.md include:

- training data with full annotation: the output of our team of annotators plus existing UD treebanks
- additional data with automatic ("dirty") annotation
- additional materials, such as frequency lists and models based on third-party resources
- development sets (open test data) for preliminary evaluation of the model

You may train on all the data (train + dev); the dev set is set aside only as a convenience for preliminary evaluation of the model. Because the data come from different sources, they differ across registers in size, in the available levels of annotation, in annotation quality, and in the attested combinations of feature tags for particular parts of speech.

During the evaluation phase, submissions are evaluated against the closed test data, which include texts in many genres and from different sources in Russian.

Data format:

The data follow the Universal Dependencies standard and are provided in the CoNLL-U format; see data.md. The UD tagset for Russian is available here.
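For readers new to the format, here is a minimal sketch of reading CoNLL-U into Python objects. Each word line has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines start with `#`, and a blank line ends a sentence. The `Word` class, the function names, and the sample sentence are hypothetical and are not part of the task's tooling; data.md remains the reference for the actual files.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Word:
    id: int
    form: str
    lemma: str
    upos: str
    feats: Dict[str, str]
    head: Optional[int]  # None when the HEAD column is "_"
    deprel: str


def parse_feats(raw: str) -> Dict[str, str]:
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if raw == "_":
        return {}
    return dict(pair.split("=", 1) for pair in raw.split("|"))


def parse_conllu(text: str) -> List[List[Word]]:
    """Split a CoNLL-U string into sentences, each a list of Word objects."""
    sentences: List[List[Word]] = []
    current: List[Word] = []
    for line in text.splitlines():
        if not line.strip():  # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):  # sentence-level comments such as "# text = ..."
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if not cols[0].isdigit():  # skip multiword ranges and empty nodes
            continue
        current.append(Word(
            id=int(cols[0]),
            form=cols[1],
            lemma=cols[2],
            upos=cols[3],
            feats=parse_feats(cols[5]),
            head=int(cols[6]) if cols[6].isdigit() else None,
            deprel=cols[7],
        ))
    if current:  # the file may end without a trailing blank line
        sentences.append(current)
    return sentences


# Hypothetical sample sentence, written for this illustration only.
SAMPLE = "\n".join([
    "# sent_id = example-1",
    "# text = Мама мыла раму.",
    "1\tМама\tмама\tNOUN\t_\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_",
    "2\tмыла\tмыть\tVERB\t_\tTense=Past\t0\troot\t_\t_",
    "3\tраму\tрама\tNOUN\t_\tCase=Acc|Number=Sing\t2\tobj\t_\tSpaceAfter=No",
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
])

sentences = parse_conllu(SAMPLE)
for word in sentences[0]:
    print(word.id, word.form, word.lemma, word.upos, word.head, word.deprel)
```

The sketch keeps only the columns that the task scores (lemma, POS, features, head, relation) and drops XPOS, DEPS, and MISC; a full reader would retain them.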

Baselines:

Morphology:

RnnMorph (winner of MorphoRuEval 2017)

Syntax:

UDPipe (baseline of the CoNLL 2018 shared task)

See the baseline.

Paper Submission

Paper due - March 18
Papers must be uploaded to the platform link, in the Substantive session section.
- To organize the reviewing, the Dialog organizing committee asks that the paper title and a short abstract be uploaded to the system by March 10.

Submitted anonymized papers go through the standard double-blind review procedure of the Dialog conference, with the participation of the GramEval2020 organizers. A system's place in the ranking is not a decisive criterion for reviewers. What is assessed first is how clearly the paper describes the methods, architectures, and data used, and how interesting it is to a broad audience. We recommend including an analysis of your system's final and intermediate results, additional quality metrics, and a discussion of debatable points from the perspective of the field's overall development. Based on the reviews, a decision will be made to publish the paper either in the main proceedings (indexed by SCOPUS) or as an online publication on the Dialog website (the fate that awaits dry technical reports).

Important Dates:

February 1, 2020 - release of the gold training data and the additional "dirty" training data obtained by automatic annotation
February 15, 2020 - system testing
February 23-24, 2020 - final submission
March 5, 2020 - announcement of the results

We are open to questions about data, metrics, and testing procedures. Email: grameval2020@gmail.com. Telegram: t.me/grameval2020

Sincerely, Competition Committee
