TeDDi

This is the repository for the Text Data Diversity Sample (TeDDi Sample), a part of the Swiss National Science Foundation funded project: Non-randomness in Morphological Diversity: A Computational Approach Based on Multilingual Corpora.

This repository contains the corpus data and code that processes and analyzes it. This is currently a work in progress.

If you use TeDDi, please cite as:

Steven Moran, Christian Bentz, Ximena Gutierrez-Vasques, Olga Pelloni, and Tanja Samardzic. 2022. TeDDi Sample: Text Data Diversity Sample for Language Comparison and Multilingual NLP. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1150–1158, Marseille, France. European Language Resources Association. Online: https://aclanthology.org/2022.lrec-1.123/

To contribute code or data to the repository, please first refer to our guidelines on contributing.

Different data formats available for direct download.

Main Contributors (alphabetical order):

Bentz, Christian
Gutierrez-Vasques, Ximena
Moran, Steven
Samardžić, Tanja
Sozinova, Olga

Language-specific contributors (alphabetical order):

Kalessa, Jule (Paiwan)
Mächler, Alina
Rood, David S. (Wichita)
Roth, Rainer (Wari')

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

MorphDiv / TeDDi_sample

readme

TeDDi