German-NLP
Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German
This list primarily features resources and tools that are currently maintained and can be used either off-the-shelf or with minor adjustments. It is deliberately biased toward usability and user-friendliness.
Community support is needed to keep this list up to date; pull requests and suggestions are welcome! See the contributing guidelines.
Table of Contents
Text corpora
General-purpose
Historical
Specialized
Swiss German
Learner and Error Corpora
Word lists
Data acquisition
Lists of corpora
Generic resources
Frameworks
Treebanks
Deep learning models and transformers
Annotation
Standards
Linguistic processing
Preprocessing
Tokenization / Sentence boundary detection
Stemming
Lemmatization
Morphological analysis
Normalization
Phonology
POS-tagging
Syntactical parsing
Named Entity Recognition
Misc
Text generation
Industry/Applications
Evaluation
Semantic analysis
Datasets
Word embeddings and senses
Sentiment analysis datasets / polarity clues
Sentiment detection
GermEval (category to improve)
Discourse
Summarization and Simplification
Psycholinguistics
Speech NLP
Machine Translation (category to improve)
Parallel corpora
Large Language Models
Teaching resources and tutorials
More lists
German
General
Comparable lists
Larger institutional GitHub groups
Contributors
See the list of contributors.
License