ZemberekDotNet is the C#/.NET port of Zemberek-NLP (Natural Language Processing tools for Turkish).

This library will be kept in sync with Zemberek-NLP, and the same module structure will be maintained on the .NET platform as NuGet packages built from separate projects.

Module | Package Name | Description | Status |
---|---|---|---|
All | ZemberekDotNet.All | Wrapper package that includes all the modules. | |
Core | ZemberekDotNet.Core | Special collections, hash functions and helpers. | |
Morphology | ZemberekDotNet.Morphology | Turkish morphological analysis, disambiguation and word generation. | |
Tokenization | ZemberekDotNet.Tokenization | Turkish tokenization and sentence boundary detection. | |
Normalization | ZemberekDotNet.Normalization | Basic spell checker, word suggestion. Noisy text normalization. | |
NER | ZemberekDotNet.NER | Turkish Named Entity Recognition. | |
Classification | ZemberekDotNet.Classification | Text classification based on the Java port of the fastText project. | |
Language Identification | ZemberekDotNet.LangID | Fast identification of text language. | |
Language Modeling | ZemberekDotNet.LM | Provides a language model compression algorithm. | |
Applications | ZemberekDotNet.Apps | Console applications. | Pending |
gRPC Server | ZemberekDotNet.GRPC | gRPC server for access from other languages. | Pending |
Examples | ZemberekDotNet.Examples | Usage examples. | Pending |

Packages target .NET Standard 2.1, so they can be used from .NET Core 3.0 and later; note that .NET Framework does not implement .NET Standard 2.1. Examples and console applications will also be built on .NET Core so that the whole library can be used cross-platform.
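
As a rough sketch of how a consuming project might reference the packages, the fragment below shows a console project file that pulls in the `ZemberekDotNet.All` wrapper package from the table above. The target framework (`netcoreapp3.1`) and the floating version (`*`) are assumptions made for illustration only; pin the version actually published on NuGet.org in a real project.

```xml
<!-- Hypothetical consumer project; target framework and version are assumptions. -->
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <!-- .NET Core 3.1 implements .NET Standard 2.1 -->
    <TargetFramework>netcoreapp3.1</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <!-- "*" floats to the highest stable version; replace with a pinned version -->
    <PackageReference Include="ZemberekDotNet.All" Version="*" />
  </ItemGroup>

</Project>
```

A project that needs only one module could reference that package instead, for example `ZemberekDotNet.Tokenization`, using the same `PackageReference` form.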

The repository is configured to run a continuous build, test and release cycle on Azure DevOps. After a successful release, the artifacts are published automatically to NuGet.org.