-
**Description**
We want to collect all the open-source Tibetan word-segmented data and save it in a standard format.
The format should be:
```
[
{
'source': 'བོད་ཀྱི་གླུ་གར་རོལ་དབྱངས་ལ་གཞི་རྩའི་ཐོག་ནས་དབྱ…
-
We just changed arg1 to arg2 for _yod_. See https://github.com/tibetan-nlp/lim-annodoc/issues/18.
I think this move may also be advisable for _byung_, for the same reasons. Are there any other verb…
heacu updated
6 years ago
-
I checked through the repo, and it seems that there is no documentation. Did I miss something? Perhaps you could provide some simple use-case examples in the README so there is an idea of the kind of …
-
### Description
The goal is to develop a Tibetan text-to-speech (TTS) model that can convert Tibetan text into Tibetan speech. This project involves training a TTS model using filtered good audio qual…
-
# RFC0068: Using Botok for Analyzing OCR Text Quality in Openpecha-data
## Named Concepts
- OCR: Optical Character Recognition, a technology to convert scanned or printed text into machine-encoded…
-
I have many cases where two tokens such as བྱང་ཆུབ་ and སེམས་དཔ become a single thing in the scatterplot. Is this something that ScatterText is doing? The tokenizer I'm using does not do that.
-
For Tibetan, and presumably other Asian languages, syllable-by-syllable ordering is needed instead of letter-by-letter ordering:
Concretely, in Tibetan:
- A word consists of one or more syllables
- Syllables are separ…
-
## Work Planning
Table of Contents
- [Housekeeping](#housekeeping)
- [Named Concepts](#named-concepts)
- [Summary](#summary)
- [Reference-Level Explanation](#reference-level-explanation)…
-
Table of Contents
- [Housekeeping](#housekeeping)
- [Named Concepts](#named-concepts)
- [Summary](#summary)
- [Reference-Level Explanation](#reference-level-explanation)
- [Alt…
-
### For Bug Reports
* BookStack Version: v0.20.0
When the word I'm looking for is the first word, or there's a space in front of it, it's ok.
![i01](https://user-images.githubusercontent.com/30…