-
**Description**
We want to collect all open-source Tibetan word-segmented data and save it in a standard format.
The format should be:
```
[
{
'source': 'བོད་ཀྱི་གླུ་གར་རོལ་དབྱངས་ལ་གཞི་རྩའི་ཐོག་…
```
-
### Description:
We have several websites containing Tibetan literature data that need to be scraped to gather as much valuable information as possible for training our LLM. The task involves not only…
-
Description
The button labeled in Tibetan is not translated to English when the overall page language is set to Tibetan.
This inconsistency creates confusion for users.
Expected behavior: The button…
-
For the Tibetan tradition pages, the librarians worked on a spreadsheet that has the data we want to put online. The spreadsheet is not done yet but I thought we could start reviewing it from a techni…
-
# Objective
Develop scripts to efficiently scrape Tibetan news articles from multiple sources, starting with the Voice of Tibet (VOT) website, and store them in a structured format for training a mach…
-
Tibetan is an important language of Asia, so I think it would be better if Publii had a Tibetan option.
-
### Description:
This project updates the packaging to support generating synthetic OCR training data for Tibetan script in both Tibetan Pecha and Modern Tibetan Book formats. Enhancements include up…
-
Hello,
(Related to https://github.com/latex3/babel/issues/250 but I can’t reopen this issue, so I moved messages here.)
I've noticed another problem, involving Tibetan word breaks, especially wh…
-
### Description:
This project focuses on enhancing the existing package that generates synthetic page images for Tibetan Pecha and Modern Tibetan Book formats. The main tasks involve modifying the …
-
The text says "43.5 can be written directly in Tibetan as ༤༬." But in fig. 9, ༬ = 1.5, which makes me think `༤༬` expresses 41.5...
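
One way to cross-check the per-character values is to ask the Unicode Character Database, which Python exposes through the standard `unicodedata` module. The sketch below is only a sanity check: the helper name `describe_digits` is hypothetical, and Unicode assigns values to individual characters only, so it does not settle how a sequence such as `༤༬` should be read as a whole.

```python
import unicodedata


def describe_digits(text):
    """Return (character, Unicode name, numeric value) for each character.

    The numeric value is None when Unicode defines no value for a character.
    `describe_digits` is a hypothetical helper name used for illustration.
    """
    rows = []
    for ch in text:
        name = unicodedata.name(ch, "UNKNOWN")
        value = unicodedata.numeric(ch, None)
        rows.append((ch, name, value))
    return rows


# Print what Unicode records for each character of the sequence in question.
for row in describe_digits("༤༬"):
    print(row)
```

Comparing the printed names and values with fig. 9 should show whether the figure and the sentence are using the same half digit.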