-
While reading https://github.com/todogroup/opencodeofconduct/blob/gh-pages/index.md, I found that its wording is much better.
Compare:
> Be welcoming: We strive to be a community that welcomes and…
-
Hi,
I would like to propose adding the **Salamandra** models to **lit-gpt**. Salamandra is a model based on **LLaMA** that has been trained from scratch using **35 European languages**, including m…
-
# Locale Format
A locale identifier consists of a language subtag, optionally followed by a script subtag, a region subtag, and one or more variant subtags.
For example,
- `zh-CN` and `zh-TW` are language + region. They mean …
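
The subtag structure described above can be sketched as a small parser. This is a minimal illustration, not a full implementation: the function name `parse_locale` and the returned dictionary keys are hypothetical, the subtag lengths follow the common BCP 47 shapes, and accepting `_` as well as `-` as a separator is an assumption. Extensions, private-use subtags, and grandfathered tags are not handled.

```python
import re

# Simplified pattern: language, then optional script, optional region,
# and zero or more variants. Assumption: only common well-formed tags are
# covered (no extensions, private-use subtags, or grandfathered tags).
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"                      # e.g. "zh"
    r"(?:[-_](?P<script>[A-Za-z]{4}))?"                  # e.g. "Hant"
    r"(?:[-_](?P<region>[A-Za-z]{2}|[0-9]{3}))?"         # e.g. "CN" or "419"
    r"(?P<variants>(?:[-_](?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)$"
)


def parse_locale(tag):
    """Split a locale identifier into its subtags (hypothetical helper)."""
    match = _LOCALE_RE.match(tag)
    if match is None:
        raise ValueError(f"unsupported or malformed locale identifier: {tag!r}")

    script = match.group("script")
    region = match.group("region")
    # The variants group holds something like "-rozaj-biske"; split it and
    # drop the empty string produced by the leading separator.
    variants = [v.lower() for v in re.split(r"[-_]", match.group("variants")) if v]

    return {
        "language": match.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
        "variants": variants,
    }


if __name__ == "__main__":
    for example in ("zh-CN", "zh-Hant-TW", "sl-rozaj-biske"):
        print(example, parse_locale(example))
```

Normalizing case (lowercase language, title-case script, uppercase region) follows the conventional presentation of these subtags; matching itself is case-insensitive.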
-
This was brought up in our discussion of [font exposure for people who use minority languages](https://www.w3.org/2024/09/font-i18n-privacy.html).
A proposal here is that the browser adopt the most…
-
**Background**
In the field of multilingual large models, especially for non-English corpora, there is often a problem of insufficient data quantity and poor quality. High-quality training data is cr…
-
### Pitch
Currently, the new language selection dropdown tool (introduced in #18420) supports only a limited number of languages. I suggest:
* Adding more languages. A more comprehensive list of languag…
-
```
SIL (Summer Institute of Linguistics) does a lot of work with
hundreds of the world's minority languages. Dictionaries are developed but
often lack funding or buyer potential to make pub…
-
Busra Test B v8.800 (and Khmer Busra / Mondulkiri, on which it is based) contain a space (U+0020) that is 2 to 3 times wider than the space character of any other common Khmer font.
Khmer is typed with…
-
The Unicode Standard code chart for Khmer has an annotation for 179D KHMER LETTER SHA: "used only for Pali/Sanskrit transliteration". The name SHA implies that it belongs to the first register.
The…
-
**Description**
We want to collect all the open-source Tibetan word-segmented data and save it in a standard format.
The format should be:
```
[
{
'source': 'བོད་ཀྱི་གླུ་གར་རོལ་དབྱངས་ལ་གཞི་རྩའི་ཐོག་ནས་དབྱ…