-
**Background**
Multilingual large models, especially those trained on non-English corpora, often suffer from both too little data and poor data quality. High-quality training data is cr…
-
### Is there an existing request for this feature?
- [X] I have searched the existing issues
### Is your feature request related to a problem? Please describe.
If one subscribes for the newsl…
-
**How it works**
The language and script enumeration `language_script_code` is currently incomplete: some combinations of scripts and languages are missing.
**Improvement suggestion**
* Sepa…
-
## Issue description
We currently have 146 setup hooks included in packages (that I can find):
```sh
$ rg -F "setupHook =" -l | wc -l
146
```
Combined with 12 (documented) setup hooks incl…
-
I have code that reads Ion data from a file, makes a modification, and then writes it back out to a file. The file will be kept under version control.
I want to include…
-
Hi,
do you plan to pretrain models for maverick in languages other than English?
Thanks.
-
### Bug description
The problem appears when I set the default language to something other than English.
If I add the following lines to the config file:
`BABEL_DEFAULT_LOCALE = 'ru'
LANGUAGES = {
'ru': {'flag': 'ru', 'name': 'Ру…
-
Hi, to train a new language such as Arabic, do we have to start from stage-1, or can we start from stage-2/stage-3? Also, how much data is needed for good accuracy?
I'd appreciate any insights.
-
I'd like to fine-tune on unlabelled data, i.e. with a causal language modeling objective, for instance to adapt a model to a new domain or language.
Which parts of the training code need to be changed to use s…
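
To make the request concrete, here is a minimal sketch of what I mean by causal language modeling on unlabelled text: the targets are just the inputs shifted by one token, so no separate labels are needed. The function name `causal_lm_pairs` and the `block_size` parameter are hypothetical and are not taken from this repository's training code. Some training libraries apply this shift internally, in which case only the raw token blocks would be needed.

```python
from typing import List, Tuple


def causal_lm_pairs(
    token_ids: List[int], block_size: int
) -> List[Tuple[List[int], List[int]]]:
    """Split a token stream into (input, target) blocks for next-token prediction.

    Hypothetical helper for illustration; not part of any existing codebase.
    """
    pairs = []
    # Each block needs block_size inputs plus one extra token for the last target,
    # so a trailing remainder that is too short is dropped.
    for start in range(0, len(token_ids) - block_size, block_size):
        chunk = token_ids[start:start + block_size + 1]
        # The target at position i is the input token at position i + 1.
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs


if __name__ == "__main__":
    # Stand-in token ids; a real pipeline would get these from a tokenizer.
    for inputs, targets in causal_lm_pairs(list(range(10)), block_size=4):
        print(inputs, targets)
```
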
-
**Describe the bug**
A game cannot store more than 1.4 MB of data in a link: beyond that size, the trailing lines of code start getting deleted.
**Reproduction Steps**
1. Make a game with lots of rules to check and exceed …