OpenThaiGPT
/
openthaigpt-pretraining
Apache License 2.0
21
stars
10
forks
issues
fix(data): resolve test error on data module
#239
new5558
closed
1 year ago
1
fix(all): github action test failed
#238
new5558
closed
1 year ago
0
fix(data): data dedup and decontaminate not working
#237
new5558
closed
1 year ago
0
feat(data): add PDPA blind data script
#236
new5558
closed
1 year ago
0
feat(model): Resume trainable lightning fabric pipeline
#235
MoosaTae
closed
1 year ago
0
perf(data): revamp deduplication parallelism pipeline
#234
new5558
closed
1 year ago
0
feat(model): Resume trainable lightning fabric pipeline and add LoRA
#233
MoosaTae
closed
1 year ago
3
Add files via upload
#232
phasinA1learn
closed
1 year ago
0
Add files via upload
#231
phasinA1learn
closed
1 year ago
0
feat(data): add merge jsonl code for oscar dataset
#230
new5558
closed
1 year ago
0
debug(model): fix device mismatch in eval_step
#229
boss-chanon
closed
1 year ago
0
fix(data): make current data pipeline compatible with the current data format
#228
new5558
closed
1 year ago
0
feat(data): Split jsonl data
#227
kriangkraitan
closed
1 year ago
0
feat(model): add spm and bpe mode in spm trainer
#226
boss-chanon
closed
1 year ago
0
chore(model): refactor hydra model
#225
new5558
closed
1 year ago
0
feat(data): add deduplication pipeline
#224
new5558
closed
1 year ago
0
refactor(data): decontaminate dataset from validation set
#223
new5558
closed
1 year ago
0
fix(data): solve map internet dataset index problem
#222
new5558
closed
1 year ago
0
feat(data): Saving huggingface dataset
#221
kriangkraitan
closed
1 year ago
0
feat(data): clean&reformat pantip 3G dataset
#220
kriangkraitan
closed
1 year ago
0
[Super AI] Find SEO cleaning method
#219
ArthurMinovsky
opened
1 year ago
0
refactor(model): training pipeline to hydra
#218
boss-chanon
closed
1 year ago
0
[SIIT] Clean up My order data
#217
ArthurMinovsky
opened
1 year ago
0
[SIIT] Crawl alternative data
#216
ArthurMinovsky
opened
1 year ago
1
Check SET Financial Dataset
#215
new5558
opened
1 year ago
1
Crawl MCOT Website
#214
new5558
opened
1 year ago
0
Crawl ThaiSubtitle Website
#213
new5558
closed
1 year ago
1
feat(model): get tokenizer information pipeline
#212
boss-chanon
closed
1 year ago
0
debug(model): llama merge tokenizer save error
#211
boss-chanon
closed
1 year ago
0
debug(model): debug anything from refactor get_dataset
#210
boss-chanon
closed
1 year ago
0
Initial pantip preprocessing code
#209
boat1603
closed
1 year ago
0
feat(data): add decontamination pipeline
#208
new5558
closed
1 year ago
0
feat(model): efficient tokenization
#207
boss-chanon
closed
1 year ago
0
refactor(model): don't use chunking in tokenized dataset
#206
boss-chanon
closed
1 year ago
2
debug(model): load_dataset to load_from_disk
#205
boss-chanon
closed
1 year ago
0
debug(model): add optimizer in yaml file
#204
boss-chanon
closed
1 year ago
0
refactor(model): get dataset
#203
boss-chanon
closed
1 year ago
0
Refactor model loader
#202
boat1603
closed
1 year ago
0
refactor(model): make load dataset function
#201
boss-chanon
closed
1 year ago
0
refactor(model): `get_dataset` from disk and efficient tokenizer
#200
boss-chanon
closed
1 year ago
0
feat(Model): Refactor fabric to use hydra config
#199
MoosaTae
closed
1 year ago
0
debug(model): convert AutoTokenizer to PreTrainedTokenizerFast
#198
boss-chanon
closed
1 year ago
0
feat(model): merge tokenizer no longer needs to load merges file
#197
boss-chanon
closed
1 year ago
0
feat(model): load hf model to local path
#196
boss-chanon
closed
1 year ago
0
[SIIT] Research Multilingual Model
#195
new5558
opened
1 year ago
1
Translated dataset
#194
Pattptr
opened
1 year ago
0
feat(model): add `train_extremely_large_corpus` option to `spm_train`
#193
boss-chanon
closed
1 year ago
0
refactor(model): gptj merge tokenizer
#192
boss-chanon
closed
1 year ago
1
feat(model): llama load tokenizer
#191
boss-chanon
closed
1 year ago
2
feat(model): bpe train pipeline
#190
boss-chanon
closed
1 year ago
0