OpenThaiGPT/openthaigpt-pretraining
Apache License 2.0
21 stars, 10 forks
issues
#189 feat(model): load hf dataset from local in spm_trainer (boss-chanon, closed, 1 year ago, 0 comments)
#188 refactor(model): refactor llama merge tokenizer (boss-chanon, closed, 1 year ago, 0 comments)
#187 Hotfix : Fix issue in data/scripts/main.py (Chawak, closed, 1 year ago, 0 comments)
#186 refactor(model): add script to train spm_tokenizer (boss-chanon, closed, 1 year ago, 0 comments)
#185 fix(model): fix bug don't found sentence in train tokenizer (boss-chanon, closed, 1 year ago, 0 comments)
#184 refactor(model): remove jsonl prepare and merge dataset (MoosaTae, closed, 1 year ago, 0 comments)
#183 Initial setup evaluation repo (boat1603, closed, 1 year ago, 0 comments)
#182 Complete e2e Evaluation pipeline (ArthurMinovsky, opened, 1 year ago, 1 comment)
#181 [Super AI] Create Script to search for testset leakage on Trainset (ArthurMinovsky, opened, 1 year ago, 1 comment)
#180 [SIIT] Convert MPT to support gradient checkpointing, LoRa (ArthurMinovsky, opened, 1 year ago, 4 comments)
#179 [Super AI] Complete training pipeline (ArthurMinovsky, opened, 1 year ago, 0 comments)
#178 [SIIT] Clean 2G pantip Data (ArthurMinovsky, opened, 1 year ago, 0 comments)
#177 [SIIT] Clean 3G pantip Data (ArthurMinovsky, opened, 1 year ago, 0 comments)
#176 Complete data construction with metadata (Bank) (ArthurMinovsky, opened, 1 year ago, 0 comments)
#175 [Super AI] Load English and Code data (ArthurMinovsky, opened, 1 year ago, 0 comments)
#174 [Super AI] Make pdpa blind data code (ArthurMinovsky, closed, 1 year ago, 0 comments)
#173 internet data pipeline (Chawak, closed, 1 year ago, 0 comments)
#172 feat(data): clean pantip 2G dataset (boss-chanon, closed, 1 year ago, 0 comments)
#171 feat(model): Add model hydra yaml (boat1603, closed, 1 year ago, 0 comments)
#170 feat(model): Add hydra config (boat1603, closed, 1 year ago, 0 comments)
#169 refactor(model): Refactor Lion optimizer experiment (boat1603, closed, 1 year ago, 0 comments)
#168 feat(model): Add universal trainer (boat1603, closed, 1 year ago, 0 comments)
#167 Update Dockerfile (boat1603, closed, 1 year ago, 0 comments)
#166 refactor(model): Refactor model repo (boat1603, closed, 1 year ago, 0 comments)
#165 Onet test data (thanathasCh, closed, 11 months ago, 0 comments)
#164 feat(model): Add Falcon model (boat1603, closed, 1 year ago, 0 comments)
#163 Deduplicate datasets (nuchhub, closed, 1 year ago, 1 comment)
#162 Add(data): add concatenation code for pantip data (kriangkraitan, closed, 1 year ago, 1 comment)
#161 Tokens of pretraining dataset (boat1603, opened, 1 year ago, 1 comment)
#160 [Super AI] Clean Pantip data (ArthurMinovsky, closed, 1 year ago, 1 comment)
#159 feat(model): add Data preparation for lightning fabric (MoosaTae, closed, 1 year ago, 4 comments)
#158 Setup repo dvc (boat1603, closed, 1 year ago, 0 comments)
#157 [Super AI] Prepare Lightning Fabric Pipeline for continue pretraining in Thai language (new5558, closed, 1 year ago, 0 comments)
#156 [Super AI] Research Prompt Template for Token Classification (new5558, opened, 1 year ago, 0 comments)
#155 [Super AI] Create Pipeline to Translate Dataset (new5558, opened, 1 year ago, 0 comments)
#154 Research LLM Prompt Creation Framework (new5558, opened, 1 year ago, 0 comments)
#153 Research LLM Eval Framework (new5558, opened, 1 year ago, 0 comments)
#152 [Super AI] Create Prompt Engineering Pipeline for Classification Dataset (new5558, opened, 1 year ago, 1 comment)
#151 [Super AI] Create Prompt Engineering Pipeline for Summarization Dataset (new5558, opened, 1 year ago, 1 comment)
#150 [Super AI] Create Prompt Engineering Pipeline For QA Dataset (new5558, opened, 1 year ago, 1 comment)
#149 Convert my order dataset to Text (new5558, opened, 1 year ago, 0 comments)
#148 [Super AI] Mine intrinsic tasks from English and Thai Internet dataset (new5558, opened, 1 year ago, 0 comments)
#147 [Super AI] Create deduplication code for Pantip comments (new5558, opened, 1 year ago, 0 comments)
#146 [Super AI] Build Main Pipeline for all data (new5558, opened, 1 year ago, 2 comments)
#145 [Super AI] Download all public dataset to ThaiSC (new5558, opened, 1 year ago, 0 comments)
#144 [Super AI] Build Pipeline Pantip Data (new5558, closed, 1 year ago, 1 comment)
#143 Research programming text data (Chawak, closed, 1 year ago, 0 comments)
#142 [SuperAI] Implement concatenation code for pantip data (Chawak, opened, 1 year ago, 1 comment)
#141 Clean thai name from text (Chawak, opened, 1 year ago, 0 comments)
#140 Crawl and clean PRD data (Chawak, closed, 1 year ago, 3 comments)