OpenThaiGPT
/
openthaigpt-pretraining
Apache License 2.0
21
stars
10
forks
issues
scrape news data from SET website
#339
kakarem
opened
6 months ago
0
feat(model,eval) DPO training & human preference eval
#338
BobbyL2k
opened
8 months ago
1
Scrape Data from Krisdika for New UI
#337
boss-chanon
opened
10 months ago
1
FOMC scraping pipeline
#336
Pattptr
opened
11 months ago
2
Pipeline for Scrape King Data from Krisdika [LM-238]
#335
boss-chanon
closed
10 months ago
3
Pipeline for Scrape Polity Data from Krisdika [LM-237]
#334
boss-chanon
closed
10 months ago
3
Fix Flake8 and Black for set_annual
#333
boss-chanon
opened
11 months ago
0
Scrape Admincourt Pipeline [LM-244]
#332
boss-chanon
opened
11 months ago
2
Adding community contribution for inference pipeline
#331
kwankoravich
opened
11 months ago
0
Debug merge PDF
#330
boss-chanon
closed
11 months ago
1
Scrape SET Annual Report [LM-232]
#329
boss-chanon
opened
12 months ago
2
added thai sum preprocess code LM-154
#328
ArthurMinovsky
opened
1 year ago
2
2g 3g pantip preprocess LM-156
#327
ArthurMinovsky
opened
1 year ago
1
PRD_ADDED [LM-155]
#326
ArthurMinovsky
opened
1 year ago
2
add script and doc for wandb sync
#325
kriangkraitan
opened
1 year ago
2
Internet Cleaning [LM-225]
#324
boss-chanon
closed
12 months ago
2
Update README.md for internet data script
#323
Chawak
opened
1 year ago
0
create additional dataset preprocess code [LM-153]
#322
ArthurMinovsky
closed
11 months ago
4
Merge pdf [LM-157]
#321
nuchhub
closed
1 year ago
2
feat(model): update thaigov pipeline to align with the updated website [LM-198]
#320
new5558
closed
1 year ago
2
Mond/refactor internet LM-206
#319
Chawak
closed
1 year ago
2
edit pyproject / config dedup&decont
#318
kriangkraitan
closed
1 year ago
2
edit internet code for new data pipeline
#317
Chawak
closed
1 year ago
3
feat(model): Combined dataset to HF pipeline
#316
boss-chanon
closed
1 year ago
4
Speed up data preprocessing save_to_disk
#315
boat1603
closed
1 year ago
1
Add function to create oscar colossal dataset
#314
kriangkraitan
closed
1 year ago
1
feat(data): Convert the Pile dataset to hf format
#313
boss-chanon
closed
1 year ago
1
feat(data): Convert Pile dataset to hf
#312
boss-chanon
closed
1 year ago
1
feat(model): Add Huggingface trainer
#311
boss-chanon
closed
1 year ago
4
Continue Pretraining Llama7B with Huggingface trainer
#310
boss-chanon
closed
1 year ago
0
docs: Document Overview Data Pipeline
#309
kwankoravich
closed
1 year ago
0
docs: Document Tokenizer training Pipeline
#308
MoosaTae
closed
1 year ago
1
docs: Document Merge Tokenizer Pipeline
#307
MoosaTae
closed
1 year ago
2
docs: Document Create Huggingface Dataset Pipeline
#306
kriangkraitan
closed
1 year ago
0
Add PL Trainer
#305
boat1603
closed
1 year ago
2
docs: document decontamination pipeline and improve doc of dedup
#304
new5558
closed
1 year ago
0
docs(data): add info on deduplication
#303
new5558
closed
1 year ago
0
Mond/internet preprocessing doc
#302
Chawak
closed
1 year ago
1
Model Documentation : Export Model
#301
ArthurMinovsky
opened
1 year ago
0
Model Documentation : Training Model
#300
ArthurMinovsky
opened
1 year ago
0
Model Documentation: Training Tokenizer
#299
ArthurMinovsky
closed
1 year ago
0
Model Documentation : Overview
#298
ArthurMinovsky
opened
1 year ago
0
Storage Proposal
#297
ArthurMinovsky
opened
1 year ago
1
Model Documentation : Merge tokenizer
#296
ArthurMinovsky
closed
1 year ago
0
investigate 4-gpu training of LLaMa
#295
ArthurMinovsky
opened
1 year ago
2
POC multilingual
#294
ArthurMinovsky
closed
1 year ago
1
literature review framework
#293
ArthurMinovsky
opened
1 year ago
0
Recruit Announcement
#292
ArthurMinovsky
opened
1 year ago
0
Contribution guideline
#291
ArthurMinovsky
opened
1 year ago
0
overview documentation
#290
ArthurMinovsky
opened
1 year ago
1