facebookresearch / metaseq

Repo for external large-scale work
MIT License

Fix marmot format variations for cm3v2 #743

Closed: berniebear closed this pull request 1 year ago

berniebear commented 1 year ago

Patch Description
General fix for marmot JSONL format variations across text-only datasets (arxiv, github, etc.).
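The PR does not show the diff, so the following is only a minimal sketch of the general idea of normalizing JSONL format variations. It is not the patch's actual code. It assumes, hypothetically, that text-only records store their text under different keys (`text`, `content`, `body`) and sometimes as a list of string chunks. The key list and the helper names `normalize_record` and `normalize_jsonl` are illustrative and are not part of metaseq.

```python
import json
from typing import Iterable, Iterator

# Hypothetical candidate keys: the real marmot schema is not described in this PR.
TEXT_KEYS = ("text", "content", "body")


def normalize_record(record: dict) -> dict:
    """Map one JSONL record with a varying schema onto a single {"text": ...} shape."""
    for key in TEXT_KEYS:
        value = record.get(key)
        if isinstance(value, str):
            return {"text": value}
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            # Assumed variant: text stored as a list of chunks, joined with newlines.
            return {"text": "\n".join(value)}
    raise ValueError(f"no recognized text field in keys {sorted(record)}")


def normalize_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSONL lines, skipping blank ones, and normalize each record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield normalize_record(json.loads(line))
```

For example, `list(normalize_jsonl(['{"text": "a"}', '{"content": ["b", "c"]}']))` yields two records that share the same `{"text": ...}` shape. Raising on unknown schemas, rather than silently dropping the record, makes a new format variation visible during training instead of quietly shrinking a dataset.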

Testing steps
Tested under marmot + openflamingo training on RSC. Log: /checkpoint/vllm/cm3leon/experiments/ablations/DATA_ABLATION_m_flamingo_760m/DATA_ABLATION_m_flamingo_760m.bm_none.fp16.bf16.trunc.fmis0.0.sig0.006.pos0.0002.nobias.noaffln.relu.transformer_lm_megatron.nlay24.emb1536.lrnpos.0emb_scale.tps8192.adam.b2_0.95.cl1.0.lr0.00025.wu1500.dr0.1.atdr0.0.0emb_dr.wd0.1.ms8.mu119209.s1.ngpu256