Patch Description
General fix for marmot JSONL format variations across text-only datasets (arxiv, github, etc.).
Testing steps
Tested with marmot + openflamingo training on RSC. Log: /checkpoint/vllm/cm3leon/experiments/ablations/DATA_ABLATION_m_flamingo_760m/DATA_ABLATION_m_flamingo_760m.bm_none.fp16.bf16.trunc.fmis0.0.sig0.006.pos0.0002.nobias.noaffln.relu.transformer_lm_megatron.nlay24.emb1536.lrnpos.0emb_scale.tps8192.adam.b2_0.95.cl1.0.lr0.00025.wu1500.dr0.1.atdr0.0.0emb_dr.wd0.1.ms8.mu119209.s1.ngpu256