LiMa-cas closed this 3 months ago
Hi, in the paper you said “we use 4096 samples from RedPajama with a context length of 2048”. Is that enough for QAT?
Hi, you can refer to Figure 4 and Table 5 in the paper for the ablation on the number of training samples.
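For a sense of scale, here is a rough sketch of the token budget implied by the setup quoted in the question. It is plain arithmetic on the two quoted numbers, not a figure taken from the paper, and it assumes every sample fills the full context length. The variable names are illustrative only.

```python
# Token budget implied by the setup quoted in the question.
# Assumption: each of the samples uses the full context length.
num_samples = 4096      # training samples from RedPajama (from the quote)
context_length = 2048   # tokens per sample (from the quote)

total_tokens = num_samples * context_length
print(f"{total_tokens:,} tokens (~{total_tokens / 1e6:.1f}M)")
# prints "8,388,608 tokens (~8.4M)"
```

Whether a budget of this size is sufficient is what the sample-count ablation in Figure 4 and Table 5 addresses.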