FranxYao / Long-Context-Data-Engineering
Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context"
443 stars · 29 forks
Issues (sorted by newest)
Error in Llama-3.2-3B needle evaluation (#18) · by prakamya-mishra, opened 2 weeks ago · 0 comments
ValueError: TensorParallelPreTrainedModel does not support Flash Attention 2.0 yet. (#17) · by ZetangForward, opened 6 months ago · 0 comments
When reproducing the code, the needle-in-a-haystack experiment runs out of GPU memory during long-context training (#16) · by Kwen-Chen, closed 7 months ago · 1 comment
Upsampling: statistical biases in the dataset distribution (#15) · by michaelfeil, opened 7 months ago · 0 comments
Data tokenization has no defense against injection of special tokens (such as <s> and </s>) already present in the data (#14) · by Kwen-Chen, opened 7 months ago · 2 comments
First-step loss of continued pretraining on 80K (#13) · by ftgreat, closed 8 months ago · 4 comments
[A strange issue] If pred = expect_answer, the score computed with the author's metric does not equal 1 (#12) · by randomtutu, opened 8 months ago · 1 comment
Collapsed performance at short lengths (related to a bug in HF's LlamaDynamicNTKScalingRotaryEmbedding) (#11) · by gaotianyu1350, opened 8 months ago · 0 comments
When did you perform dynamic NTK? (#10) · by Liu-yuliang, opened 8 months ago · 1 comment
Did you use an eos token between two documents? (#9) · by jzhang38, closed 8 months ago · 3 comments
Questions about reproducing the paper (#8) · by chenglu66, opened 8 months ago · 3 comments
We compared against internlm and found that the model's performance drops noticeably on short texts (#7) · by esbatmop, closed 8 months ago · 6 comments
Was the base frequency increased, or do you rely on position interpolation via scaling? (#6) · by tgunter, opened 8 months ago · 3 comments
[Evaluation question] A pretrain-stage model should have rather poor instruction-following ability; could you share the testing method used in the paper? (#5) · by randomtutu, opened 8 months ago · 5 comments
Is there a GPT_CHAR_TO_TOKEN_RATIO? (#4) · by ZetangForward, closed 9 months ago · 2 comments
It seems the results we get are not the same as the repo shows (#3) · by linbeyoung, opened 9 months ago · 12 comments
Small correction for YaRN-Mistral model (#2) · by bloc97, opened 9 months ago · 2 comments
Update README.md (#1) · by eltociear, closed 9 months ago · 0 comments