FranxYao / Long-Context-Data-Engineering
Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context"
443 stars · 29 forks
Issues (sorted by newest)
Error in Llama-3.2-3B needle evaluation (#18) · by prakamya-mishra, opened 2 weeks ago · 0 comments
ValueError: TensorParallelPreTrainedModel does not support Flash Attention 2.0 yet. (#17) · by ZetangForward, opened 6 months ago · 0 comments
When reproducing the code, the needle-in-a-haystack experiment runs out of GPU memory during long-context training (#16) · by Kwen-Chen, closed 7 months ago · 1 comment
Upsampling: statistical biases in the dataset distribution (#15) · by michaelfeil, opened 7 months ago · 0 comments
Data tokenization has no defense against injection of special tokens (such as <s> and </s>) already present in the data (#14) · by Kwen-Chen, opened 7 months ago · 2 comments
First-step loss of continued pretraining on 80K (#13) · by ftgreat, closed 8 months ago · 4 comments
[A strange issue] If pred = expect_answer, the score computed with the author's metric does not equal 1 (#12) · by randomtutu, opened 8 months ago · 1 comment
Collapsed performance at short lengths (related to a bug in HF's LlamaDynamicNTKScalingRotaryEmbedding) (#11) · by gaotianyu1350, opened 8 months ago · 0 comments
When did you perform dynamic NTK? (#10) · by Liu-yuliang, opened 8 months ago · 1 comment
Did you use an eos token between two documents? (#9) · by jzhang38, closed 8 months ago · 3 comments
Questions about reproducing the paper (#8) · by chenglu66, opened 8 months ago · 3 comments
We compared against internlm and found that the model's performance drops noticeably on short texts (#7) · by esbatmop, closed 8 months ago · 6 comments
Was the base frequency increased, or do you rely on position interpolation via scaling? (#6) · by tgunter, opened 8 months ago · 3 comments
[Evaluation question] A pretrain-stage model should have rather poor instruction-following ability; could you share the testing method used in the paper? (#5) · by randomtutu, opened 8 months ago · 5 comments
Is there a GPT_CHAR_TO_TOKEN_RATIO? (#4) · by ZetangForward, closed 9 months ago · 2 comments
It seems the results we get are not the same as the repo shows (#3) · by linbeyoung, opened 9 months ago · 12 comments
Small correction for YaRN-Mistral model (#2) · by bloc97, opened 9 months ago · 2 comments
Update README.md (#1) · by eltociear, closed 9 months ago · 0 comments