CStanKonrad / long_llama
LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method.
Apache License 2.0 · 1.44k stars · 87 forks
Issues
#25 · Compared to RAG techniques · leezythu · opened 6 months ago · 0 comments
#24 · Has the context window truly been expanded? · Vincentyua · opened 6 months ago · 0 comments
#23 · Help: questions about training on 8k input text length · Force1ess · closed 7 months ago · 2 comments
#22 · Fix typo · isaacbmiller · closed 8 months ago · 1 comment
#21 · Where is the learnable temperature parameter in cross_batch_attention? · MarkYangjiayi · closed 8 months ago · 1 comment
#20 · Need clarification on the token limit of input used for fine-tuning · lokesh-iterate · opened 9 months ago · 2 comments
#19 · Zero-shot long-context summarization / QA inference · shi-kejian · opened 9 months ago · 4 comments
#18 · How to integrate the method with GQA? · NickGao96 · closed 8 months ago · 1 comment
#17 · Utilizing LongLLaMA with the Mojo framework: 4-bit quantization, Flash Attention 2, and speculative execution for LLMs · myname36 · opened 9 months ago · 1 comment
#16 · I have some questions · dziulatex · opened 9 months ago · 1 comment
#15 · How much VRAM is needed to fine-tune the 3B model? Is 12 GB enough? · universewill · opened 9 months ago · 1 comment
#14 · CrossBatch details in Appendix A.2 · hxs91 · closed 9 months ago · 1 comment
#13 · Can FoT be used for instruction fine-tuning, or only for pre-training? · wujiekd · opened 10 months ago · 0 comments
#12 · How is the contrastive data pipeline implemented? · MarkYangjiayi · opened 10 months ago · 8 comments
#11 · Where do I find some function like: · HuXinjing · closed 10 months ago · 2 comments
#10 · Code for zero-shot arXiv evaluation · bronyayang · opened 11 months ago · 1 comment
#9 · Support for gradient_checkpointing · Richar-Du · opened 12 months ago · 3 comments
#8 · About the use of rotary position encoding · tianyabanbu · opened 12 months ago · 2 comments
#7 · Does each token require a KNN search during inference? · noanti · opened 12 months ago · 3 comments
#6 · Comparison with other tuning methods · FLLLIGHT · closed 12 months ago · 1 comment
#5 · How does speed drop as context length grows, compared with vanilla LLaMA? · lucasjinreal · opened 12 months ago · 11 comments
#4 · FoT attention and the scaling trick · StrangeTcy · opened 12 months ago · 3 comments
#3 · Could LongNet easily be applied to the attention with FoT? · jebarpg · opened 1 year ago · 1 comment
#2 · How would you go about instruction fine-tuning? · jordancole21 · opened 1 year ago · 13 comments
#1 · Fine-tuning code? · StrangeTcy · opened 1 year ago · 7 comments