CStanKonrad / long_llama
LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method.
Apache License 2.0 · 1.45k stars · 85 forks
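Many of the issues below concern loading, fine-tuning, and memory requirements of the released checkpoints. For reference, here is a minimal loading sketch using Hugging Face transformers. The checkpoint name syzymon/long_llama_3b and the float32 / trust_remote_code settings follow the project's published instructions, but treat the exact identifiers and generation settings as assumptions rather than part of this issue list.

import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

# Checkpoint name assumed from the project's Hugging Face releases; adjust if needed.
MODEL_PATH = "syzymon/long_llama_3b"

tokenizer = LlamaTokenizer.from_pretrained(MODEL_PATH)
# trust_remote_code pulls in the repository's custom modeling code (the FoT attention layers).
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.float32,
    trust_remote_code=True,
)

prompt = "My name is Julien and I like to"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
generation_output = model.generate(
    input_ids=input_ids,
    max_new_tokens=32,
    do_sample=True,
    temperature=1.0,
)
print(tokenizer.decode(generation_output[0], skip_special_tokens=True))

This only demonstrates loading and short-prompt generation; the long-context FoT machinery that several issues below discuss is exercised with much longer inputs.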
Issues (sorted by newest)
#25 · Compared to RAG techniques · leezythu · opened 10 months ago · 0 comments
#24 · Has the context window truly been expanded? · Vincentyua · opened 10 months ago · 0 comments
#23 · Help: questions about training on 8k input text length · Force1ess · closed 12 months ago · 2 comments
#22 · Fix typo · isaacbmiller · closed 1 year ago · 1 comment
#21 · Where is the learnable temperature parameter in cross_batch_attention? · MarkYangjiayi · closed 1 year ago · 1 comment
#20 · Need clarification on the token limit of input used for fine-tuning · lokesh-iterate · opened 1 year ago · 2 comments
#19 · 0-shot long-context summarization / QA inference · shi-kejian · opened 1 year ago · 4 comments
#18 · How to integrate the method with GQA? · NickGao96 · closed 1 year ago · 1 comment
#17 · Using LongLLaMA with the Mojo framework: 4-bit quantization, Flash Attention 2 support, and thoughts on speculative execution for LLMs · myname36 · opened 1 year ago · 1 comment
#16 · I have some questions · dziulatex · opened 1 year ago · 1 comment
#15 · How much VRAM is needed to finetune the 3B model? Is 12 GB enough? · universewill · opened 1 year ago · 1 comment
#14 · CrossBatch details in appendix A.2 · hxs91 · closed 1 year ago · 1 comment
#13 · Can FoT only be used for pre-training, or can it also be used for instruction fine-tuning? · wujiekd · opened 1 year ago · 0 comments
#12 · How is the contrastive data pipeline implemented? · MarkYangjiayi · opened 1 year ago · 8 comments
#11 · Where do I find some function like: · HuXinjing · closed 1 year ago · 2 comments
#10 · Code for zero-shot arXiv evaluation · bronyayang · opened 1 year ago · 1 comment
#9 · Support for gradient_checkpointing · Richar-Du · opened 1 year ago · 3 comments
#8 · About the use of rotary position encoding · tianyabanbu · opened 1 year ago · 2 comments
#7 · Does each token require a kNN search during inference? · noanti · opened 1 year ago · 3 comments
#6 · Comparison with other tuning methods · FLLLIGHT · closed 1 year ago · 1 comment
#5 · How does the speed drop as the length gets large, compared with vanilla LLaMA? · lucasjinreal · opened 1 year ago · 11 comments
#4 · FoT attention and the scaling trick · StrangeTcy · opened 1 year ago · 3 comments
#3 · Would LongNet be easily applied to the attention with FoT? · jebarpg · opened 1 year ago · 1 comment
#2 · How would you go about instruction finetuning? · jordancole21 · opened 1 year ago · 13 comments
#1 · Finetuning code? · StrangeTcy · opened 1 year ago · 7 comments