CStanKonrad / long_llama
LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method.
Apache License 2.0 · 1.45k stars · 85 forks
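Many of the issues below concern loading, fine-tuning, and memory requirements of the released checkpoints. For reference, here is a minimal loading sketch using Hugging Face transformers. The checkpoint name syzymon/long_llama_3b and the float32 / trust_remote_code settings follow the project's published instructions, but treat the exact identifiers and generation settings as assumptions rather than part of this issue list.

import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

# Checkpoint name assumed from the project's Hugging Face releases; adjust if needed.
MODEL_PATH = "syzymon/long_llama_3b"

tokenizer = LlamaTokenizer.from_pretrained(MODEL_PATH)
# trust_remote_code pulls in the repository's custom modeling code (the FoT attention layers).
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.float32,
    trust_remote_code=True,
)

prompt = "My name is Julien and I like to"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
generation_output = model.generate(
    input_ids=input_ids,
    max_new_tokens=32,
    do_sample=True,
    temperature=1.0,
)
print(tokenizer.decode(generation_output[0], skip_special_tokens=True))

This only demonstrates loading and short-prompt generation; the long-context FoT machinery that several issues below discuss is exercised with much longer inputs.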
Issues (sorted by newest)
#25 · Compared to RAG techniques · leezythu · opened 10 months ago · 0 comments
#24 · Has the context window truly been expanded? · Vincentyua · opened 10 months ago · 0 comments
#23 · Help: questions about training on 8k input text length · Force1ess · closed 12 months ago · 2 comments
#22 · Fix typo · isaacbmiller · closed 1 year ago · 1 comment
#21 · Where is the learnable temperature parameter in cross_batch_attention? · MarkYangjiayi · closed 1 year ago · 1 comment
#20 · Need clarification on the token limit of input used for fine-tuning · lokesh-iterate · opened 1 year ago · 2 comments
#19 · 0-shot long-context summarization / QA inference · shi-kejian · opened 1 year ago · 4 comments
#18 · How to integrate the method with GQA? · NickGao96 · closed 1 year ago · 1 comment
#17 · Using LongLLaMA with the Mojo framework: 4-bit quantization, Flash Attention 2 support, and thoughts on speculative execution for LLMs · myname36 · opened 1 year ago · 1 comment
#16 · I have some questions · dziulatex · opened 1 year ago · 1 comment
#15 · How much VRAM is needed to finetune the 3B model? Is 12 GB enough? · universewill · opened 1 year ago · 1 comment
#14 · CrossBatch details in appendix A.2 · hxs91 · closed 1 year ago · 1 comment
#13 · Can FoT only be used for pre-training, or can it also be used for instruction fine-tuning? · wujiekd · opened 1 year ago · 0 comments
#12 · How is the contrastive data pipeline implemented? · MarkYangjiayi · opened 1 year ago · 8 comments
#11 · Where do I find some function like: · HuXinjing · closed 1 year ago · 2 comments
#10 · Code for zero-shot arXiv evaluation · bronyayang · opened 1 year ago · 1 comment
#9 · Support for gradient_checkpointing · Richar-Du · opened 1 year ago · 3 comments
#8 · About the use of rotary position encoding · tianyabanbu · opened 1 year ago · 2 comments
#7 · Does each token require a kNN search during inference? · noanti · opened 1 year ago · 3 comments
#6 · Comparison with other tuning methods · FLLLIGHT · closed 1 year ago · 1 comment
#5 · How does the speed drop as the length gets large, compared with vanilla LLaMA? · lucasjinreal · opened 1 year ago · 11 comments
#4 · FoT attention and the scaling trick · StrangeTcy · opened 1 year ago · 3 comments
#3 · Would LongNet be easily applied to the attention with FoT? · jebarpg · opened 1 year ago · 1 comment
#2 · How would you go about instruction finetuning? · jordancole21 · opened 1 year ago · 13 comments
#1 · Finetuning code? · StrangeTcy · opened 1 year ago · 7 comments