CStanKonrad / long_llama

LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method.
Apache License 2.0

Where do I find a function like: #11

HuXinjing closed this issue 1 year ago

HuXinjing commented 1 year ago

def mem_attn_layer(Ql, Kl, Vl, Cl, Km, Vm, Kp, Vp, attn_scf, mode):
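
For reference, a minimal sketch of what a layer with such a signature might compute, assuming it performs attention in which local queries attend jointly over the local keys/values and keys/values retrieved from memory. The function and argument names below are illustrative only and are not the repository's actual API:

import math
import torch
import torch.nn.functional as F

def mem_attn_layer_sketch(q_local, k_local, v_local, k_mem, v_mem, attn_scale=None):
    # q_local: (batch, heads, q_len, head_dim) -- queries of the current context
    # k_local, v_local: (batch, heads, ctx_len, head_dim) -- local keys/values
    # k_mem, v_mem: (batch, heads, mem_len, head_dim) -- keys/values from memory
    # Causal masking over the local part is omitted for brevity.
    if attn_scale is None:
        attn_scale = 1.0 / math.sqrt(q_local.size(-1))
    # Memory entries are prepended to the local context, then standard attention is applied.
    k = torch.cat([k_mem, k_local], dim=-2)
    v = torch.cat([v_mem, v_local], dim=-2)
    scores = torch.matmul(q_local, k.transpose(-2, -1)) * attn_scale
    weights = F.softmax(scores, dim=-1)
    return torch.matmul(weights, v)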

CStanKonrad commented 1 year ago

We plan to release the official FoT large-scale continual pre-training (FoT fine-tuning) code within two weeks. This code will be in JAX. The instruction fine-tuning code does not use FoT (strictly speaking, it uses a modified version with cross_batch=1, but this is not the version used to tune the base models; for more, see #12).
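
As a rough illustration of the cross_batch idea (a hypothetical sketch, not the authors' implementation): one way to read it is that, during training, each sequence's memory attention layers also attend to keys/values borrowed from other contexts in the batch, so cross_batch=1 would mean one additional context per sequence. The helper below shows that construction in PyTorch; the name build_cross_batch_kv and its arguments are assumptions for this sketch:

import torch

def build_cross_batch_kv(k, v, cross_batch=1):
    # k, v: (batch, heads, seq_len, head_dim) -- keys/values of each sequence in the batch
    extra_k, extra_v = [], []
    for shift in range(1, cross_batch + 1):
        # Roll the batch dimension so that sequence i borrows the keys/values of sequence i + shift.
        extra_k.append(torch.roll(k, shifts=-shift, dims=0))
        extra_v.append(torch.roll(v, shifts=-shift, dims=0))
    # Each sequence then attends over its own keys/values plus the borrowed ones.
    k_ext = torch.cat([k] + extra_k, dim=-2)
    v_ext = torch.cat([v] + extra_v, dim=-2)
    return k_ext, v_ext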

HuXinjing commented 1 year ago

Cool, looking forward to your full continual pre-training code.