We plan to release the official FoT large-scale continual pre-training (FoT fine-tuning) code within two weeks. This code will be in JAX. The instruction fine-tuning code does not use FoT (in fact, it uses a modified version with cross_batch=1, but this is not the version used to tune the base models; for more details, see #12).
Cool, looking forward to the full continual pre-training code.
def mem_attn_layer(Ql, Kl, Vl, Cl, Km, Vm, Kp, Vp, attn_scf, mode):
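While waiting for the official release, here is a minimal sketch of the general idea behind a memory attention layer: local queries attend jointly over the local context and an external bank of memory key/value pairs. This is not the repository's `mem_attn_layer` (I don't know what `Cl`, `Kp`, `Vp`, or `mode` do there); all names, shapes, and the masking scheme below are my own assumptions for illustration.

```python
import jax
import jax.numpy as jnp


def mem_attn_sketch(q_local, k_local, v_local, k_mem, v_mem, attn_scale):
    """Sketch only. q_local, k_local, v_local: [T, d]; k_mem, v_mem: [M, d]."""
    # Concatenate memory and local keys/values along the sequence axis,
    # so each query can attend to both sources in one softmax.
    k_all = jnp.concatenate([k_mem, k_local], axis=0)  # [M + T, d]
    v_all = jnp.concatenate([v_mem, v_local], axis=0)  # [M + T, d]

    # Scaled dot-product attention scores over the combined keys.
    scores = (q_local @ k_all.T) * attn_scale          # [T, M + T]

    # Causal mask on the local part only; memory positions stay fully visible.
    T, M = q_local.shape[0], k_mem.shape[0]
    causal = jnp.tril(jnp.ones((T, T), dtype=bool))
    mask = jnp.concatenate([jnp.ones((T, M), dtype=bool), causal], axis=1)
    scores = jnp.where(mask, scores, -jnp.inf)

    weights = jax.nn.softmax(scores, axis=-1)
    return weights @ v_all                              # [T, d]


# Toy usage with made-up shapes.
key = jax.random.PRNGKey(0)
T, M, d = 4, 6, 8
q = jax.random.normal(key, (T, d))
k = jax.random.normal(key, (T, d))
v = jax.random.normal(key, (T, d))
km = jax.random.normal(key, (M, d))
vm = jax.random.normal(key, (M, d))
out = mem_attn_sketch(q, k, v, km, vm, attn_scale=1.0 / jnp.sqrt(d))
print(out.shape)  # (4, 8)
```

The actual FoT layer will differ (cross-batch memory selection, positional handling, multi-head layout, etc.), so treat this only as a reading aid until the official JAX code is out.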